Guide · July 9, 2026

What is a hook, really: the 1 to 3 second test every video has to pass

A taxonomy of openers that work in vertical video, and the way each one fails.

[Image: a hand holding a phone that shows a vertical short video of a smiling plumber in a green-tiled bathroom popping a champagne bottle out of a toilet, foam spraying upward. A surreal pattern-interrupt opener.]

Why I wrote this

The word "hook" gets used like everyone agrees what it means, and nobody does. To one person it is the title text on the first frame. To another it is the opening line of voiceover. To a third it is the thumbnail. Talking past each other about hooks is one of the reasons creator advice on this topic is so cluttered.

A working definition: the hook is the thing that happens in the first 1 to 3 seconds of your video that decides whether the viewer keeps watching or swipes. Everything that happens in those frames is the hook. The visual, the audio, the text on screen, the energy of your face. All of it competes for the viewer's attention before the viewer has decided whether to give you the next ten seconds.

I wrote this because every time I run a short video through Jeena, the data tells the same story. The first second is where the bleed happens. By the third second, whoever was going to stay has stayed. The hook is not "important." It is the entire vote.

What a hook actually does

Two things, simultaneously. First, it gives the viewer's eye somewhere to land. A face, a clear object, a piece of text that is readable at thumb-scrolling speed. If the eye cannot settle, the viewer leaves before they have even processed what the video is about.

Second, it makes a promise about what the next ten seconds will deliver. The promise comes in four kinds: curiosity (you will not believe what happens next), utility (you will know how to do this by the end), recognition (you will feel seen), or fun (you will laugh). Whichever kind it is, the promise has to be specific enough that the viewer can decide whether they want it.

A hook that gives the eye a place to land but makes no promise is a stock photo. A hook that makes a promise but gives the eye nowhere to land is a podcast intro. You need both.

[Image: a three-frame storyboard infographic. Three video frames labelled 1s, 2s, and 3s, each with an eye marker showing where attention landed, above a thin timeline with three matching dots.] Caption: The hook is the first three seconds. Whatever happens here decides whether the viewer stays.

Five hook types that work in vertical video

Curiosity gap

Open a question the viewer cannot answer without watching the rest. "Here is why your skincare routine is making your acne worse" forces the viewer to either keep watching or live with not knowing. Works when the answer is genuinely non-obvious. Fails when the gap is fake (the answer is "moisturise more" and the viewer can guess) or when the gap is too narrow (the question only matters to 1% of viewers).

Shocking claim

A specific, falsifiable claim the viewer would not have expected. "Most personal trainers cannot tell you the difference between mobility and flexibility." Works when the claim is true, defensible, and the viewer suspects it might be. Fails when the claim is hyperbole the viewer instantly dismisses, or when it is so contrarian the viewer assumes bad faith.

Pattern interrupt

A visual or audio event that breaks the default rhythm of the scroll. A sudden whip-zoom, a face that fills the frame, a sound that is louder than the previous video. Works when the interrupt is paired with a promise (curiosity, utility) in the next half second. Fails when it is interruption for its own sake. Viewers learn to recognise empty pattern interrupts within three exposures and start swiping faster.

Direct address

Look into the camera and name the person you are talking to. "If you are a dermatologist who makes content about skincare, this is for you." Works when the named audience is large enough to be statistically present in the feed but specific enough to feel chosen. Fails when the named audience is too generic ("if you make content") or when the named audience does not believe you are really speaking to them.

Here is what nobody tells you

Position your video as the missing piece of advice the viewer has not heard. "Here is what nobody tells you about pricing your service in year one." Works when you actually have a non-obvious thing to say. Fails when the thing you say is in fact what everyone tells you, and the viewer realises by second eight.

[Image: a 2x3 grid of cards. Card 1, Curiosity gap: "This tiny change changed everything..." Card 2, Shocking claim: "Most people are doing it wrong." Card 3, Pattern interrupt: "Stop scrolling if you are tired of..." Card 4, Direct address: "If you are a creative entrepreneur, listen up." Card 5, Here is what nobody tells you: "The truth about [topic] that nobody talks about." Card 6, What fails: starting too slow, too much context, no clear point.] Caption: Each hook type and the way it most commonly breaks.

How to know if your hook is working

Watch your own video on your phone. Hold your thumb in scrolling position. Watch the first three seconds the way a stranger would, then ask yourself one question: at what frame did you decide whether to keep watching?

If you cannot point at a specific frame, your hook is not landing. A working hook has a moment the viewer can identify in their own attention. A failing hook is a wash of seconds during which the viewer was already swiping in their head.

How Jeena measures the first three seconds

The thumb-on-scroll check above is honest about your own attention, but it can only ever tell you what you think the hook does. The bigger question is whether a sample of strangers thinks the same. Jeena answers that with three layers of measurement that run while real people watch your video on their phones.

Front-camera gaze tracking: Each viewer calibrates once, then watches your video with the front camera on. The platform samples their gaze at 15 frames per second, recording the screen coordinates of where they actually looked.

Panel of up to 10 real viewers: The Basic tier runs your video through up to 10 humans from the Jeena App audience (minimum 5). Five viewers catch the most obvious recurring failure modes (~85 percent of them per Nielsen); 10 lifts that to ~95 percent, picking up the long-tail failures the first five might miss.

Survey + facial-expression overlay: After each watch, viewers describe what they saw and felt in their own words. The front camera also records blink rate and eyebrow raises. Aggregated across viewers, this becomes the wow-moments chart in the report.
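
The panel-size figures above come from the problem-discovery model used in usability testing, where the share of recurring issues seen by at least one of n testers is 1 - (1 - L)^n. Here is a minimal sketch of that arithmetic, assuming the commonly quoted per-tester detection rate L = 0.31. That rate comes from usability research, not from measurements on video hooks, and the function name is purely illustrative.

```python
def share_of_failures_found(viewers: int, detection_rate: float = 0.31) -> float:
    """Expected share of recurring failure modes noticed by at least one viewer.

    detection_rate is the assumed chance that a single viewer exposes a given
    failure mode (0.31 is an assumption borrowed from usability testing).
    """
    if viewers < 0:
        raise ValueError("viewers must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    # A failure mode is missed only if every viewer misses it independently.
    return 1.0 - (1.0 - detection_rate) ** viewers


for n in (5, 10):
    print(f"{n} viewers: {share_of_failures_found(n):.0%}")
```

Under that assumed rate the model gives about 84 percent for five viewers and about 98 percent for ten. The exact values shift with the rate you assume, so treat them as a rough guide to diminishing returns rather than a guarantee.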

You get a per-second timeline of what the panel actually did. For a hook specifically, only the first three seconds of that timeline matter: whether attention clustered on one point, whether anyone reacted.
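
One way to put a number on "attention clustered on one point" is to measure how far the gaze samples in the first three seconds sit from their own centroid. The sketch below is a rough illustration under stated assumptions: the sample format (time in seconds plus normalised x, y screen coordinates) and the function name are hypothetical, not Jeena's export format or scoring method. At 15 samples per second, three seconds is 45 samples per viewer.

```python
from math import hypot
from statistics import mean

HOOK_SECONDS = 3.0  # the hook window discussed in this guide


def hook_gaze_spread(samples, hook_seconds=HOOK_SECONDS):
    """Mean distance of hook-window gaze points from their centroid.

    samples: iterable of (t_seconds, x, y), with x and y normalised to 0..1
    (a hypothetical format). Lower values mean gaze stayed clustered;
    higher values mean it wandered around the frame.
    """
    window = [(x, y) for t, x, y in samples if 0.0 <= t < hook_seconds]
    if not window:
        raise ValueError("no gaze samples inside the hook window")
    cx = mean(x for x, _ in window)
    cy = mean(y for _, y in window)
    return mean(hypot(x - cx, y - cy) for x, y in window)


# Synthetic data for illustration only: 45 samples = 3 seconds at 15 per second.
locked = [(i / 15, 0.5, 0.4) for i in range(45)]  # gaze fixed on one point
drifting = [(i / 15, 0.1 + 0.8 * i / 44, 0.2 + 0.6 * i / 44) for i in range(45)]

print(f"locked spread:   {hook_gaze_spread(locked):.3f}")
print(f"drifting spread: {hook_gaze_spread(drifting):.3f}")
```

Any cut-off between "clustered" and "scattered" would have to be calibrated against real panels. The comparison between two openers of the same video is more informative than either number on its own.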

A working hook vs a failing one, from my own videos

Working hook

"My dog is crying because his friend is gone"

• Opens with a line that raises a question and promises a story.
• Gaze locks on the speaker.
• Attention holds through second three.

Failing hook

"Hello, I'm Daria, I recently moved..."

[Video: the first three seconds of a vertical short video in which the author opens with "Hello, I am Daria, I recently moved to the Netherlands." The framing is similar, but there is no question or promise to anchor attention.]

• Opens with an introduction. No promise.
• Gaze drifts off the speaker.
• Attention scatters across the frame.

Both clips are the first three seconds of real videos I tested on Jeena. Same person, same camera, same lighting. The only thing that changed was the opening line. The dog one opens with a line that raises a question and promises a story; viewer gaze locks onto the speaker, waiting to hear the rest. The Netherlands intro promises nothing specific; viewer gaze checks the speaker for a beat, then wanders to whatever else is in the frame because there is nothing to anchor it. Three seconds is enough for the difference to register in the gaze data, and once you see it, you cannot unsee it.

What this means if you are filming this week

Before you press record, write down the hook on paper. One sentence for the visual, one sentence for the promise. If you cannot, you do not have a hook. You have an idea.

Then film the first three seconds three times. Three different visual entries, three different promise framings. Watch all three back. Pick the one where you can name the frame at which you would have decided to keep watching.

Do not patch the rest of the video to fit a different hook. If the only hook that lands belongs to a different video than the one you wrote, make that video instead.

Test your hook on real viewers before you post

Upload your video to Jeena. Real viewers watch it on their phones with the front camera on. The attention heatmap shows you where their eyes landed in the first three seconds, the wow-moments chart flags whether anyone reacted, and the perception summary tells you how viewers described the video. If the hook is broken, the data tells you exactly where.

No "schedule a call." No sales rep. Upload, get your report in a couple of days, fix the opener before the post goes live.

Frequently asked

How long should a hook be in vertical video?

The decision to keep watching is made by the third second. Some hooks resolve in one second (a strong face filling the frame); some need three (a curiosity gap that takes two beats to set up). Past three seconds you are no longer in the hook. You are in the body of the video, and the audience has already decided.

Can you have a hook without text on screen?

Yes. A face delivering a strong opening line works. A visually surprising frame works. Text on screen is one common implementation because it is readable at scrolling speed and the audio might be off, but it is not the only one. Some of the highest-retention short videos open with a single arresting visual and zero text.

Why do hooks that worked last year stop working?

Viewers pattern-recognise. A curiosity-gap opener that everyone has been copying for six months gets recognised in the first half second and dismissed as a template. Hooks decay the way trends decay. The ones that last longer are the ones tied to a specific point of view or a specific personal story, because those are harder to copy at scale.

What is Jeena?

Jeena is a neuromarketing platform for short-form video. Real people watch your video on their phone with the front camera on. Jeena captures their gaze direction, blink rate, eyebrow raises, and their impressions of the video in a short survey afterward. You receive an AI-powered report with an attention heatmap, a visibility map, a wow-moments chart, a summary of how viewers perceived the video, and three specific recommendations for making the video work harder.

How does Jeena measure viewer attention?

Jeena uses smartphone front-camera gaze tracking. Each viewer calibrates once, then watches your video. The platform records where their gaze lands frame by frame, flags moments of surprise from facial expression, and combines that with a short impressions survey afterward. The result is a per-second timeline of what real viewers actually looked at and felt, plus a summary of how they perceived the video overall.

How much does it cost to test a video on Jeena?

A typical test costs around ten euros. See the pricing page for current rates.