A taxonomy of openers that work in vertical video, and the way each one fails.

The word "hook" gets used like everyone agrees what it means, and nobody does. To one person it is the title text on the first frame. To another it is the opening line of voiceover. To a third it is the thumbnail. Talking past each other about hooks is one of the reasons creator advice on this topic is so cluttered.
A working definition: the hook is the thing that happens in the first 1 to 3 seconds of your video that decides whether the viewer keeps watching or swipes. Everything that happens in those frames is the hook. The visual, the audio, the text on screen, the energy of your face. All of it competes for the viewer's attention before the viewer has decided whether to give you the next ten seconds.
I wrote this because every time I run a short video through Jeena, the data tells the same story. The first second is where the bleed happens. By the third second, whoever was going to stay has stayed. The hook is not "important." It is the entire vote.
A hook does two things, simultaneously. First, it gives the viewer's eye somewhere to land. A face, a clear object, a piece of text that is readable at thumb-scrolling speed. If the eye cannot settle, the viewer leaves before they have even processed what the video is about.
Second, it makes a promise about what the next ten seconds will deliver. Curiosity (you will not believe what happens next), utility (you will know how to do this by the end), recognition (you will feel seen), or fun (you will laugh). The promise has to be specific enough that the viewer can decide whether they want it.
A hook that gives the eye a place to land but makes no promise is a stock photo. A hook that makes a promise but gives the eye nowhere to land is a podcast intro. You need both.

Curiosity gap: open a question the viewer cannot answer without watching the rest. "Here is why your skincare routine is making your acne worse" forces the viewer to either keep watching or live with not knowing. Works when the answer is genuinely non-obvious. Fails when the gap is fake (the answer is "moisturise more" and the viewer can guess) or when the gap is too narrow (the question only matters to 1% of viewers).
Shocking claim: a specific, falsifiable claim the viewer would not have expected. "Most personal trainers cannot tell you the difference between mobility and flexibility." Works when the claim is true, defensible, and the viewer suspects it might be. Fails when the claim is hyperbole the viewer instantly dismisses, or when it is so contrarian the viewer assumes bad faith.
Pattern interrupt: a visual or audio event that breaks the default rhythm of the scroll. A sudden whip-zoom, a face that fills the frame, a sound that is louder than the previous video. Works when the interrupt is paired with a promise (curiosity, utility) in the next half second. Fails when it is interruption for its own sake. Viewers learn to recognise empty pattern interrupts within three exposures and start swiping faster.
Direct address: look into the camera and name the person you are talking to. "If you are a dermatologist who makes content about skincare, this is for you." Works when the named audience is large enough to be statistically present in the feed but specific enough to feel chosen. Fails when the named audience is too generic ("if you make content") or when the named audience cannot believe you can talk to them specifically.
"Here is what nobody tells you": position your video as the missing piece of advice the viewer has not heard. "Here is what nobody tells you about pricing your service in year one." Works when you actually have a non-obvious thing to say. Fails when the thing you say is in fact what everyone tells you, and the viewer realises by second eight.
![A 2x3 grid of glossy 3D cards on a light blue background, alternating pink and yellow-green. Card 1 Curiosity gap with a question-mark icon: "This tiny change changed everything..." Card 2 Shocking claim with a lightning-bolt icon: "Most people are doing it wrong." Card 3 Pattern interrupt with a starburst icon: "Stop scrolling if you are tired of..." Card 4 Direct address with a person icon: "If you are a creative entrepreneur, listen up." Card 5 Here is what nobody tells you with a speech-bubble icon: "The truth about [topic] that nobody talks about." Card 6 What fails with an X icon: Starting too slow, Too much context, No clear point.](/blog-assets/what-is-a-hook/hook-types-grid.png)
Watch your own video on your phone. Hold your thumb in scrolling position. Watch the first three seconds the way a stranger would, then ask yourself one question: at what frame did you decide whether to keep watching?
If you cannot point at a specific frame, your hook is not landing. A working hook has a moment the viewer can identify in their own attention. A failing hook is a wash of seconds during which the viewer was already swiping in their head.
The thumb-on-scroll check above is honest about your own attention, but it can only ever tell you what you think the hook does. The bigger question is whether a sample of strangers thinks the same. Jeena answers that with three layers of measurement that run while real people watch your video on their phones.
You get a per-second timeline of what the panel actually did. For a hook specifically, only the first three seconds of that timeline matter: whether attention clustered on one point, whether anyone reacted.
"My dog is crying because his friend is gone"

"Hello, I'm Daria, I recently moved..."

Both clips are the first three seconds of real videos I tested on Jeena. Same person, same camera, same lighting. The only thing that changed was the opening line. The dog clip opens with a line that raises a question and promises a story; viewer gaze locks onto the speaker, waiting to hear the rest. The Netherlands intro promises nothing specific; viewer gaze checks the speaker for a beat, then wanders to whatever else is in the frame because there is nothing to anchor it. Three seconds is enough for the difference to register in the gaze data, and once you see it, you cannot unsee it.
Before you press record, write down the hook on paper. One sentence for the visual, one sentence for the promise. If you cannot, you do not have a hook. You have an idea.
Then film the first three seconds three times. Three different visual entries, three different promise framings. Watch all three back. Pick the one where you can name the frame at which you would have decided to keep watching.
Do not rewrite the rest of the video to fit a different hook. If the only hook that lands is for a different video than the one you wrote, change the video.
Upload your video to Jeena. Real viewers watch it on their phones with the front camera on. The attention heatmap shows you where their eyes landed in the first three seconds, the wow-moments chart flags whether anyone reacted, and the perception summary tells you how viewers described the video. If the hook is broken, the data tells you exactly where.
No "schedule a call." No sales rep. Upload, get your report in a couple of days, fix the opener before the post goes live.
The decision to keep watching is made by the third second. Some hooks resolve in one second (a strong face filling the frame); some need three (a curiosity gap that takes two beats to set up). Past three seconds you are no longer in the hook. You are in the body of the video, and the audience has already decided.
Yes, a hook can work without text on screen. A face delivering a strong opening line works. A visually surprising frame works. Text on screen is one common implementation because it is readable at scrolling speed and the audio might be off, but it is not the only one. Some of the highest-retention short videos open with a single arresting visual and zero text.
Viewers pattern-recognise. A curiosity-gap opener that everyone has been copying for six months gets recognised in the first half second and dismissed as a template. Hooks decay the way trends decay. The ones that last longer are the ones tied to a specific point of view or a specific personal story, because those are harder to copy at scale.
Jeena is a neuromarketing platform for short-form video. Real people watch your video on their phone with the front camera on. Jeena captures their gaze direction, blink rate, eyebrow raises, and their impressions of the video in a short survey afterward. You receive an AI-powered report with an attention heatmap, a visibility map, a wow-moments chart, a summary of how viewers perceived the video, and three specific recommendations for making the video work harder.
Jeena uses smartphone front-camera gaze tracking. Each engager calibrates once, then watches your video. The platform records where their gaze lands frame by frame, flags moments of surprise from facial expression, and combines that with a short impressions survey afterward. The result is a per-second timeline of what real viewers actually looked at and felt, plus a summary of how they perceived the video overall.
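To make the idea of a per-second gaze timeline concrete, here is a minimal sketch of how frame-by-frame gaze samples could be rolled up into a "did attention cluster?" number for each second of the hook. This is not Jeena's actual pipeline or data format. The sample shape `(t_seconds, x, y)`, the normalised 0 to 1 screen coordinates, the function name `gaze_dispersion_per_second`, and the choice of mean distance from the centroid as the metric are all assumptions made for illustration.

```python
from collections import defaultdict
from math import hypot
from statistics import mean


def gaze_dispersion_per_second(samples, hook_seconds=3):
    """Summarise gaze spread for each second of the hook.

    samples: iterable of (t_seconds, x, y) tuples, with x and y normalised
    to the 0..1 range of the screen (a hypothetical format).
    Returns {second: mean distance of gaze points from their centroid}.
    Lower values mean attention clustered on one point.
    """
    buckets = defaultdict(list)
    for t, x, y in samples:
        # Keep only samples that fall inside the hook window.
        if 0 <= t < hook_seconds:
            buckets[int(t)].append((x, y))

    timeline = {}
    for second, points in sorted(buckets.items()):
        cx = mean(px for px, _ in points)
        cy = mean(py for _, py in points)
        timeline[second] = mean(hypot(px - cx, py - cy) for px, py in points)
    return timeline


# Made-up samples: one viewer whose gaze stays on a single point,
# one whose gaze jumps around the frame.
locked = [(0.1, 0.5, 0.4), (0.6, 0.5, 0.4), (1.2, 0.5, 0.4), (1.8, 0.5, 0.4)]
wandering = [(0.1, 0.2, 0.2), (0.6, 0.8, 0.8), (1.2, 0.1, 0.9), (1.8, 0.9, 0.1)]

locked_timeline = gaze_dispersion_per_second(locked)
wandering_timeline = gaze_dispersion_per_second(wandering)
```

With these made-up samples, the locked viewer scores zero dispersion in every second, while the wandering viewer scores well above zero. A real report would aggregate across a whole panel rather than a single viewer, and would combine gaze with the reaction and survey signals described above.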
A typical test costs around ten euros. See the pricing page for current rates.