
Captions & Subtitles in Zella

Figure: animated captions appearing word-by-word over a Zella preview.

Quick answer: Open the Captions tab in Zella’s editor and click Transcribe audio — Zella transcribes on-device (no upload) into a word-level transcript. Pick one of 6 viral styles (Word Pop, Hormozi, Karaoke, Pop Box, Neon, Clean), set size/color/position, and the captions burn into your export. Need a file? Export SRT, VTT, or CSV. Captions stay aligned even after you cut the video.

On this page: why captions matter · transcribe · styles · active-word highlight · size & position · edit text · ripple · export SRT/VTT · impact · FAQ


Why add captions to a video?

Much social video is watched with the sound off, which makes captions one of the highest-leverage edits for watch-time and accessibility. Zella transcribes on-device, renders viral-style animated captions, and exports standard subtitle files, so no third-party caption app is needed.


Figure: the Captions tab. ① Transcribe, ② pick a preset, ③ set size & position, ④ export SRT/VTT/CSV.


How to transcribe a video

  1. Open the Captions tab (right inspector).
  2. Confirm the Language (default English; one language per pass).
  3. Click Transcribe audio.

Zella transcribes on-device and builds a word-level transcript: every word in the clip, in order, with ascending timestamps, covering the whole recording rather than just the last phrase. It then aligns caption lines to the speech.
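
To make the idea of a word-level transcript concrete, here is a minimal Python sketch of how timed words could be grouped into caption lines. It is an illustration only, not Zella's implementation: the `Word` and `CaptionLine` types, the `group_into_lines` function, and the `max_words` and `max_gap` thresholds are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One transcribed word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class CaptionLine:
    """A caption line is a run of consecutive words."""
    words: List[Word]

    @property
    def start(self) -> float:
        return self.words[0].start

    @property
    def end(self) -> float:
        return self.words[-1].end

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


def group_into_lines(words: List[Word], max_words: int = 4,
                     max_gap: float = 0.6) -> List[CaptionLine]:
    """Start a new line when the current one is full or the speaker pauses.

    max_words and max_gap are assumed thresholds for illustration.
    """
    lines: List[CaptionLine] = []
    current: List[Word] = []
    for word in words:
        line_full = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (line_full or paused):
            lines.append(CaptionLine(current))
            current = []
        current.append(word)
    if current:
        lines.append(CaptionLine(current))
    return lines


words = [
    Word("Captions", 0.00, 0.45),
    Word("keep", 0.50, 0.70),
    Word("viewers", 0.72, 1.10),
    Word("watching", 1.12, 1.60),
    Word("on", 2.40, 2.55),
    Word("mute", 2.57, 3.00),
]
lines = group_into_lines(words)
for line in lines:
    print(f"{line.start:.2f}-{line.end:.2f}  {line.text}")
```

Because each line keeps its words, the line's start and end times follow directly from the first and last word, which is what lets per-word effects and later timeline edits work from the same data.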

The clip needs spoken audio; a silent screen capture has nothing to caption.


How to choose a caption style

Zella ships 6 viral caption presets — pick the one that fits your channel:

Preset     Look                               Best for
Word Pop   Words pop in one at a time         High-energy social
Hormozi    Bold, high-contrast, punchy        Hooks, sales
Karaoke    Active word highlighted as spoken  Retention
Pop Box    Words in a filled box              Clean + bold
Neon       Glowing neon text                  Stylized reels
Clean      Minimal subtitles                  Tutorials, courses

Select one in the Captions tab; each looks visibly different.


What is active-word highlighting?

With styles like Karaoke (and word-by-word styles), the active word is highlighted as it’s spoken, advancing word-by-word within each line. This is the “retention caption” look that keeps viewers locked in — it’s on automatically for styles that use it.
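
As a rough illustration of how an active word can be chosen at playback time, the Python sketch below looks up the word whose start time most recently passed. This is an assumed approach, not a description of Zella's renderer; `active_word_index` and `render` are hypothetical helpers, and the bracket markup merely stands in for a visual highlight.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (text, start_seconds, end_seconds); a hypothetical word representation
TimedWord = Tuple[str, float, float]


def active_word_index(line: List[TimedWord], t: float) -> Optional[int]:
    """Return the index of the word to highlight at time t, or None
    when t falls outside the line."""
    if not line or t < line[0][1] or t >= line[-1][2]:
        return None
    starts = [start for _, start, _ in line]
    # Last word whose start is <= t. It stays active through any short
    # gap until the next word begins.
    return bisect_right(starts, t) - 1


def render(line: List[TimedWord], t: float) -> str:
    """Mark the active word with brackets as a stand-in for styling."""
    active = active_word_index(line, t)
    return " ".join(
        f"[{text}]" if i == active else text
        for i, (text, _, _) in enumerate(line)
    )


line = [("Stop", 0.0, 0.3), ("scrolling", 0.35, 0.9), ("now", 1.0, 1.4)]
print(render(line, 0.5))  # Stop [scrolling] now
```

Keeping the previous word active during the small gap between words avoids the highlight flickering off between syllables, which is one plausible design choice for this look.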


How to set caption font size, color, and position

In the Captions tab, set:

  • Font size: larger for vertical/social, smaller for desktop tutorials.
  • Color: match your brand (captions stay colored even on a black-and-white grade).
  • Position: bottom (classic) or center (reels).
  • Fit/Fill behavior: keeps captions safely inside a reframed vertical video without overflowing.

How to edit caption text

Transcription is accurate but can mishear names, jargon, and brand terms. To correct a line:

  1. Click a caption line in the Captions tab and edit the text.
  2. Your edit sticks and renders in the preview and export.

Do captions stay aligned after cuts?

Yes. Captions are time-aware: if you ripple-delete a chunk (chapter 9) or run Remove Silences/Fillers (chapter 10), later captions slide earlier to match the new timeline and stay on the right words.
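
The time-aware behavior can be sketched in Python as a remapping of caption times around the removed range. This is a simplified illustration at the level of whole caption cues, assuming a single cut; the `Cue` tuple and `ripple_delete` function are hypothetical, and Zella's actual handling (which works down to individual words) is not documented here.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, text); a hypothetical cue representation
Cue = Tuple[float, float, str]


def ripple_delete(cues: List[Cue], cut_start: float,
                  cut_end: float) -> List[Cue]:
    """Remove the range [cut_start, cut_end) from the timeline and
    slide every later cue earlier by the removed duration."""
    removed = cut_end - cut_start

    def remap(t: float) -> float:
        if t <= cut_start:
            return t              # before the cut: unchanged
        if t >= cut_end:
            return t - removed    # after the cut: slides earlier
        return cut_start          # inside the cut: clamps to the cut point

    result: List[Cue] = []
    for start, end, text in cues:
        new_start, new_end = remap(start), remap(end)
        # A cue that lay entirely inside the cut collapses to zero length
        # and is dropped.
        if new_end > new_start:
            result.append((new_start, new_end, text))
    return result


cues = [
    (0.0, 2.0, "Welcome back"),
    (2.0, 5.0, "um so anyway"),
    (5.0, 8.0, "here is the trick"),
]
for cue in ripple_delete(cues, 2.0, 5.0):
    print(cue)
```

In this example the middle cue disappears with the deleted footage and the last cue moves from 5.0 to 2.0 seconds, so it still lines up with the words it belongs to.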


How to export SRT, VTT, or CSV subtitles

For YouTube’s caption upload, a web player, or localization:

  1. In the Captions tab, use export to write SRT, VTT, or CSV.
  2. The file opens cleanly in any standard tool, with a cue count matching your caption lines.
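
For readers who want to check an exported file, the Python sketch below shows what SRT and WebVTT output look like for the same cues. The two formats differ mainly in the `WEBVTT` header, the cue numbering, and the decimal mark in timestamps (comma for SRT, period for VTT). The function names and sample cues are hypothetical and this is not Zella's exporter.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, text); a hypothetical cue representation
Cue = Tuple[float, float, str]


def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Format seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{number}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


def to_vtt(cues: List[Cue]) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


cues = [(0.0, 1.5, "Stop scrolling"), (1.5, 3.25, "Here is the trick")]
print(to_srt(cues))
print(to_vtt(cues))
```

Each caption line becomes exactly one cue, so counting the numbered blocks in the SRT is a quick way to confirm the cue count matches your caption lines.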

You can also import an SRT to rebuild a caption track from an external transcript.
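
Going the other way, an SRT file can be read back into timed cues with a small parser. The Python sketch below is a minimal example that assumes well-formed input; `parse_srt` is a hypothetical helper and says nothing about how Zella's importer handles malformed files.

```python
import re
from typing import List, Tuple

# (start_seconds, end_seconds, text); a hypothetical cue representation
Cue = Tuple[float, float, str]

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a period is tolerated too).
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into cues. Blocks without a timing line are skipped."""
    cues: List[Cue] = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if not match:
                continue
            h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
            start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
            end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
            # Everything after the timing line is the caption text.
            text = "\n".join(lines[i + 1:]).strip()
            cues.append((start, end, text))
            break
    return cues


sample = (
    "1\n00:00:00,000 --> 00:00:01,500\nStop scrolling\n\n"
    "2\n00:00:01,500 --> 00:00:03,250\nHere is\nthe trick\n"
)
for cue in parse_srt(sample):
    print(cue)
```

Note that an imported SRT only carries line-level timing, so a track rebuilt this way has start and end times per caption line rather than per word.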


What impact captions have

For content creators and video editors, captions deliver outsized results:

  • Higher watch-time and completion — sound-off viewers stay because they can follow along; animated word-by-word captions measurably hold attention.
  • Wider reach and accessibility — captioned video is watchable by deaf/hard-of-hearing viewers and in sound-sensitive places (offices, transit), expanding your audience.
  • Better SEO and discovery — an exported SRT feeds platform transcripts and search.
  • Faster production — on-device transcription plus presets replaces a separate captioning tool and hours of manual timing.

Captions FAQ

How do I add subtitles to a video on a Mac for free? Open Zella’s Captions tab → Transcribe audio → pick a style. It’s on-device, no subscription.

Are captions burned in or a separate file? Both — they burn into MP4/MOV/GIF exports, and you can also export SRT/VTT/CSV.

Do captions keep their color on a black-and-white video? Yes — the footage desaturates while captions retain their color.

Will captions stay in sync if I cut the video? Yes — they ripple with the timeline and stay on the right words.

Can I fix a misheard word? Yes — click the caption line and edit the text; it sticks and renders.

Pro tips & gotchas

  • Transcription runs per-utterance and chunks long videos — give it a moment on big files.
  • Pick a preset first, then tweak size/position; use Fit to keep text on one line or Fill for emphasis.
  • Export SRT/VTT to upload captions to YouTube, or CSV to edit the transcript in a spreadsheet.
  • Captions stay in sync across multi-segment edits — Zella renders them through a video composition automatically.

Related: AI cleanup (silences/fillers) · Reframe for vertical · Color & black-and-white · Export