Audio in Zella: Voice, Music & Auto-Duck

Animated audio waveform with music ducking under the voice

Quick answer: In Zella, click AI Tools → Polish Voice → Apply to clean and normalize your narration (high-pass, de-ess, compression, ~−14 LUFS). Import an audio file as a music overlay, then click Auto-Duck Music → Apply so the music lowers under your voice and recovers in the gaps. For smoother edits, use the Clip tab’s J/L-cut Lead slider to make audio lead (+) or lag (−) the picture.


How to clean up and level your voice

What it does: runs narration through an offline chain that applies a high-pass filter (removes rumble), a de-esser (tames harsh “s” sounds), a compressor (evens out level), and loudness normalization to a target (e.g. −14 LUFS, a common loudness target for YouTube and social platforms).


Figure: ① Polish Voice (−14 LUFS), ② Auto-Duck (music dips under speech), ③ the J/L-cut Lead slider.

AI Tools → Polish Voice → Apply.
Choose the loudness target (e.g. YouTube −14).
Zella processes the voice offline and uses the polished track on export.

It’s a one-click “sound like a podcast” button — especially valuable on built-in mics.
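To make two of those stages concrete, here is a minimal Python sketch of a first-order high-pass filter and the gain arithmetic behind loudness normalization. It is an illustration only, not Zella's implementation: the function names, the 80 Hz cutoff, and the −20 LUFS starting level are assumptions, and measuring real LUFS needs the K-weighted meter from ITU-R BS.1770, which is not reproduced here.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=80.0):
    """First-order (RC) high-pass filter. The 80 Hz cutoff is an assumed value."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for i, x in enumerate(samples):
        # The first sample passes through; later samples follow the RC recurrence.
        y = x if i == 0 else alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def normalization_gain_db(measured_lufs, target_lufs=-14.0):
    """Static gain (dB) that moves a measured integrated loudness to the target."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db):
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)

# A constant (DC) offset is the extreme case of rumble: the filter removes it.
filtered = high_pass([1.0] * 48000, 48000)

# Narration measured at -20 LUFS (hypothetical) needs +6 dB to reach -14 LUFS.
gain_db = normalization_gain_db(-20.0)   # 6.0
multiplier = db_to_linear(gain_db)       # roughly 2x amplitude
```

The takeaway is that normalization is a single gain applied to the whole track, which is why it comes last in the chain, after the filter, de-esser, and compressor have shaped the sound.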

How to add background music

Import an audio file (.mp3, .m4a, .wav) — see chapter 8 — and it attaches as an audio overlay.
Position it on the timeline and set its level so it sits under your voice.

The music plays at its natural rate even if parts of the video are sped up (chapter 13).
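A small worked example of what “natural rate” means in practice, written as a Python sketch. The segment representation and function name are hypothetical, chosen only to show the arithmetic: sped-up video takes less time on the output timeline, while the music simply advances one second per second of output.

```python
def output_duration(segments):
    """Total output time for a list of (source_seconds, speed) video segments."""
    return sum(source_seconds / speed for source_seconds, speed in segments)

# 10 s at normal speed, then 20 s of footage played at 2x.
segments = [(10.0, 1.0), (20.0, 2.0)]

video_out = output_duration(segments)  # 20.0 seconds on the output timeline
music_played = video_out               # music advances 1:1 with output time
```

So 30 seconds of source footage becomes 20 seconds of finished video, and exactly 20 seconds of the music bed is heard, at its original tempo and pitch.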

How to duck music under speech

What it does: automatically lowers the music whenever you’re talking and brings it back in the gaps — the “radio DJ” effect — so your voice is always clear.

Add your music bed (above).
AI Tools → Auto-Duck Music → Apply.
Zella detects the windows where your voice is active and dips the music under them (e.g. by about 13 dB), letting it recover in the silences.
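The ducking behavior described above can be sketched as a gain envelope. This Python example is a simplified illustration under stated assumptions, not Zella's algorithm: the voice windows are given by hand rather than detected, and the 0.2 s fade with a linear ramp in dB is an invented choice.

```python
def duck_gain_db(t, voice_windows, duck_db=-13.0, fade_s=0.2):
    """Music gain in dB at time t (seconds).

    voice_windows: list of (start, end) times where speech is active.
    fade_s: assumed ramp length before and after each window.
    """
    depth = 0.0  # 0.0 = music at full level, 1.0 = fully ducked
    for start, end in voice_windows:
        if start <= t <= end:
            return duck_db  # inside speech: full duck
        if start - fade_s < t < start:
            # Ramp down just before speech begins.
            depth = max(depth, (t - (start - fade_s)) / fade_s)
        elif end < t < end + fade_s:
            # Ramp back up just after speech ends.
            depth = max(depth, 1.0 - (t - end) / fade_s)
    return duck_db * depth if depth > 0.0 else 0.0

windows = [(1.0, 3.0), (5.0, 6.0)]   # hypothetical voice-active windows
during_speech = duck_gain_db(2.0, windows)   # -13.0 dB
in_the_gap = duck_gain_db(4.0, windows)      # 0.0 dB, music back at full level
```

The short ramps are what keep the effect from sounding like a switch being flipped: the music eases down before the first word and eases back up after the last.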

How to make J-cuts and L-cuts

What they do: offset audio relative to picture at a cut for a smoother, more professional edit.

J-cut: the next scene’s audio starts before its picture (you hear it coming).
L-cut: the current scene’s audio lingers after its picture cuts away.

Open the Clip tab (right inspector).
Use the Audio Timing (J/L-cut) Lead slider: toward + = audio leads (J-cut), toward − = audio lags (L-cut).
Small offsets (a few hundred ms) feel natural.
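To see how the sign of the Lead value maps onto the two cut types, here is a short Python sketch. The function names and the millisecond unit are assumptions made for illustration; only the sign convention (+ leads, − lags) comes from the slider description above.

```python
def audio_switch_time(cut_time_s, lead_ms):
    """Time (seconds) at which the incoming scene's audio takes over.

    Positive lead: audio arrives before the picture cut (J-cut).
    Negative lead: audio arrives after it, so the outgoing audio lingers (L-cut).
    """
    return cut_time_s - lead_ms / 1000.0

def cut_kind(lead_ms):
    """Name the edit that a given lead value produces."""
    if lead_ms > 0:
        return "J-cut"
    if lead_ms < 0:
        return "L-cut"
    return "straight cut"

# Picture cuts at 10 s on the timeline.
j_time = audio_switch_time(10.0, 300)    # about 9.7 s: next scene is heard early
l_time = audio_switch_time(10.0, -300)   # about 10.3 s: current audio lingers
```

With a 300 ms lead, the viewer hears the next scene roughly a third of a second before seeing it, which is in the “few hundred ms” range that tends to feel natural.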

Do I need to fix lip-sync?

No. Zella calibrates A/V timing at record time — including the hard case of a camera bubble over a blurred/AI-removed background — so your lips match your voice and on-screen actions line up with their sounds. If you think sync is off, check the exported file (frame-accurate), not the live preview (decode-optimized for scrubbing).

What impact good audio has

For content creators and video editors, audio accounts for roughly half of a video’s perceived quality, and it often matters more than the picture:

Viewers tolerate rough video but not rough audio: clean, consistently loud narration (Polish Voice) is the difference between “professional” and “homemade,” and it directly affects trust and watch time.
Clear voice over music — Auto-Duck means you can add energy with a music bed without ever burying the message.
Polished edits — J/L cuts smooth scene changes so cuts feel deliberate, not abrupt.
One app — voice cleanup, music, and ducking that normally need an audio editor are one-click here.

Audio FAQ

How do I make my voice louder and clearer in a video? AI Tools → Polish Voice → Apply — it filters, de-esses, compresses, and normalizes to ~−14 LUFS.

How do I add background music that doesn’t drown out my voice? Import the music as an overlay, then Auto-Duck Music → Apply so it dips under your speech.

What’s a J-cut / L-cut? J = audio leads the next picture; L = audio lingers after the current picture cuts. Set it with the Lead slider in the Clip tab.

Does my music speed up if I speed up the video? No — music beds play at their natural rate under a sped segment.

My camera audio seems out of sync. How do I fix it? You usually don’t need to: Zella calibrates sync at record time. Check the exported file, which is frame-accurate; the live preview can look slightly off.

Pro tips & gotchas

Polish Voice runs an offline chain (high-pass, de-ess, compression, ~−14 LUFS) — the single biggest quality win for narration.
Add a music file as an overlay, then Auto-Duck so it ducks under your voice and recovers in the gaps.
The Clip tab → J/L-cut Lead slider nudges audio to lead (+) or lag (−) the picture for smoother scene changes.
Record with the capture-time mic toggles on (see Look & sound good) so Polish Voice has a clean source.

Related: AI cleanup · Speed · Transitions · Import audio