AI Cleanup in Zella: Remove Silences & Filler Words
Quick answer: In Zella’s editor, open the AI Tools sidebar and click Remove Silences → Apply to ripple-delete every silent gap (your video gets shorter without chopping words). Then click Remove Fillers → Apply to cut fillers such as “like,” “you know,” and sentence-starting “so” (pure “um”/“uh” sounds may need a manual cut). Add Auto-enhance for a one-click picture lift. Every action is undoable with ⌘Z.
On this page: what AI cleanup does · find it · remove silences · remove fillers · auto-enhance · undo · order & impact · FAQ
What does AI cleanup do?
AI cleanup turns a raw, rambling take into a tight, professional cut without manual editing. It analyzes your audio (and cursor) and applies ripple edits or polish in one click. The headline tools — Remove Silences and Remove Fillers — are what make talking-head and tutorial footage watchable. (Auto-Zoom is covered in chapter 12; Polish Voice and Auto-Duck in chapter 14.)
Where to find the AI cleanup tools
Editor → left sidebar → AI Tools tab → AI CLEANUP section. Each tool is a row with an Apply button.
Figure: the AI CLEANUP list. ① Remove Silences and ② Remove Fillers ripple-delete gaps and filler words; the before/after timeline shows the video tightening.
How to remove silences from a video
What it does: scans the audio, finds the silent gaps between sentences, and ripple-deletes them so the video tightens and gets shorter — without chopping anyone mid-word.
- AI Tools → Remove Silences → Apply.
- Zella scans and lists the silence cuts (e.g. Silence · 2.6s).
- It applies them, ripple-closing every gap; the total duration drops to just the spoken content.
A 49-second rambling take typically becomes ~29 seconds of pure content. Cuts snap to a frame grid and respect word boundaries, so speech isn’t clipped.
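To make the ripple-delete idea concrete, here is a minimal sketch in Python. It is an illustration only, not Zella's actual implementation: the function names, the `min_silence_ms` threshold, and the sample timings are all assumptions. It treats any gap between spoken spans that lasts at least the threshold as a cut, then subtracts the cuts from the total duration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms); integers avoid float rounding noise


def find_silences(speech: List[Span], total_ms: int, min_silence_ms: int = 500) -> List[Span]:
    """Return the gaps between spoken spans that last at least min_silence_ms.

    `speech` holds the spoken spans; everything outside them counts as silence.
    The threshold is a hypothetical default, not a documented Zella setting.
    """
    cuts: List[Span] = []
    cursor = 0  # end of the speech seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_silence_ms:
            cuts.append((cursor, start))
        cursor = max(cursor, end)
    if total_ms - cursor >= min_silence_ms:
        cuts.append((cursor, total_ms))  # trailing dead air
    return cuts


def ripple_duration(total_ms: int, cuts: List[Span]) -> int:
    """Duration after every cut is removed and the gaps are closed."""
    return total_ms - sum(end - start for start, end in cuts)


# Hypothetical 16-second take with three spoken spans.
speech = [(1000, 4000), (6600, 10000), (10200, 15000)]
cuts = find_silences(speech, total_ms=16000)
print(cuts)                          # [(0, 1000), (4000, 6600), (15000, 16000)]
print(ripple_duration(16000, cuts))  # 11400
```

The middle cut is the 2.6-second gap, and the 0.2-second pause between the last two spans survives because it is shorter than the threshold, which is why natural breathing room stays in the cut.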
How to remove filler words
What it does: detects and removes filler words so you sound crisp and confident.
- AI Tools → Remove Fillers → Apply.
- Zella transcribes, flags filler candidates, and lists the cuts (e.g. "like", "so" (sentence start)).
- Apply to ripple them out.
Filler types: hard fillers (“um,” “uh”), soft fillers (“like,” “you know”), and bigram/starter fillers (sentence-initial “so/OK,” “I mean”).
Note on “um”/“uh”: Apple’s on-device speech engine often won’t transcribe pure “um”/“uh” sounds, so they may not appear as removable tokens. Soft/bigram/starter fillers are caught reliably; cut any stray “um”s by hand (chapter 9).
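The three filler types can be sketched as a simple lookup over a word-timed transcript. This is a hedged illustration in Python, not Zella's detector: the `Word` type, the word lists, and the `sentence_start` flag are assumptions, and a real detector would need context to avoid flagging a legitimate "like" (as in "I like this").

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    start_ms: int
    end_ms: int
    sentence_start: bool = False  # assumed to come from the transcript


# Illustrative word lists based on the filler types named above.
HARD = {"um", "uh"}
SOFT = {"like"}
STARTERS = {"so", "ok"}                      # only flagged at sentence start
BIGRAMS = {("you", "know"), ("i", "mean")}   # two-word fillers


def normalize(text: str) -> str:
    return text.lower().strip(".,!?\"'")


def flag_fillers(words: List[Word]) -> List[Tuple[str, int, int]]:
    """Return (label, start_ms, end_ms) for each filler candidate."""
    flagged: List[Tuple[str, int, int]] = []
    i = 0
    while i < len(words):
        word = words[i]
        token = normalize(word.text)
        following = normalize(words[i + 1].text) if i + 1 < len(words) else None
        if following is not None and (token, following) in BIGRAMS:
            # Flag both words of the bigram as a single cut.
            flagged.append((f"{token} {following}", word.start_ms, words[i + 1].end_ms))
            i += 2
            continue
        if token in HARD or token in SOFT or (token in STARTERS and word.sentence_start):
            flagged.append((token, word.start_ms, word.end_ms))
        i += 1
    return flagged


transcript = [
    Word("So", 0, 300, sentence_start=True),
    Word("I", 300, 400),
    Word("think,", 400, 800),
    Word("like,", 900, 1200),
    Word("this", 1300, 1600),
    Word("works", 1600, 2100),
    Word("you", 2200, 2400),
    Word("know.", 2400, 2700),
]
print(flag_fillers(transcript))
# [('so', 0, 300), ('like', 900, 1200), ('you know', 2200, 2700)]
```

Because the sketch works on transcribed tokens only, it also shows why an untranscribed "um" can't be removed: if the sound never becomes a token, there is nothing to flag.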
How to auto-enhance the picture
What it does: applies a tasteful baseline polish — gentle exposure, contrast, saturation, shadow, and sharpness lift.
- AI Tools → Auto-enhance → Apply.
- Fine-tune (or undo) in the Effects tab’s Color Board (chapter 17).
Use it on flat-looking screen or webcam footage that needs a quick lift before you ship.
How to undo a cleanup
If a cleanup cut too aggressively, press ↶ Undo or ⌘Z to restore the previous state cleanly. Cleanups are normal undoable edits — try a different combination (silences only, or fillers only).
What order to run cleanup, and why it matters
- Remove Silences — tighten the timing.
- Remove Fillers — sharpen the delivery.
- Auto-Zoom (chapter 12), Generate Captions (chapter 11), Polish Voice + Auto-Duck (chapter 14).
- Auto-enhance — final picture lift.
Impact for creators and editors: silence + filler removal is the single biggest “feels professional” win for spoken content. It cuts watch-time-killing dead air, raises words-per-minute, and removes the “amateur” tells — often turning a 6-minute ramble into a tight 4-minute video that holds the audience. And because it’s one click and fully undoable, it replaces what used to be 30 minutes of manual scrubbing.
AI cleanup FAQ
How do I automatically cut dead air from a video? Open AI Tools → Remove Silences → Apply. It ripple-deletes every silent gap.
Does removing silences cut off the ends of words? No — cuts respect word boundaries and snap to a frame grid.
Why aren’t my “um”s being removed? Apple’s speech engine often doesn’t transcribe pure “um/uh.” Remove them manually, or rely on filler removal for “like/you know/so.”
Can I undo if it cuts too much? Yes — ⌘Z restores the prior state. Run the tools individually to control aggressiveness.
Will my captions stay aligned after I remove silences? Yes — captions and other timed elements ripple with the timeline and stay on the right words (chapter 11).
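For the curious, here is one way the "captions ripple with the timeline" behavior could work, sketched in Python. It is an assumption-laden illustration rather than Zella's implementation: `remap_time`, `ripple_captions`, and the sample data are hypothetical, and the cuts are assumed not to overlap. Each caption time shifts earlier by the total length of the cuts before it.

```python
from typing import List, Tuple

Span = Tuple[int, int]            # (start_ms, end_ms) of a removed region
Caption = Tuple[str, int, int]    # (text, start_ms, end_ms)


def remap_time(t_ms: int, cuts: List[Span]) -> int:
    """Map a source time to the post-cut timeline.

    A time that falls inside a cut collapses to the point where that cut began.
    """
    shift = 0
    for start, end in sorted(cuts):
        if t_ms >= end:
            shift += end - start      # the whole cut lies before t_ms
        elif t_ms > start:
            shift += t_ms - start     # t_ms is inside this cut
            break
        else:
            break                     # this cut and all later ones lie after t_ms
    return t_ms - shift


def ripple_captions(captions: List[Caption], cuts: List[Span]) -> List[Caption]:
    """Shift captions onto the new timeline, dropping any that were fully cut."""
    result: List[Caption] = []
    for text, start, end in captions:
        new_start, new_end = remap_time(start, cuts), remap_time(end, cuts)
        if new_end > new_start:
            result.append((text, new_start, new_end))
    return result


cuts = [(0, 1000), (4000, 6600)]
captions = [("Hello", 1000, 4000), ("world", 6600, 10000)]
print(ripple_captions(captions, cuts))
# [('Hello', 0, 3000), ('world', 3000, 6400)]
```

After the cuts, the second caption starts exactly where the first ends, which matches the idea that each caption stays on its own words.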
Pro tips & gotchas
- Always Preview before Apply — tune the silence threshold and padding so you don’t clip the start of words.
- Filler removal has hard/soft tiers — start conservative and re-run if you want a tighter cut.
- Run cleanup before captions so the transcript matches the final timing.
- TTS-generated “fillers” don’t always transcribe as fillers — review the preview for those edge cases.
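To see why padding protects the start of words, consider this small Python sketch. It shows one plausible reading of "padding" and "frame grid", not Zella's documented behavior: the function name, the 100 ms padding, and the 30 fps rate are assumptions. Padding shrinks each cut on both sides, and the result snaps inward to whole frames so the cut never grows into speech.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms)


def padded_cut_frames(silence: Span, padding_ms: int, fps: int) -> Optional[Tuple[int, int]]:
    """Return the cut as (start_frame, end_frame), or None if nothing is left.

    The start rounds up and the end rounds down, so snapping can only make
    the cut smaller, never larger.
    """
    start_ms = silence[0] + padding_ms
    end_ms = silence[1] - padding_ms
    start_frame = -((-start_ms * fps) // 1000)  # ceiling division
    end_frame = (end_ms * fps) // 1000          # floor division
    if end_frame <= start_frame:
        return None  # padding consumed the whole gap, so leave it alone
    return (start_frame, end_frame)


print(padded_cut_frames((4000, 6600), padding_ms=100, fps=30))    # (123, 195)
print(padded_cut_frames((4010, 6590), padding_ms=100, fps=30))    # (124, 194)
print(padded_cut_frames((10000, 10200), padding_ms=100, fps=30))  # None
```

In this model, more padding means shorter cuts and safer word edges, while less padding gives a tighter video at a higher risk of clipping.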
Related: Captions → · Auto-Zoom & keyframes → · Polish Voice & Auto-Duck → · Manual timeline editing →