Descript - From Visual Storytelling to Storytelling with Visuals
Descript flips video editing: start with your words, not your footage. Here’s how it changed the way I create—and why I’m not going back.

My First Experiences with Descript
I'm genuinely excited about Descript. Why? Because it flips the entire logic of video editing. Traditional tools like iMovie or Premiere are built around visual storytelling: image comes first, speech follows. Descript turns that upside down—letting you start with what you say, and then shape how it looks.
It enables storytelling with visuals, not visual storytelling. And that change is huge.
AI as the bridge: from soundwaves to text
The magic lies in how Descript uses AI. The moment you record something—an interview, a voiceover, a live session—it transcribes the audio into text. Instantly.
That transcription becomes your timeline. You’re no longer scrubbing through clips or guessing where something happened. You edit your video by editing text.
That’s the real power: AI turns speech into editable material. Not just for convenience, but to unlock a completely different workflow.
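To make the idea concrete, here is a minimal Python sketch of how editing text could translate into editing video. I don't know how Descript implements this internally, so treat it as an illustration only: the `Word` type, the timestamps, and the `keep_ranges` function are all hypothetical. The assumption is that every transcribed word carries a start and end time, so deleting a word in the text tells the editor which stretch of the recording to skip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the recording (hypothetical type)."""
    text: str
    start: float  # seconds into the recording
    end: float


def keep_ranges(words, deleted_indices):
    """Return merged (start, end) time ranges covering the words that were kept."""
    ranges = []
    prev_kept_index = None
    for i, word in enumerate(words):
        if i in deleted_indices:
            continue  # this word was deleted in the text, so its audio is skipped
        if ranges and prev_kept_index == i - 1:
            # The previous word was also kept: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first kept word after a deletion: start a new range.
            ranges.append((word.start, word.end))
        prev_kept_index = i
    return ranges


# Made-up transcript with made-up timestamps.
transcript = [
    Word("So", 0.0, 0.3),
    Word("um", 0.4, 0.9),
    Word("Descript", 1.0, 1.6),
    Word("edits", 1.7, 2.1),
    Word("video", 2.2, 2.6),
]

# Deleting "um" (index 1) in the text leaves two stretches of media to play.
print(keep_ranges(transcript, deleted_indices={1}))  # [(0.0, 0.3), (1.0, 2.6)]
```

The point of the sketch is the mapping itself: the text is the thing you edit, and the list of time ranges falls out of it automatically.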
From struggling with iMovie to freedom through text
I’ve spent years wrestling with tools like iMovie. Everything revolves around image—timing, layers, visual expression—and I simply don’t speak that language. I don’t know the lingo, and I don’t always understand the expressive codes.
And although I enjoy explaining things and doing interviews, anything over a minute became difficult to produce in one go. What I really wanted was to work with text and video—on equal footing.

Editing interviews like editing text
Descript makes that possible. Record a video interview—via Zoom, Teams, Meet or otherwise—and you instantly have a transcript. From there, you can cut, rearrange, and refine just like editing a document.
You can break things into scenes, add titles, remove filler words or silences, or even use AI to generate missing bits in your own voice.
Yes, even dubbing and translation are possible—with your own voice, in another language.
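Removing filler words and silences is easy to picture once each word has a timestamp. The following Python sketch is my own illustration, not Descript's actual method: the filler list, the one-second threshold, and the function names `filler_indices` and `long_silences` are assumptions made for the example.

```python
# Hypothetical filler list; a real tool would likely use a longer, language-aware one.
FILLERS = {"um", "uh", "er"}


def filler_indices(words):
    """Return the positions of words that match the filler list."""
    return [
        i
        for i, (text, _start, _end) in enumerate(words)
        if text.lower().strip(".,!?") in FILLERS
    ]


def long_silences(words, min_gap=1.0):
    """Return (start, end) gaps between consecutive words lasting at least min_gap seconds."""
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append((prev_end, next_start))
    return gaps


# Made-up transcript: (text, start_seconds, end_seconds).
words = [
    ("Well,", 0.0, 0.4),
    ("um,", 0.5, 0.9),
    ("welcome", 2.5, 3.0),
    ("uh", 3.1, 3.3),
    ("everyone.", 3.4, 4.0),
]

print(filler_indices(words))  # [1, 3]
print(long_silences(words))   # [(0.9, 2.5)]
```

Both functions only flag candidates. Whether to actually cut them stays an editorial decision, which matches how I use the feature: I review the suggestions in the text before anything disappears from the video.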
Image as a second language
I create my title cards in Canva and import them into Descript. That feels natural. I still work from the story, from the words. Only after that comes the layout—picture-in-picture, subtitles, background footage, or stock visuals from Descript’s own library.
What’s crucial: I don’t need to learn how to think in visuals. I can stay rooted in what I know—speaking, writing, reading—and still create something that feels rich and polished.
Direction starts with speech
What I’ve noticed already is that I think differently while recording. Because I know I’ll be editing the text later, I start making intuitive decisions as I speak.
I feel more like a director—even while simply doing an interview. The editing process starts with the words, not the footage. And that’s powerful.

One caveat: foreign languages
One limitation I’ve run into comes up when working in a language other than English (in my case, Dutch). Transcription isn’t always accurate, and dubbing sometimes misses the mark.
It works, but don’t expect the same polish or precision you’d get with English. Worth keeping in mind if you’re planning multilingual work.
Final thoughts
Descript is more than a video editor. It’s a rethink of the creative process. It empowers people like me, who work through language rather than images, to make polished video without first having to learn to think in visuals.
And AI is what makes that possible, by turning soundwaves into text, and text into a timeline.
I still have a lot to learn, but one thing is clear: I’ve found a new language for storytelling. And it starts with words. I’ll keep you updated on my progress.