Full list of supported languages (as of early 2025):
English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Korean, Mandarin Chinese, Cantonese, Arabic, Hindi, Swedish, Danish, Norwegian, Finnish, Polish, Turkish, Czech, Hungarian, Romanian, Croatian, Estonian, Latvian, Lithuanian, Serbian (Latin), Slovak, Slovenian.
Note: Arabic, Hindi, and Cantonese are still cloud-only (no on-device transcription).
While v2.1.6 comes pre-trained, it allows for user-side corrections to improve output. When an editor manually corrects a word that the AI misinterpreted, the system can learn from this correction, improving future transcriptions for that specific user session or project.
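To make the correction idea concrete, here is a minimal sketch of a per-project correction store. It is not Adobe's implementation; the names `corrections`, `record_correction`, and `apply_corrections` are hypothetical, and the sketch simply assumes that a corrected word is remembered and reapplied to later transcript text.

```python
import re

# Hypothetical per-project store: misheard word (lowercased) -> editor's fix.
corrections = {}

def record_correction(wrong, right):
    """Remember that `wrong` should be transcribed as `right`."""
    corrections[wrong.lower()] = right

def apply_corrections(text):
    """Replace every whole-word occurrence of a previously corrected term."""
    def fix(match):
        word = match.group(0)
        # Unknown words pass through unchanged.
        return corrections.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", fix, text)

# The editor fixes one misheard word; later text picks up the same fix.
record_correction("premier", "Premiere")
print(apply_corrections("Open premier and import the clip."))
# Open Premiere and import the clip.
```

A real system would likely weigh context rather than substitute blindly, but the sketch shows why one manual fix can save repeated edits across a project.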
If you open Premiere Pro (version 24.x or later) and do not see the version 2.1.6 features, you must update manually.
Pro Tip: After updating, go to Preferences > Audio and ensure "Dynamically transcribe sequences" is checked. This allows the AI to start transcribing while you edit, rather than waiting for you to open the Text panel.
While specific release notes or details about version 2.1.6 aren't provided here, updates to such features typically include:
For users installing or updating to this specific version via Adobe Creative Cloud:
Adobe Speech to Text v2.1.6 isn’t a flashy AI gimmick—it’s a reliability update that solves real editing friction. The custom glossary alone saves hours of manual correction, and the streaming preview changes your subtitle workflow from “wait and fix” to “edit as you go.” For anyone delivering captioned content in 2026, this is the quiet upgrade that pays for itself in time saved.
Rating: 4.7/5
Best for: Editors producing more than 2 hours of dialogue-heavy content per week.
Skip if: You only cut music videos or purely visual sequences with no speech.
The output of Speech to Text is not just captions; it is metadata. Premiere Pro indexes the transcribed text, allowing editors to search for specific words spoken in the video. This turns hours of raw footage into a searchable database, making it effortless to locate specific soundbites without scrubbing through the timeline.
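The searchable-transcript idea can be sketched as a small inverted index. This is an illustration only, not how Premiere Pro stores its data: the transcript below is made-up sample data, and `build_index` and `find` are hypothetical helper names. It assumes each transcribed word carries a start time in seconds.

```python
from collections import defaultdict

# Made-up sample transcript: (start_seconds, word) pairs.
transcript = [
    (0.0, "welcome"), (0.4, "to"), (0.6, "the"), (0.8, "interview"),
    (12.5, "the"), (12.7, "budget"), (13.1, "was"), (13.4, "tight"),
    (47.2, "budget"), (47.6, "approved"),
]

def build_index(words):
    """Map each lowercased word to the timestamps where it is spoken."""
    index = defaultdict(list)
    for start, word in words:
        index[word.lower()].append(start)
    return index

def find(index, query):
    """Return the timestamps (in seconds) at which `query` is spoken."""
    return index.get(query.lower(), [])

index = build_index(transcript)
print(find(index, "budget"))  # [12.7, 47.2]
```

Because the lookup returns timestamps rather than text, each hit can be mapped straight to a position on the timeline, which is what removes the need to scrub through footage.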