Amazon Transcribe builds search capability for audio and video
Most organizations worry about their capacity to produce audio and video content, but our client had the opposite problem. They had more than they could handle, and it was always a struggle to find what they needed.
Then iSoftStone recommended integrating Amazon Transcribe with their Nuxeo digital asset management platform (Nuxeo DAM).
An Amazon Transcribe integration doesn’t just improve basic search; it automatically turns speech into metadata, giving teams fuller access to their video and audio content and allowing them to do more with it.
James van Leuven
Sr. Director of AI-powered Enterprise Content Solutions | iSoftStone
Manual transcripts and limited metadata
The organization, a globally recognized nonprofit, manages a rich and ever-growing archive of intellectual content in their Nuxeo DAM: keynote speeches, panel discussions, lectures, community events, and more.
Their marketing and publishing teams needed to pinpoint and extract ideas and quotes from inside these thousands of hours of recorded video and audio. However, as the archive grew year over year, locating specific moments had become increasingly difficult.
To avoid relying solely on institutional memory, staff would sometimes manually attach speaker notes and scripts, or type transcripts in Microsoft Word and upload them to the DAM.
The intent was good: make the content more findable. But in practice it meant:
- Hours spent searching for clips
- Inconsistent metadata
- Limited reuse of media
- Lost licensing opportunities
Moreover, the uploaded transcripts weren’t indexed for search, so the videos in the DAM still lacked searchable metadata.
When they reached out to iSoftStone, we were able to suggest an elegant, AI-powered solution.
Integrating Amazon Transcribe with Nuxeo DAM
iSoftStone integrated Amazon Transcribe with the organization’s Nuxeo DAM, using the Nuxeo AI package.
For DAMs like the Nuxeo Platform that don’t have a native API for transcription, Amazon Transcribe is a logical choice. It uses automatic speech recognition (ASR) to convert spoken language into text suitable for search and discovery, and it supports multiple languages.
Significantly, for our client, Amazon Transcribe’s functionality meant no more hours of work typing into a Word doc, incomplete metadata, or unsearchable content.
The new workflow was smooth and automated:
- Upload video into Nuxeo DAM
- Transcribe speech and attach the transcript to the media asset
- Enrich the media asset with metadata
- Index spoken words into search
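As a rough illustration of the transcription step, here is a minimal Python sketch that builds the request for an Amazon Transcribe job once a media file has landed in storage. It is not the Nuxeo AI package's implementation. The function name, the `dam-` job-name prefix, and the example S3 URI are hypothetical; the request keys (`TranscriptionJobName`, `Media`, `LanguageCode`, `IdentifyLanguage`) are parameters of Transcribe's StartTranscriptionJob operation.

```python
import re


def build_transcription_request(asset_id, media_uri, language_code=None):
    """Build keyword arguments for Amazon Transcribe's StartTranscriptionJob.

    asset_id and the "dam-" prefix are hypothetical naming choices for this sketch.
    """
    # Job names may only contain letters, digits, '.', '_' and '-' (max 200 chars).
    job_name = re.sub(r"[^0-9A-Za-z._-]", "-", f"dam-{asset_id}")[:200]
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language_code:
        # Use a known language, for example "en-US".
        request["LanguageCode"] = language_code
    else:
        # Otherwise ask the service to detect the dominant language.
        request["IdentifyLanguage"] = True
    return request


request = build_transcription_request(
    "keynote 2021/01", "s3://example-bucket/keynote.mp4", "en-US"
)
# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("transcribe").start_transcription_job(**request)
```

Keeping the request construction separate from the service call makes it easy to check job naming and language settings before any audio is sent for transcription.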
With time-coded transcripts generated by Amazon Transcribe on ingestion, then stored and indexed in the DAM, spoken-word content in video and audio files became fully searchable. Users could surface the original audio and video assets by topic, speaker, keyword, or even a half-remembered quote.
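To make the idea of time-coded search concrete, here is a minimal Python sketch that looks up a phrase in Amazon Transcribe's JSON output and returns the offsets, in seconds, where it is spoken. The sample data and the function names are hypothetical, and a production DAM would more likely hand the text to its search index than scan it this way. The field names (`results`, `items`, `start_time`, `alternatives`, `content`, `type`) follow the Transcribe output format.

```python
def timed_words(transcript):
    """Return (start_seconds, lowercase_word) pairs, skipping punctuation items."""
    words = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no start_time, so only keep spoken words.
        if item.get("type") != "pronunciation":
            continue
        word = item["alternatives"][0]["content"].lower()
        words.append((float(item["start_time"]), word))
    return words


def find_phrase(transcript, phrase):
    """Return the start time of every occurrence of phrase in the transcript."""
    words = timed_words(transcript)
    target = phrase.lower().split()
    if not target:
        return []
    hits = []
    for i in range(len(words) - len(target) + 1):
        window = [word for _, word in words[i:i + len(target)]]
        if window == target:
            hits.append(words[i][0])
    return hits


def _word(start, end, text):
    """Build one hypothetical 'pronunciation' item in Transcribe's item shape."""
    return {"type": "pronunciation", "start_time": start, "end_time": end,
            "alternatives": [{"confidence": "0.99", "content": text}]}


# Hypothetical transcript fragment for illustration only.
sample = {"results": {
    "transcripts": [{"transcript": "Knowledge should be findable."}],
    "items": [
        _word("12.0", "12.5", "Knowledge"),
        _word("12.6", "12.8", "should"),
        _word("12.9", "13.0", "be"),
        _word("13.1", "13.7", "findable"),
        {"type": "punctuation",
         "alternatives": [{"confidence": "0.0", "content": "."}]},
    ],
}}

offsets = find_phrase(sample, "be findable")  # [12.9]
```

Because each hit is a time offset rather than just a document match, a player can jump straight to the moment a quote is spoken instead of leaving the user to scrub through the recording.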
The positive impact of spoken words as searchable data
“We didn’t just make the videos searchable. We made years of knowledge findable and usable,” says James van Leuven, iSoftStone’s senior director of AI-powered enterprise content solutions.
The organization’s marketing and licensing teams can now:
- Find a quote inside a 60-minute keynote or a longer multi-person discussion
- Identify every time a topic was discussed across years of recordings
- Locate an individual’s entire spoken archive
- Extract clips for marketing and licensing
- Reuse existing intellectual property
Licensing and marketing teams can find exactly what they need without watching entire recordings, which has dramatically shortened the time required to prepare content for publication or redistribution.
And, since metadata is now generated systematically, there’s less need for manual tagging and more consistency.
Our client now has a truly searchable and comprehensive record of the content it most wants to preserve and use. All those working hours spent uploading documents, creating transcriptions by hand, and struggling to find the right clip? Reclaimed and used for higher-value work.



