So I’ve tested AWS Transcribe to create automatic subtitles for a video.
Setup complexity:
This is super easy: basically you create a single S3 bucket and voilĂ .
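If you want to do that step in code too, here is a minimal sketch with boto3 (the bucket name and region are placeholders, adjust to your account):

import boto3

# Sketch: create the bucket that will hold both the videos and the Transcribe output.
# "DOC-EXAMPLE-BUCKET" and the region are placeholders.
s3_client = boto3.client("s3", region_name="ap-southeast-2")
s3_client.create_bucket(
    Bucket="DOC-EXAMPLE-BUCKET",
    CreateBucketConfiguration={"LocationConstraint": "ap-southeast-2"},
)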
For the permissions, you will need:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "allowStartTranscriptionJob",
            "Effect": "Allow",
            "Action": [
                "transcribe:StartTranscriptionJob"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "transcribe:OutputKey": "DOC-EXAMPLE-BUCKET/prefix"
                }
            }
        },
        {
            "Sid": "allowS3",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*transcribe*"
        }
    ]
}
Start a transcription job:
This is really super easy, as shown in this Lambda:
import boto3

# Get a Transcribe client
transcribe_client = boto3.client("transcribe")

# Start a job
transcribe_job = transcribe_client.start_transcription_job(
    # Important: the name must match the "^[A-Za-z0-9_.-]+" regex
    # and must be unique across the account
    TranscriptionJobName=event.object_key.split("/")[-1],
    LanguageCode="en-AU",  # You can use IdentifyLanguage=True instead
    MediaFormat=media_format,  # Not required
    Media={
        "MediaFileUri": f"s3://{event.bucket_name}/{event.object_key}",
    },
    OutputBucketName=event.bucket_name,
    OutputKey=output_key,  # Prefix for the output files
    Settings={
        "ShowSpeakerLabels": True,
        "MaxSpeakerLabels": 5,  # Mandatory if ShowSpeakerLabels is True
        "ShowAlternatives": False,
        # "MaxAlternatives": 3,  # Mandatory if ShowAlternatives is True
    },
    Subtitles={
        "Formats": [output_format],  # list of strings among "srt", "vtt"
    },
)
The full, up-to-date API details are in the AWS documentation.
Get the subtitle files:
Once the job completes, the subtitle files are available in the output bucket.
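A minimal sketch of waiting for the job and locating those files; the job name below is a placeholder matching the snippet above, and polling could be replaced with an EventBridge rule on job state changes:

import time
import boto3

transcribe_client = boto3.client("transcribe")

# Sketch only: poll until the job finishes ("my-video.mp4" is a placeholder job name).
job_name = "my-video.mp4"
while True:
    job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(10)

if status == "COMPLETED":
    # The S3 URIs of the generated subtitle files are returned on the job itself
    print(job["TranscriptionJob"]["Subtitles"]["SubtitleFileUris"])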
Note that there are other options, such as redacting PII, and providing custom vocabularies and models to refine the transcriptions you create.
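For example, PII redaction is just an extra parameter on the same start_transcription_job call. A minimal sketch, with placeholder names and an illustrative entity list:

# Sketch: same call as above, with content redaction enabled
transcribe_client.start_transcription_job(
    TranscriptionJobName="my-video-redacted",  # placeholder
    LanguageCode="en-AU",
    Media={"MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/input/my-video.mp4"},  # placeholder
    OutputBucketName="DOC-EXAMPLE-BUCKET",
    ContentRedaction={
        "RedactionType": "PII",
        "RedactionOutput": "redacted",  # or "redacted_and_unredacted"
        "PiiEntityTypes": ["NAME", "EMAIL", "PHONE"],  # optional, omit to redact all PII types
    },
)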
Performance:
I have to admit that it performs quite well, I would say around a 90% hit rate when you speak loud and clear, with some issues on very short words and onomatopoeia.
Model Improvement:
I didn’t dive very deep into this, but from what I saw there are two features:
Custom Vocabulary
This is the easiest to set up: basically you provide the API with phrases for a specific language, or tables matching phrases to how they sound.
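A minimal sketch of the phrase-list variant, with an illustrative vocabulary name and phrases:

# Sketch: create a custom vocabulary from a simple phrase list
transcribe_client = boto3.client("transcribe")
transcribe_client.create_vocabulary(
    VocabularyName="my-project-vocabulary",  # placeholder
    LanguageCode="en-AU",
    Phrases=["Transcribe", "boto3", "onomatopoeia"],
)
# Then reference it in the job: Settings={"VocabularyName": "my-project-vocabulary", ...}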
Custom Model
This takes more effort to create: to put it simply, you will have to provide audio + transcripts to either train a new model or tune an existing one.
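I didn’t try it, but as a rough sketch the API call looks something like this (the model name, S3 location and role ARN are placeholders, and the training data has to be prepared in S3 beforehand):

# Sketch: kick off training of a custom language model from training data in S3
transcribe_client.create_language_model(
    LanguageCode="en-AU",
    BaseModelName="WideBand",  # or "NarrowBand" for telephony-style audio
    ModelName="my-custom-model",  # placeholder
    InputDataConfig={
        "S3Uri": "s3://DOC-EXAMPLE-BUCKET/training-data/",  # placeholder
        "DataAccessRoleArn": "arn:aws:iam::123456789012:role/transcribe-data-access",  # placeholder
    },
)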
Use cases identified
So I was mostly focused on how this can help with video subtitles, and even if it’s not perfect, I really think I’ll use it for the video projects I have; it’s important to me to provide subtitles.
I don’t see myself going through the hassle of training or tuning a custom model, but custom vocabularies look quite easy to set up and quite useful.
I’ve set up a small pipeline with CDK that takes files dropped into an S3 bucket and creates subtitles.
The big help it provides is getting mostly accurate subtitles with good timing; then I only have to tune them a bit.
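For reference, a minimal sketch of what that CDK wiring can look like (this is not my exact stack; the construct names, runtime and Lambda asset path are assumptions):

from aws_cdk import Stack, aws_iam as iam, aws_lambda as _lambda, aws_s3 as s3, aws_s3_notifications as s3n
from constructs import Construct


class SubtitlePipelineStack(Stack):
    """Sketch: S3 bucket -> Lambda that calls StartTranscriptionJob on new uploads."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        bucket = s3.Bucket(self, "MediaBucket")

        # Lambda holding the start_transcription_job snippet shown above.
        # The asset path "lambda/" is a placeholder.
        handler = _lambda.Function(
            self, "StartTranscriptionFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # Let the function read the videos and write the subtitle files
        bucket.grant_read_write(handler)
        handler.add_to_role_policy(iam.PolicyStatement(
            actions=["transcribe:StartTranscriptionJob"],
            resources=["*"],
        ))

        # Trigger the function when a file lands under the input/ prefix
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED,
            s3n.LambdaDestination(handler),
            s3.NotificationKeyFilter(prefix="input/"),
        )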
Cost
It’s cheap. Here is a quick idea with no custom model, assuming 4 h of video per week and being really irresponsible with storage by keeping 50 GB of video stored.
| Item name       | Quantity   | Currency | Cost (no VAT) |
|-----------------|------------|----------|---------------|
| S3 Storage      | 50 GB      | USD      | 1.20          |
| Transcribe Jobs | 4 h        | USD      | 5.76          |
| Lambdas         | <1,000,000 | –        | –             |
| Notifications   | <1,000,000 | –        | –             |
| Total           |            | USD      | 6.96          |
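The back-of-the-envelope maths behind that table, assuming batch transcription at roughly 0.024 USD/minute and S3 Standard at roughly 0.024 USD/GB-month (both vary per region):

# Sketch: rough estimate matching the table above; prices are assumptions
transcribe_price_per_min = 0.024  # USD, standard batch tier
s3_price_per_gb_month = 0.024     # USD, S3 Standard

transcribe_cost = 4 * 60 * transcribe_price_per_min  # 4 h of audio  -> 5.76 USD
storage_cost = 50 * s3_price_per_gb_month            # 50 GB stored  -> 1.20 USD

print(round(transcribe_cost + storage_cost, 2))  # 6.96 USD; Lambdas and S3 notifications stay in the free tier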