Improve your English automatic speech recognition (ASR) models, or deploy new ones in days, using our speech and voice recognition dataset. Our English speech-to-text data combines scripted and unscripted recordings. You can customize the data to fit your needs, whether by background noise, speaker region, age group, or gender.
The dataset contains hundreds of hours of speech from tens of thousands of unique speakers. With our high-quality training data, you can gain a competitive advantage, reduce time to market, and improve your models.
Our English ASR datasets include both native and non-native speakers. All recordings are transcribed and validated by human annotators.
English varieties: native US, UK, and Canadian (CA) speakers, plus non-native speakers.
The English datasets contain transcribed audio clips of people talking about various topics or reading sentences, with up to 15 minutes of speech per person. The speech is recorded on mobile phones by a diverse crowd of speakers of all ages and backgrounds, which makes the dataset well suited to ASR and voice assistant use cases on mobile devices.
Each recording is up to 60 seconds long and is tagged by attributes such as background noise level, age group, gender, and region. Transcriptions are verbatim, capturing speech exactly as the person said it.
Automatic speech recognition (ASR) is also known as speech-to-text or voice recognition.