English Speech Dataset For Speech and Voice Recognition Models

We provide an English speech dataset for training and testing English speech and voice recognition algorithms and ASR models. Our transcribed NLP dataset is well suited to speech-to-text and ASR models for the English language.

We have multiple datasets that you can choose from: transcribed spontaneous speech data with one or two people speaking or scripted monologues.

Datasets for your speech recognition solution in English

Improve your English automatic speech recognition models or deploy new models in days using our speech and voice recognition dataset. The English datasets you can choose from are scripted and unscripted recordings with one or two people speaking. Tell us what data you need and we will include only the data that fits your use case, whether that means specific background noise levels, speakers from certain regions, speakers of specific age groups or genders, or native versus non-native speakers.

We can provide you with thousands of hours of speech recorded by tens of thousands of unique speakers. With our high-quality training datasets, you can gain a competitive advantage, reduce time to market, and improve the word error rate of your models.
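To make the word error rate mentioned above concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not part of any tooling shipped with the datasets, and real evaluations usually normalize case and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference transcript and ASR output, for illustration only.
    print(word_error_rate("send it to my email", "send it to email"))
```

A lower value is better; a score of 0.0 means the hypothesis matches the reference word for word.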

Regions

Our speech recognition datasets in English include native and non-native speakers from the following regions:
English language: native US, UK, CA, and non-native.

Speech recognition data specifications

The English datasets contain transcribed and segmented audio clips of people talking about various topics or reading sentences, with up to two hours of speech per person. The speech is captured using mobile phones and laptops from a diverse crowd of speakers representing all ages and backgrounds. As a result, the dataset is well suited to ASR and voice assistant use cases on mobile devices.

Recordings vary in length depending on the type of recording. Scripted speech recordings are up to 30 seconds long, while two-person conversations are up to one hour long. The recordings are transcribed and segmented by speaker, noise, music, and overlapping speech.

Automatic speech recognition (ASR) is also known as speech-to-text and voice recognition.

What use cases is the data for?

The speech recognition datasets are perfect for:
- Building a speech recognition AI.
- Building a speaker recognition AI.
- Speech recognition solutions for call centers.

Dataset license

Our data license agreement covers commercial use, and the datasets can be reused for multiple use cases. However, they may not be resold.

Sample

Recording of a person requesting that information be sent by email.

Quality guarantee

We are confident in our data, and all customers can review a sample batch of data before buying. Additionally, we offer a quality guarantee. If you wish to review more samples before buying, state so when filling in the order form.

English speech data starting from

218€ / hour

Order now

Details

Technical

SAMPLING RATE

16 – 44 kHz

BACKGROUND NOISE

Classified by noise level

RECORDINGS

Depends on case, up to 1 hour long.

FILE FORMAT

.wav

TRANSCRIPTS

Verbatim and/or read from sentences
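
A delivered .wav file can be checked against the technical specifications above using Python's standard `wave` module. The sketch below is one way a buyer might do that, not part of any delivery tooling: the function name `check_wav`, the reading of the upper sampling-rate bound as 44,100 Hz, and the returned field names are all assumptions made for illustration. The `wave` module reads uncompressed PCM WAV files only.

```python
import wave

MIN_RATE_HZ = 16_000    # lower bound of the listed sampling rate range
MAX_RATE_HZ = 44_100    # assumption: "44 kHz" read as the common 44.1 kHz rate
MAX_DURATION_S = 3_600  # recordings are listed as up to 1 hour long


def check_wav(path: str) -> dict:
    """Read a PCM WAV header and compare it with the listed specifications."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        channels = wav.getnchannels()
    duration = frames / rate
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_s": duration,
        "rate_ok": MIN_RATE_HZ <= rate <= MAX_RATE_HZ,
        "duration_ok": duration <= MAX_DURATION_S,
    }
```

Calling `check_wav("clip.wav")` on a hypothetical file returns a dictionary that can be logged or used to filter out clips that fall outside the expected range.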

Demographics

AGE RANGE

16 – 85 years

GENDER

Female 40%, Male 60%

PROFICIENCY

Grouped by native and non-native English speakers

REGION

Grouped by country of origin and region within the country

“Partnering with StageZero has been vital in providing us with high-quality utterance corpora for training our proprietary language and semantic models.”

Dr. Christoph Neumann
CTO at German Autolabs

Custom training data collection for speech and NLP.

Didn’t find the speech dataset you need or your industry in our marketplace? Get in touch with us, so we can use our global network to source the training or testing data that fits your needs.
