Romanian speech dataset: voice data transcribed for ASR

Romanian speech dataset for ASR. The most diverse speech datasets collected from thousands of people, perfect for speech recognition.

Improve your Romanian automatic speech recognition models or deploy new models in days using our speech and voice recognition dataset. Our Romanian speech-to-text data is a combination of scripted and unscripted recordings. You can customize the data so that it fits your needs, whether that means background noise, speakers from certain regions, age groups, or gender.

The dataset consists of hundreds of hours of speech captured from tens of thousands of unique speakers. With our high-quality training datasets, you can gain a competitive advantage, reduce time to market, and improve your models.

Regions

Our ASR datasets for Romanian consist of native and non-native speakers. Additionally, the recordings are validated and transcribed by people other than the speakers.

Romanian language: native and non-native RO.


Speech to text data specifications

The Romanian datasets contain transcribed audio clips of people talking about various topics or reading sentences, with up to 15 minutes of speech per person. The speech is captured using mobile phones from a diverse crowd of speakers representing all ages and backgrounds. This makes the dataset well suited to ASR and voice assistant use cases on mobile devices.

Recordings are up to 60 seconds long each and are classified by attributes such as background noise level, age group, gender, and region. The recordings are transcribed verbatim, exactly as spoken.

Automatic speech recognition (ASR) is also known as speech-to-text and voice recognition.

Dataset license

Our data license agreement covers commercial use, and the datasets can be reused for multiple use cases. However, they may not be resold.

Sample

Samples available upon request.

What is speech recognition?

Read more about speech recognition here.

Quality guarantee

We are confident in our data, and all customers can review a sample batch of data before buying. Additionally, we offer a quality guarantee. If you wish to review more samples before buying, state so when filling in the order form.

Romanian speech data starting from

218€ / hour

Details

Technical

Sampling rate: 16 – 44 kHz
Background noise: Classified by noise level
Recordings: 1 – 60 seconds long
File format: .wav
Transcripts: Verbatim and/or read from sentences
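
As an illustration of how a delivered batch could be checked against the technical specification above, here is a minimal Python sketch using only the standard library. The function names and the bounds are assumptions made for this example: in particular, "44 kHz" is read here as the common 44.1 kHz rate, and the helper that builds a silent clip exists only so the check can be tried without real data.

```python
import io
import wave

# Assumed bounds, derived from the specification listed above.
MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 44_100  # assumption: "44 kHz" interpreted as 44.1 kHz
MIN_SECONDS = 1.0
MAX_SECONDS = 60.0


def check_clip(source):
    """Return a list of problems found in one WAV clip (empty if none).

    `source` is a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as clip:
        rate = clip.getframerate()
        seconds = clip.getnframes() / rate
    problems = []
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz outside expected range")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.2f} s outside expected range")
    return problems


def make_silent_wav(rate, seconds):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # A 2-second clip at 16 kHz falls inside the assumed bounds.
    print(check_clip(make_silent_wav(16_000, 2)))
    # An 8 kHz clip falls below the assumed minimum sampling rate.
    print(check_clip(make_silent_wav(8_000, 2)))
```

For real data, `check_clip` can be called with the path of each .wav file in a batch; any non-empty result points to a clip worth reviewing.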

Demographics

Age range: 16 – 85 years
Gender: Female 40%, Male 60%
Proficiency: Grouped by native and non-native Romanian speakers
Region: Grouped by country of origin and region within the country

“Partnering with StageZero has been vital in providing us with high-quality utterance corpora for training our proprietary language and semantic models.”
Dr. Christoph Neumann
CTO at German Autolabs

Custom training data collection for speech and NLP.

Didn’t find the speech dataset you need or your industry in our marketplace? Get in touch with us, so we can use our global network to source the training or testing data that fits your needs.
©2022 StageZero Technologies