{dataset_language} voice assistant and voice command datasets

{dataset_language} training and testing datasets for voice assistants. Ready-made datasets are available for wake words, skill commands, and voice commands for all major voice assistants.


{dataset_language} voice assistant and voice commands dataset

Train, fine-tune, and test your voice assistant using voice data from thousands of people. With our datasets you can improve how well your voice assistant recognizes native and non-native speakers, and test that it works across different demographics and regions.

Access any of the following voice assistant activation data or voice commands for testing or training:
- Amazon Alexa dataset.
- Siri dataset.
- Google Assistant dataset.
- Cortana dataset.

The datasets consist of speech from thousands of {dataset_language} speakers using voice assistants.

Regions and language

{dataset_region}

{dataset_local_language}

Specifications

The dataset contains audio clips of people recording themselves speaking voice assistant commands and wake words, with up to 10 minutes of speech per person. The speech is captured on mobile phones from a diverse crowd of speakers representing a wide range of ages and backgrounds, which makes the dataset well suited to use cases involving mobile devices.

Recordings vary in length and average about 3 seconds per clip. They are classified by background noise level, age group, gender, and region. Spontaneous recordings are transcribed verbatim, exactly as the speaker said them.

{dataset_synonym}

Dataset license

Our data license agreement covers commercial use, and the datasets can be reused for multiple use cases. However, they are not for resale.

Sample

{dataset_audioclip_text}

Quality guarantee

We are confident in the quality of our data, so all customers can review a sample batch before buying. You can ask for samples when filling in the order form. If you have special requirements for the data, mention them there as well to speed up the process.

What is a voice assistant?

To find out more about voice assistants click here.

{dataset_language} voice assistant dataset starting from

{dataset_price_text}
Order now

Voice assistant dataset details

Technical

SAMPLING RATE: 16 – 44 kHz
BACKGROUND NOISE: Classified by noise level
SPEECH TYPE: Wake words and skill commands for Amazon Alexa, Siri, Google Assistant, and Cortana
FILE FORMAT: .wav
RECORDINGS: 3 seconds average
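
As an illustration of how the technical details above could be checked against a delivered clip, here is a minimal Python sketch using only the standard library `wave` module. The function names, the silent stand-in clip, and the reading of "44 kHz" as 44,100 Hz are all assumptions made for this example. They are not part of the dataset or of any StageZero tooling, and the sketch assumes uncompressed PCM .wav files.

```python
import wave

# Assumed bounds: the listing says "16 – 44 kHz". Treating the upper end as
# 44,100 Hz is an assumption for this sketch, not something the listing states.
MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 44_100


def summarize_wav(path):
    """Return sample rate, channel count, and duration (seconds) of a PCM .wav file."""
    with wave.open(path, "rb") as clip:
        rate = clip.getframerate()
        frames = clip.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": clip.getnchannels(),
            "duration_s": frames / rate,
        }


def rate_within_listing(info, low=MIN_RATE_HZ, high=MAX_RATE_HZ):
    """True if the clip's sample rate falls inside the assumed 16 to 44.1 kHz window."""
    return low <= info["sample_rate_hz"] <= high


def write_silent_clip(path, rate=16_000, seconds=3):
    """Write a mono 16-bit clip of silence, a hypothetical stand-in for a real recording."""
    with wave.open(path, "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        clip.setframerate(rate)
        clip.writeframes(b"\x00\x00" * rate * seconds)
```

To try it, call `write_silent_clip("clip.wav")` and then `summarize_wav("clip.wav")`; with real data, point `summarize_wav` at a clip from a sample batch instead.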

Demographics

AGE RANGE: 16 – 85 years
GENDER: Female 40%, Male 60%
PROFICIENCY: Grouped by native and non-native {dataset_language} speakers
REGION: Grouped by country of origin and region within country

“Partnering with StageZero has been vital in providing us with high-quality utterance corpora for training our proprietary language and semantic models.”
Dr. Christoph Neumann
CTO at German Autolabs

Custom training data collection for speech and NLP.

Didn’t find the speech dataset you need, or one for your industry, in our marketplace? Get in touch so we can use our global network to source training or testing data that fits your needs.
TELL US ABOUT YOUR NEEDS
©2022 StageZero Technologies