Voice assistant training datasets for German

Are you developing a voice assistant or solutions on top of an existing service? Our datasets are perfect for voice assistants in German. We have datasets available for wake words, skill commands, and voice commands. 


German voice assistant and voice commands dataset

Train, fine-tune, and test your voice assistant using German voice data from thousands of people. With our datasets, you can improve how well the voice assistant recognizes native and non-native speakers, and test that it works across different demographics and regions.

Access any of the following voice assistant activation data or voice commands for testing or training:
- Amazon Alexa dataset.
- Siri dataset.
- Google Assistant dataset.
- Cortana dataset.

The datasets consist of speech from thousands of German speakers using voice assistants.

Regions and language

The German data is recorded by native speakers from Germany (DE), Belgium (BE), and Austria (AT), as well as by non-native speakers.

Data Specifications

The datasets contain audio clips of people recording themselves speaking voice assistant commands and wake words, with up to 10 minutes of speech per person. Wake words are the phrases that activate the voice assistant, while voice commands are requests for the voice assistant to perform a certain action.

Recordings vary in length; the average clip is about 3 seconds long.

Dataset license

Our data license agreement covers commercial use, and the datasets can be reused for multiple use cases. However, they are not for resale.

Sample

Samples available upon request.

Quality guarantee

We are confident in the quality of our data, so all customers can review a sample batch before buying. When filling in the order form, you can ask for samples. If you have special requirements for the data, mention them in the form as well to speed up the process.

Why do voice assistants need data?

To find out more about voice assistants click here.

Voice assistant data for German starting from

0.99€ / recording

Voice assistant dataset details

Technical

Sampling rate: 16 – 44 kHz
Background noise: Classified by noise level
Speech type: Wake words and skill commands for Amazon Alexa, Siri, Google Assistant, Cortana
File format: .wav
Recording length: 3 seconds average
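
If you receive a sample batch, you may want to confirm that each clip matches the technical details listed here. The following is a minimal sketch using only Python's standard library, not an official StageZero tool. It assumes the clips are uncompressed PCM .wav files and that "16 – 44 kHz" means 16,000 to 44,100 Hz; the function names and the silent stand-in clip are hypothetical and exist only for illustration.

```python
import os
import tempfile
import wave

# Assumed bounds, read off the "16 – 44 kHz" sampling rate figure.
# The upper bound of 44,100 Hz is an assumption, not a stated value.
MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 44_100


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def rate_in_spec(rate_hz):
    """True if the sampling rate falls inside the assumed range."""
    return MIN_RATE_HZ <= rate_hz <= MAX_RATE_HZ


def write_silence(path, seconds=3.0, rate_hz=16_000):
    """Write a mono 16-bit clip of silence as a stand-in for a real recording."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(seconds * rate_hz))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = os.path.join(tmp, "sample.wav")
        write_silence(clip)
        rate, channels, duration = describe_wav(clip)
        print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s, "
              f"in spec: {rate_in_spec(rate)}")
```

To check real data, replace the generated clip with the path to a recording from your sample batch and loop over the files.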

Demographics

Age range: 16 – 85 years
Gender: Female 40%, Male 60%
Proficiency: Grouped by native and non-native German speakers
Region: Grouped by country of origin and region within country

“Partnering with StageZero has been vital in providing us with high-quality utterance corpora for training our proprietary language and semantic models.”
Dr. Christoph Neumann
CTO at German Autolabs

Custom training data collection for speech and NLP.

Didn’t find the speech dataset you need or your industry in our marketplace? Get in touch with us, so we can use our global network to source the training or testing data that fits your needs.
Palkkatilanportti 1, 4th floor, 00240 Helsinki, Finland
info@stagezero.ai
2733057-9
©2022 StageZero Technologies