Improve AI using data from people first

Built and maintained by contributors, Mozilla Data Collective is a new platform for real-world data sharing that houses multilingual, multimodal datasets in more than 300 languages.

The platform publishes exclusive, permissively licensed datasets for automatic speech recognition (ASR), text-to-speech (TTS), translation, and SLM, which can be accessed through the datacollective Python package.

This week's new releases include:

Text-to-speech: TTS corpus in Bulgarian

Code-switching: Nahuatl dialogues with code-switching annotations

Youth speech: Indonesian youth speech audio corpus

Find exclusive public datasets on Mozilla Data Collective.

EngineAi is your one-stop shop for news and insights on artificial intelligence and automation.
Watch this space for weekly updates on digital transformation, process automation, and machine learning. Let us help you bring the future into your company today.
