Finetuning or adaptation of foundation models is a complex step in model development, and finetuned models are deployed far more often than base models. Here we link to some useful and widely used resources for finetuning.
![Finetuning Data Catalogs for Foundation Models](/foundation-model-resources/finetuning-data-catalogs/finetuning-data-catalogs_hu006eabc3a48d560cf7382cc193979586_64823_736x0_resize_q90_h2_lanczos_3.webp)
A repository of Indian language text and speech resources, including datasets.
A catalogue of hundreds of Arabic text and speech finetuning datasets, regularly updated.
A speaker diarization dataset comprising over 50 hours of conversational speech recorded at twenty real dinner parties held in real homes.
A repository and explorer tool for selecting popular finetuning, instruction, and alignment training datasets from Hugging Face based on data provenance and dataset characteristics.
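To make the idea of provenance-based selection concrete, here is a minimal Python sketch. The catalog entries, field names, and license list below are illustrative assumptions, not the explorer tool's actual schema or API:

```python
# Illustrative sketch (not the real explorer's API) of selecting
# finetuning datasets by provenance criteria such as license and
# language coverage. All names and entries here are hypothetical.

catalog = [
    {"name": "dataset_a", "license": "apache-2.0", "languages": ["en", "fr"]},
    {"name": "dataset_b", "license": "cc-by-nc-4.0", "languages": ["en"]},
    {"name": "dataset_c", "license": "mit", "languages": ["ar"]},
]

# Example set of licenses treated as permissive for this sketch.
PERMISSIVE = {"apache-2.0", "mit", "cc-by-4.0"}

def select(catalog, language):
    """Keep permissively licensed datasets covering the given language."""
    return [d["name"] for d in catalog
            if d["license"] in PERMISSIVE and language in d["languages"]]

print(select(catalog, "en"))  # → ['dataset_a']
```

In practice a selection tool would draw these fields from curated metadata rather than a hand-written list, but the filtering logic is the same.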
An online catalogue linking to African language resources (papers and datasets) in both text and speech.
A repository of African language text and speech resources, including datasets.
A speaker identification dataset comprising YouTube interviews with thousands of celebrities.
A spoken language identification dataset built from audio extracted from YouTube videos retrieved with language-specific search phrases.
An online catalogue providing African language resources (data and models) in both text and speech.
A permissively licensed multilingual instruction finetuning dataset curated through the Aya Annotation Platform from Cohere For AI. The dataset contains 204k human-annotated prompt-completion pairs spanning 65 languages, along with demographic data for the annotators.
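To illustrate what a multilingual prompt-completion dataset like this looks like in practice, here is a minimal Python sketch. The records and field names are invented for illustration and are not the dataset's actual schema:

```python
# Hypothetical sketch of prompt-completion records with language
# metadata, as found in multilingual instruction finetuning datasets.
# Field names and example rows are illustrative assumptions only.

records = [
    {"prompt": "Translate 'good morning' to French.",
     "completion": "Bonjour.", "language": "English"},
    {"prompt": "Nenne drei Primzahlen.",  # "Name three prime numbers."
     "completion": "2, 3 und 5.", "language": "German"},
    {"prompt": "Explain photosynthesis briefly.",
     "completion": "Plants convert light into chemical energy.",
     "language": "English"},
]

def filter_by_language(data, language):
    """Select only the prompt-completion pairs tagged with one language."""
    return [r for r in data if r["language"] == language]

english_subset = filter_by_language(records, "English")
print(len(english_subset))  # → 2
```

Language metadata of this kind is what makes it possible to build balanced multilingual finetuning mixes or to train on a targeted subset of languages.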