Training models at any scale can be daunting for newer practitioners. The following educational resources may help in learning the considerations involved in successfully and effectively training or fine-tuning foundation models.
![Educational Resources for Foundation Model Training](/foundation-model-resources/model-training-educational-resources/model-training-educational-resources_hu5c5ae6bf4fbdc7dc78ff54464d2ea1fb_79085_736x0_resize_q90_h2_lanczos_3.webp)
A rundown and crash course in distributed training for deep learning, with an eye toward LLM fine-tuning and currently useful tools and resources. It provides a good overview of the various distributed training strategies available for efficient, scalable training.
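To make the core idea behind the most common of these strategies concrete, here is a minimal sketch of data parallelism, simulated on one machine with NumPy (this is an illustration under our own assumptions, not code from the resource above): each worker computes gradients on its own shard of the batch, and the gradients are then averaged, which is what an all-reduce accomplishes in a real distributed setup.

```python
import numpy as np

def local_gradient(w, x, y):
    # Gradient of mean squared error for a linear model y_hat = x @ w.
    y_hat = x @ w
    return 2 * x.T @ (y_hat - y) / len(x)

rng = np.random.default_rng(0)
w = np.zeros(3)
x = rng.normal(size=(8, 3))
y = x @ np.array([1.0, -2.0, 0.5])

# Split the batch across 4 simulated workers; each computes a gradient
# on its own shard, then the gradients are averaged ("all-reduce").
shards = np.split(np.arange(8), 4)
grads = [local_gradient(w, x[idx], y[idx]) for idx in shards]
avg_grad = np.mean(grads, axis=0)

# With equal-sized shards, the averaged gradient equals the gradient
# computed over the full batch, so every worker applies the same update.
full_grad = local_gradient(w, x, y)
assert np.allclose(avg_grad, full_grad)
```

Real implementations (e.g. PyTorch DDP) do the same averaging across processes and devices, overlapping communication with the backward pass.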
An “online textbook” and resource collection on ML engineering at scale, covering topics ranging from debugging distributed systems and parallelism strategies to effective use of large HPC clusters, along with chronicles of past large-scale training runs and the lessons learned from them.
A minimal, stripped-down training codebase intended for teaching purposes: easily hackable yet performant for small-scale training.
A set of resources on how to train large-scale AI systems.
A blog post on the inference costs of transformer-based LMs. Useful for gaining more insight into deep learning accelerators and the inference-relevant decisions to make when training a model.
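As a taste of the kind of arithmetic such a post covers, here is a back-of-the-envelope sketch (the formulas and hardware numbers below are common rules of thumb and hypothetical assumptions, not figures quoted from the post): a forward pass costs roughly 2N FLOPs per token for an N-parameter model, and at batch size 1 generation is typically memory-bandwidth bound, since every parameter must be read from accelerator memory for each generated token.

```python
def flops_per_token(n_params):
    # ~2 FLOPs (one multiply, one add) per parameter per token.
    return 2 * n_params

def min_latency_per_token_s(n_params, bytes_per_param, mem_bandwidth_bps):
    # Lower bound from weight reads alone, ignoring the KV cache
    # and activations.
    return n_params * bytes_per_param / mem_bandwidth_bps

# Example: a 7B-parameter model in fp16 (2 bytes/param) on a
# hypothetical accelerator with 2 TB/s of memory bandwidth.
n = 7e9
print(flops_per_token(n))                   # 1.4e10 FLOPs per token
print(min_latency_per_token_s(n, 2, 2e12))  # 0.007 s/token lower bound
```

Estimates like these help explain why small-batch inference speed tracks memory bandwidth rather than peak FLOPs.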
An introductory blog post on the training costs of LLMs, going over useful formulas and considerations from a high level down to low-level details.
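The centerpiece of most such cost estimates is the commonly cited approximation C ≈ 6ND: training a model with N parameters on D tokens costs about 6N FLOPs per token (roughly 2N for the forward pass and 4N for the backward pass). The sketch below turns that into a wall-clock estimate; the cluster size, per-GPU throughput, and utilization figure are hypothetical assumptions for illustration.

```python
def training_flops(n_params, n_tokens):
    # C ~= 6 * N * D total training FLOPs.
    return 6 * n_params * n_tokens

def training_days(n_params, n_tokens, n_gpus, peak_flops_per_gpu, mfu=0.4):
    # mfu: model FLOPs utilization, the fraction of peak throughput
    # actually achieved in practice (assumed 40% here).
    total = training_flops(n_params, n_tokens)
    seconds = total / (n_gpus * peak_flops_per_gpu * mfu)
    return seconds / 86400

# Example: 7B parameters on 1T tokens, using 256 GPUs at a
# hypothetical 300 TFLOP/s peak each.
print(training_flops(7e9, 1e12))            # ~4.2e22 FLOPs
print(training_days(7e9, 1e12, 256, 3e14))  # roughly two weeks
```

The same arithmetic, run in reverse, is how compute budgets get translated into feasible model and dataset sizes.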