Training models at any scale can be daunting for newer practitioners. The following educational resources may help in learning the considerations involved in successfully and effectively training or fine-tuning foundation models.
![Educational Resources for Foundation Model Training](/foundation-model-resources/model-training-educational-resources/model-training-educational-resources_hu5c5ae6bf4fbdc7dc78ff54464d2ea1fb_79085_736x0_resize_q90_h2_lanczos_3.webp)
A rundown and crash course in distributed training for deep learning, with an eye toward LLM fine-tuning and currently useful tools and resources. It provides a good overview of the various distributed training strategies available for efficient, scalable training.
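To make the core idea behind the most common of these strategies concrete, here is a minimal sketch of data parallelism, simulated on one machine with NumPy (this is an illustration under our own assumptions, not code from the resource above): each worker computes gradients on its own shard of the batch, and the gradients are then averaged, which is what an all-reduce accomplishes in a real distributed setup.

```python
import numpy as np

def local_gradient(w, x, y):
    # Gradient of mean squared error for a linear model y_hat = x @ w.
    y_hat = x @ w
    return 2 * x.T @ (y_hat - y) / len(x)

rng = np.random.default_rng(0)
w = np.zeros(3)
x = rng.normal(size=(8, 3))
y = x @ np.array([1.0, -2.0, 0.5])

# Split the batch across 4 simulated workers; each computes a gradient
# on its own shard, then the gradients are averaged ("all-reduce").
shards = np.split(np.arange(8), 4)
grads = [local_gradient(w, x[idx], y[idx]) for idx in shards]
avg_grad = np.mean(grads, axis=0)

# With equal-sized shards, the averaged gradient equals the gradient
# computed over the full batch, so every worker applies the same update.
full_grad = local_gradient(w, x, y)
assert np.allclose(avg_grad, full_grad)
```

Real implementations (e.g. PyTorch DDP) do the same averaging across processes and devices, overlapping communication with the backward pass.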
An “online textbook” and resource collection on ML engineering at scale, covering topics ranging from debugging distributed systems and parallelism strategies to effective use of large HPC clusters, along with chronicles of past large-scale training runs and the lessons learned from them.
A minimal, stripped-down training codebase intended for teaching purposes: easily hackable yet performant for small-scale training.
A set of resources on how to train large-scale AI systems.
A blog post on the inference costs of transformer-based LMs. Useful for gaining more insight into deep learning accelerators and the inference-relevant decisions to make when training a model.
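As a taste of the kind of arithmetic such a post covers, here is a back-of-the-envelope sketch (the formulas and hardware numbers below are common rules of thumb and hypothetical assumptions, not figures quoted from the post): a forward pass costs roughly 2N FLOPs per token for an N-parameter model, and at batch size 1 generation is typically memory-bandwidth bound, since every parameter must be read from accelerator memory for each generated token.

```python
def flops_per_token(n_params):
    # ~2 FLOPs (one multiply, one add) per parameter per token.
    return 2 * n_params

def min_latency_per_token_s(n_params, bytes_per_param, mem_bandwidth_bps):
    # Lower bound from weight reads alone, ignoring the KV cache
    # and activations.
    return n_params * bytes_per_param / mem_bandwidth_bps

# Example: a 7B-parameter model in fp16 (2 bytes/param) on a
# hypothetical accelerator with 2 TB/s of memory bandwidth.
n = 7e9
print(flops_per_token(n))                   # 1.4e10 FLOPs per token
print(min_latency_per_token_s(n, 2, 2e12))  # 0.007 s/token lower bound
```

Estimates like these help explain why small-batch inference speed tracks memory bandwidth rather than peak FLOPs.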
An introductory blog post on the training costs of LLMs, going over useful formulas and considerations from a high level down to low-level details.
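The centerpiece of most such cost estimates is the commonly cited approximation C ≈ 6ND: training a model with N parameters on D tokens costs about 6N FLOPs per token (roughly 2N for the forward pass and 4N for the backward pass). The sketch below turns that into a wall-clock estimate; the cluster size, per-GPU throughput, and utilization figure are hypothetical assumptions for illustration.

```python
def training_flops(n_params, n_tokens):
    # C ~= 6 * N * D total training FLOPs.
    return 6 * n_params * n_tokens

def training_days(n_params, n_tokens, n_gpus, peak_flops_per_gpu, mfu=0.4):
    # mfu: model FLOPs utilization, the fraction of peak throughput
    # actually achieved in practice (assumed 40% here).
    total = training_flops(n_params, n_tokens)
    seconds = total / (n_gpus * peak_flops_per_gpu * mfu)
    return seconds / 86400

# Example: 7B parameters on 1T tokens, using 256 GPUs at a
# hypothetical 300 TFLOP/s peak each.
print(training_flops(7e9, 1e12))            # ~4.2e22 FLOPs
print(training_days(7e9, 1e12, 256, 3e14))  # roughly two weeks
```

The same arithmetic, run in reverse, is how compute budgets get translated into feasible model and dataset sizes.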