LoRA
Parameter-efficient LLM finetuning via trainable low-rank weight updates.
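A minimal sketch of the idea in PyTorch: the pretrained weight stays frozen and only a low-rank update B·A is trained. The class name `LoRALinear` and the `rank`/`alpha` values are illustrative choices, not taken from the post.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight (and bias)
        # Low-rank factors: B A has shape (out_features, in_features) but only
        # rank * (in + out) parameters instead of in * out.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # W x + (alpha / r) * B A x ; only A and B receive gradients
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))
```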
Flash Attention
Reduce the memory needed to compute exact attention.
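The core trick, sketched in plain PyTorch for single-head (N, d) inputs: process keys and values in chunks with an online softmax, so the full (N, N) score matrix is never materialized, yet the result is exact. Real FlashAttention fuses this loop into a single GPU kernel; `chunked_attention` and the chunk size are illustrative.

```python
import torch

def chunked_attention(q, k, v, chunk: int = 128):
    """Exact attention with O(N * chunk) peak score memory instead of O(N^2)."""
    n, d = q.shape
    scale = d ** -0.5
    m = torch.full((n, 1), float("-inf"))  # running row-wise max of scores
    l = torch.zeros(n, 1)                  # running softmax denominator
    o = torch.zeros(n, d)                  # running (unnormalized) output
    for start in range(0, n, chunk):
        kc, vc = k[start:start + chunk], v[start:start + chunk]
        s = (q @ kc.T) * scale             # (n, chunk) scores for this chunk only
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        p = torch.exp(s - m_new)           # chunk-local softmax numerator
        corr = torch.exp(m - m_new)        # rescale previous accumulators
        l = l * corr + p.sum(dim=-1, keepdim=True)
        o = o * corr + p @ vc
        m = m_new
    return o / l

q, k, v = (torch.randn(1024, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(chunked_attention(q, k, v), ref, atol=1e-4)
```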
Multi-Query & Grouped-Query Attention
Use fewer K and V heads, shared across query heads, to save memory.
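A hypothetical PyTorch sketch of the shape manipulation: a small number of K/V heads is broadcast across groups of query heads, shrinking the KV cache by the group factor. Setting the K/V head count to 1 recovers multi-query attention. The function name and head counts are illustrative.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (batch, n_q_heads, seq, d); k, v: (batch, n_kv_heads, seq, d)."""
    n_q, n_kv = q.shape[1], k.shape[1]
    group = n_q // n_kv  # query heads sharing each K/V head
    # Broadcast each K/V head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(2, 8, 16, 64)   # 8 query heads
k = torch.randn(2, 2, 16, 64)   # only 2 K/V heads -> 4x smaller KV cache
v = torch.randn(2, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # (2, 8, 16, 64)
```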