LoRA


Efficient LLM fine-tuning by training small low-rank adapters instead of the full weights (see the sketch below).
Read more ⟶
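
A rough sketch of the idea, not the article's code: the pretrained weight is frozen and only a low-rank update `B A` is trained on top of it. The class name, rank, and scaling below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen weight plus a trainable low-rank update (illustrative)."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Pretrained weight W stays frozen; in practice it would be loaded, not left empty.
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        # Only the two small low-rank factors A and B are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + scaling * x A^T B^T  (the low-rank update B A is added to W)
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

With a small rank (e.g. 8), the trainable parameters are a tiny fraction of the full weight matrix, which is what makes the fine-tuning cheap.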

Flash Attention


Reduce the memory needed to compute exact attention (see the sketch below).
Read more ⟶
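
A rough sketch of the core trick, assuming single-head inputs; the real FlashAttention is a fused GPU kernel. Attention is computed block by block with an online softmax, so the full seq × seq score matrix is never materialized. The block size and the pure-PyTorch loop are illustrative.

```python
import torch

def tiled_attention(q, k, v, block_size=128):
    # q, k, v: (seq_len, head_dim)
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"), dtype=q.dtype, device=q.device)
    row_sum = torch.zeros(seq_len, 1, dtype=q.dtype, device=q.device)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        # Only one (seq_len x block_size) tile of scores exists at a time.
        scores = (q @ k_blk.T) * scale
        blk_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, blk_max)
        # Rescale previous accumulators to the new running max (online softmax).
        correction = torch.exp(row_max - new_max)
        p = torch.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    # Normalize at the end; the result equals exact softmax attention.
    return out / row_sum
```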

Multi-Query & Grouped-Query Attention


Use fewer K and V heads, shared across query heads, to shrink the KV cache (see the sketch below).
Read more ⟶
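
A rough sketch of the idea: queries keep their full number of heads, while K and V use fewer heads that are shared within groups of query heads; a single K/V head recovers multi-query attention. The head counts and the `repeat_interleave` expansion are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group_size = n_q_heads // n_kv_heads
    # Expand each K/V head so it is shared by its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v
```

Only the `n_kv_heads` K/V heads need to be cached during decoding, so the KV-cache memory drops by the ratio of query heads to K/V heads.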