Venkata Ravuri

ML Model Compression - A Practical Guide to Pruning, Quantization and Distillation

Reduce the size of ML models without losing their performance and efficiency, and improve inference speed, using techniques such as Pruning, Quantization and Distillation.
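As a rough illustration of what the three techniques do, here is a minimal pure-Python sketch on plain lists of floats. It is not taken from the guide itself: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`, `distillation_loss`) are hypothetical, and the specific choices (magnitude-based pruning, symmetric int8 quantization, a temperature-softened KL divergence for distillation) are assumptions about common variants rather than the guide's exact methods.

```python
import math


def magnitude_prune(weights, sparsity):
    """Pruning sketch: zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Quantization sketch: symmetric mapping of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Distillation sketch: KL(teacher || student) on temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In a real PyTorch workflow these steps would normally be done with library tooling on tensors rather than by hand; the sketch only shows the arithmetic behind each idea.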

Speed up PyTorch ML Models

Techniques to speed up PyTorch neural network models using `torch.compile()`, TorchDynamo, TorchInductor and CUDA Graphs.