Reduce the size of ML models without losing performance, and improve inference speed and efficiency, using techniques such as pruning, quantization, and distillation.
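To make two of these techniques concrete, here is a minimal pure-Python sketch of magnitude pruning followed by symmetric int8 quantization on a flat list of weights. The function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the sample weights are hypothetical and chosen only for illustration. Real frameworks apply the same ideas per tensor or per channel, and distillation is not shown because it requires a training loop.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # how many weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # real value of one integer step
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical example weights, not taken from any real model.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)  # half of the weights become 0.0
q, scale = quantize_int8(pruned)                 # small integers plus one float scale
restored = dequantize(q, scale)                  # approximation of `pruned`
```

Pruning shrinks the model by making weights exactly zero so they can be stored sparsely, while quantization stores each remaining weight in 8 bits instead of 32 at the cost of a rounding error of at most half a quantization step.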
Techniques to speed up PyTorch neural network models using `torch.compile()`, TorchDynamo, TorchInductor, and CUDA Graphs.