June 20, 2024
🎉 PyTorch Docathon H1 2024 Wrap-up 🎉
We are thrilled to announce the successful completion of the H1 2024 PyTorch Docathon! The event was a resounding success, and we want to extend our heartfelt gratitude to all the participants who made it possible. The dedication, expertise, and tireless efforts of our open-source contributors have once again helped us improve the PyTorch documentation.
June 20, 2024
Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity
Over the past year, we’ve added support for semi-structured (2:4) sparsity into PyTorch. With just a few lines of code, we were able to show a 10% end-to-end inference speedup on segment-anything by replacing dense matrix multiplications with sparse matrix multiplications.
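As a rough illustration of what "a few lines of code" means here, the workflow in recent PyTorch releases goes through torch.sparse.to_sparse_semi_structured; the sketch below is an assumption-laden example (illustrative shapes, requires an NVIDIA GPU with sparse tensor core support, i.e. Ampere or newer, and a fp16/bf16/int8 tensor), not the exact code from the post.

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Each row follows the 2:4 pattern: 2 non-zeros in every group of 4 elements.
W = torch.tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()   # (128, 128) weight
x = torch.rand(128, 64).half().cuda()                          # dense input

W_sparse = to_sparse_semi_structured(W)  # compress into the 2:4 sparse layout

dense_out = W @ x          # regular dense matmul
sparse_out = W_sparse @ x  # same math, dispatched to the GPU's sparse tensor cores
assert torch.allclose(dense_out, sparse_out, rtol=1e-2, atol=1e-2)
```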
June 12, 2024
Reducing Model Checkpointing Times by Over 10x with PyTorch Distributed Asynchronous Checkpointing
Summary: With PyTorch distributed’s new asynchronous checkpointing feature, developed with feedback from IBM, we show how the IBM Research team was able to reduce effective checkpointing time by a factor of 10-20x. For example, ‘down time’ for a 7B model checkpoint drops from an average of 148.8 seconds to 6.3 seconds, or 23.62x faster.
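The feature is exposed under torch.distributed.checkpoint (async_save in recent releases); a minimal, hedged sketch of how a training loop might use it looks like the following, where the model, optimizer, and checkpoint_id are placeholders and a real multi-rank job would already have torch.distributed initialized (e.g. via torchrun).

```python
import torch
import torch.distributed.checkpoint as dcp

# Placeholder model/optimizer; in practice these would be the sharded
# (e.g. FSDP) model and optimizer from your distributed setup.
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters())

state_dict = {"model": model.state_dict(), "optim": optimizer.state_dict()}

# async_save stages the state dict and writes it in the background,
# returning a Future so the training loop can continue almost immediately
# instead of blocking for the full checkpoint write.
ckpt_future = dcp.async_save(state_dict, checkpoint_id="ckpt_step_1000")

# ... keep training while the checkpoint is persisted ...

ckpt_future.result()  # block only when durability must be guaranteed
```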
June 11, 2024
PyTorch Foundation Welcomes New Executive Director
The PyTorch Foundation is excited to welcome Matt White, our new executive director. The PyTorch Foundation was formed in 2022 with the goal of driving adoption of AI tooling by fostering and sustaining an ecosystem of open source, vendor-neutral projects built with PyTorch. Over the past two years, we’ve seen excellent growth across the project, in both contributors and members.
June 06, 2024
INT4 Decoding GQA CUDA Optimizations for LLM Inference
Efficient Grouped-Query Attention decoding with a low-precision KV cache
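To make the idea concrete, here is a toy single-token GQA decoding step with a quantized KV cache in plain PyTorch; the shapes, helper names, and the simple per-position int4-range quantization are illustrative assumptions, not the CUDA kernels described in the post.

```python
import torch

n_q_heads, n_kv_heads, head_dim, seq_len = 32, 4, 128, 1024
group = n_q_heads // n_kv_heads  # query heads sharing each KV head

def quantize_int4(x):
    # Symmetric quantization to the int4 range [-8, 7], stored in int8 for
    # simplicity (a real kernel packs two 4-bit values per byte).
    scale = x.abs().amax(dim=-1, keepdim=True) / 7.0
    return torch.clamp(torch.round(x / scale), -8, 7).to(torch.int8), scale

def dequantize(q, scale):
    return q.float() * scale

# Quantized KV cache: [n_kv_heads, seq_len, head_dim]
k_q, k_scale = quantize_int4(torch.randn(n_kv_heads, seq_len, head_dim))
v_q, v_scale = quantize_int4(torch.randn(n_kv_heads, seq_len, head_dim))

# Queries for the single token being decoded: [n_q_heads, head_dim]
q = torch.randn(n_q_heads, head_dim)

k, v = dequantize(k_q, k_scale), dequantize(v_q, v_scale)

# Grouped-query attention: fold the query heads into groups over the KV heads.
q_grouped = q.view(n_kv_heads, group, head_dim)
scores = torch.einsum("hgd,hsd->hgs", q_grouped, k) / head_dim ** 0.5
probs = torch.softmax(scores, dim=-1)
out = torch.einsum("hgs,hsd->hgd", probs, v).reshape(n_q_heads, head_dim)
```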
June 04, 2024
Ready, Set, Contribute: PyTorch Docathon Kickoff H1 2024
The PyTorch Docathon is now live! This event is dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. Our hope with this Docathon is to simplify the process for new users to get started with PyTorch, guide them in effectively utilizing its features, and ultimately expedite the transition from research to production in machine learning.
May 21, 2024
Maximizing Training Throughput Using PyTorch FSDP and Torch.compile
Recently, we demonstrated how FSDP and selective activation checkpointing can be used to achieve 57% MFU (Model Flops Utilization) for training a 7B model on A100 GPUs. We also demonstrated how this approach can train a high-quality model, which we open sourced as the Granite 7B base model on the Hugging Face Hub under the Apache v2.0 license.
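A minimal sketch of combining FSDP with torch.compile is shown below; it assumes a launch via torchrun (which sets the environment variables init_process_group reads), and the tiny MLP stands in for a real transformer rather than reproducing the training setup from the post.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda()

# Shard parameters, gradients, and optimizer state across ranks.
# use_orig_params=True is the recommended setting when compiling FSDP modules.
model = FSDP(model, use_orig_params=True)

# Compile the sharded model so the per-GPU compute is captured and optimized.
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```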