Enhancing Multimodal Training and Memory Efficiency with DeepSpeed | Blog | Overview: This blog walks through two crucial DeepSpeed updates: (1) a PyTorch-identical backward API that… | Masahiro Tanaka (Anyscale) and Olatunji Ruwase (Snowflake) | February 24, 2026
Accelerating Autotuning in Helion with Bayesian Optimization | Blog | Introduction: As introduced in a previous blog post, Helion is a high-level DSL that empowers… | Ethan Che, Oguz Ulgen, Max Balandat, Jongsok Choi, Jason Ansel | February 24, 2026
PyTorch Foundation Announces New Members as Agentic AI Demand Grows | Announcements, Blog | Foundation welcomes Clockwork.io, Emmi AI, NIPA, Nota AI., Yasp, CommonAI CIC, Carnegie Mellon University, Monash… | PyTorch Foundation | February 24, 2026
PyTorchCon Europe Schedule is Live | Announcements, Blog | The schedule for PyTorch Conference Europe is officially live! Join us 7-8 April in Paris… | PyTorch Foundation | February 23, 2026
Pyrefly Now Type Checks PyTorch | Blog | We’re excited to share that PyTorch now leverages Pyrefly to power type checking across our… | PyTorch and Pyrefly Teams at Meta | February 12, 2026
Why I’m Joining the PyTorch Foundation | Announcements, Blog | I want to start by thanking Matt White for everything he has built over the… | Mark Collier, Executive Director, PyTorch Foundation | February 11, 2026
PyTorch Foundation: The Next Chapter, Together | Announcements, Blog | Over the past nearly two years, I’ve had the privilege of serving as Executive Director… | Matt White | February 11, 2026
Accelerating Mamba2 with Kernel Fusion | Blog | Summary: In this post, we discuss how we optimized the Mamba-2 State-Space Dual (SSD) module… | Rishi Astra, Tri Dao, Adnan Hoque | February 6, 2026
Some Matrix Multiplication Engines Are Not As Accurate As We Thought | Blog | What is an accumulator in an accelerator's GEMM engine and why does it matter? GPUs… | Chi-Chun (Charlie) Liu, Monodeep Kar, Naigang Wang, Raghu Kiran Ganti, Mudhakar Srivatsa | February 6, 2026
Building Highly Efficient Inference System for Recommenders Using PyTorch | Blog | Why Choose PyTorch for Recommendation System: PyTorch has emerged as the de facto framework in… | Lu Fang, Shiyan Deng, Hongyi Jia, Huamin Li, Ilina Mitra, Sheng Qin, Zhengkai Zhang, Zhuoran Zhao, Zinnia Zheng | February 5, 2026
Portable Paged Attention in Helion | Blog | Recently, the PyTorch team released Helion, a new domain-specific and PyTorch-based language to make the… | Burkhard Ringlein (IBM Research) and the vLLM Team at IBM Research | February 3, 2026
Unlock Reasoning in Llama 3.1-8B via Full Fine-Tuning on NVIDIA DGX Spark | Blog, Community | What is the unsaid joy of local LLMs? The magic of downloading weights, running some… | Sanyam Bhutani (PyTorch Meta), Hamid Shojanazeri (PyTorch Meta), Clement Anthonioz Blanc (Meta) | February 2, 2026
Accelerating On-Device ML Inference with ExecuTorch and Arm SME2 | Blog | Interactive image segmentation has become a defining mobile experience across the world’s most popular apps.… | Jason Zhu, Tyler Mullenbach, Damien Dooley, and Gian Marco Idoice, Arm | January 29, 2026
PyTorch 2.10 Release Blog | Blog | We are excited to announce the release of PyTorch® 2.10 (release notes)! This release features… | PyTorch Foundation | January 21, 2026
PyTorch Foundation in 2025: A Year in Review and the Road Ahead | Announcements, Blog | 2025 was a defining year for PyTorch Foundation. In May, we announced our expansion into… | PyTorch Foundation | January 15, 2026
Supercharging LLMs: Scalable RL with torchforge and Weaver | Blog | Scaling reinforcement learning (RL) for post-training large language models (LLMs) is notoriously difficult. While running… | Stanford: Jon Saad-Falcon, Hangoo Kang, Simon Guo, Aakanksha Chowdhery, Azalia Mirhoseini; Meta: Allen Wang, Danning Xie, Evan Smothers, Felipe Mello, Jack Khuu, Jiyue Wang, Joe Cummings, Lucas Pasqualin, Philip Bontrager, Rithesh Baradi, Vidhya Venkat, Yuxuan Hu, Jafar Taghiyar, Davide Italiano, Gayathri Aiyer, John Myles White, Joe Spisak, Sanyam Bhutani, Hamid Shojanazeri, Matthias Reso, Ali Sol, Hossein Kavianihamedani, Emre Guven; CoreWeave: Deok Filho, Aaron Batilo, Matthew Guan, Xi Lu | January 9, 2026
Warp Specialization in Triton: Design and Roadmap | Blog | The Triton compiler aims to generate performance-portable code and runtime across hardware for AI kernels.… | Manman Ren, Nick Riasanovsky, Neil Dhar, Hongtao Yu, Jie Liu, Partha Kanuparthy, Shane Nay | January 8, 2026
PyTorch 2.9: FlexAttention Optimization Practice on Intel GPUs | Blog | Overview: The most recent LLM serving frameworks and models increasingly adopt attention variants, such as… | Intel PyTorch and Triton team | January 8, 2026
Deploying Smarter: Hardware-Software Co-design in PyTorch | Blog | If you want powerful on-device AI that doesn’t blow your memory budget or turn your… | Kieran Hejmadi, Arm | December 18, 2025
Enabling Cluster Launch Control with TLX | Blog | What is cluster launch control (CLC)? Blackwell brings in cluster launch control (CLC) to enable… | Daohang Shi, Hongtao Yu, Manman Ren | December 17, 2025