Blog

PyTorch 2.3 Release Blog

We are excited to announce the release of PyTorch® 2.3 (release note)! PyTorch 2.3 offers…

PyTorch Foundation | April 24, 2024

Accelerating MoE model inference with Locality-Aware Kernel Design

Summary: We show that by implementing column-major scheduling to improve data locality, we can…

Adnan Hoque, Less Wright, Antoni Virós Martin, Chih-Chieh Yang | April 4, 2024

Maximizing training throughput using PyTorch FSDP

In this blog, we demonstrate the scalability of FSDP with a pre-training exemplar, a 7B…

Team PyTorch at IBM and Team PyTorch at Meta | March 13, 2024

Accelerating Generative AI with PyTorch IV: Seamless M4T, fast

This post is the fourth part of a multi-series blog focused on how to accelerate…

Yejin Lee, Carole-Jean Wu, Christian Puhrsch, Joel Schlosser, Driss Guessous, Jeffrey Wan, Joe Isaacson, Can Balioglu, Juan Pino | January 23, 2024

Accelerate PyTorch Models Using Quantization Techniques with Intel Extension for PyTorch

Overview: PyTorch is a Python-based framework for developing deep learning models. It is one of…

Intel | January 18, 2024

Accelerating Triton Dequantization Kernels for GPTQ

TL;DR: Leveraging a first-principles approach, we showcase a step-by-step process undertaken to…

Less Wright, Adnan Hoque (IBM) | January 16, 2024

Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem

We demonstrate how to finetune a 7B parameter model on a typical consumer GPU (NVIDIA…

Younes Belkada, Marc Sun, Titus von Köller, Sourab Mangrulkar, Benjamin Bossan, Lysandre Debut, Steven Liu | January 10, 2024

Accelerate AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe, saving up to 75% on inference costs

Multi-model endpoints (MMEs) are a powerful feature of Amazon SageMaker designed to simplify the deployment and operation…

James Wu, Ankith Gunapal, Li Ning, Subhash Talluri, and Saurabh Trikande | January 9, 2024

Accelerating Generative AI Part III: Diffusion, Fast

This post is the third part of a multi-series blog focused on how to accelerate…

Sayak Paul and Patrick von Platen (Hugging Face 🤗) | January 3, 2024

Understanding GPU Memory 2: Finding and Removing Reference Cycles

This is part 2 of the Understanding GPU Memory blog series. Our first post Understanding GPU…

Aaron Shi, Zachary DeVito | December 19, 2023

Training Production AI Models with PyTorch 2.0

Introduction: PyTorch 2.0 (abbreviated as PT2) can significantly improve the training and inference performance of…

CK Luk, Daohang Shi, Yuzhen Huang, Jackie (Jiaqi) Xu, Jade Nie, Zhou Wang, Lu Fang, Flavio Sales Truzzi, Devashish Shankar, Dima Ivashchenko, Chunzhi Yang, Nicolas Macchioni, David Berard, Yu Guo, Xiaodong Wang, Bert Maher, Yanbo Liang, Edward Yang, Brian Hirsh, Michael Voznesensky, Animesh Jain, Michael Anderson | December 18, 2023

Empowering Models with Performance: The Art of Generalized Model Transformation Approach

Introduction: PyTorch 2.0 (PT2) offers a compiled execution mode which rewrites Python bytecode to extract sequences…

Jackie (Jiaqi) Xu, Yanbo Liang, Jason Ansel, Chunzhi Yang, Jade Nie, Yuzhen Huang, CK Luk, Xiaodong Wang, Lu Fang, Menglu Yu, Jinwon Lee, Daohang Shi, Flavio Sales Truzzi | December 15, 2023

Understanding GPU Memory 1: Visualizing All Allocations over Time

During your time with PyTorch on GPUs, you may be familiar with this common error…

Aaron Shi, Zachary DeVito | December 14, 2023

From PyTorch Conference 2023: From Dinosaurs to Seismic Imaging with Intel

Lightning Talk 1: Seismic Data to Subsurface Models with OpenFWI. Speaker: Benjamin Consolvo, AI Software…

Ramya Ravi, Susan Kahler at Intel | December 12, 2023

Accelerating Generative AI with PyTorch II: GPT, Fast

This post is the second part of a multi-series blog focused on how to accelerate…

PyTorch Foundation | November 30, 2023

PyTorch 2.1 Contains New Performance Features for AI Developers

We are excited to see the release of PyTorch 2.1. In this blog, we discuss…

Intel | November 29, 2023

Accelerating Generative AI with PyTorch: Segment Anything, Fast

This post is the first part of a multi-series blog focused on how to accelerate…

PyTorch Foundation | November 16, 2023

PyTorch compile to speed up inference on Llama 2

In this blog, we discuss how to improve the inference latencies of the Llama 2…

IBM Research: Antoni Viros i Martin, Brian Vaughan, Davis Wertheimer, Joshua Rosenkranz, Mudhakar Srivatsa, Nelson Mimura Gonzalez, Raghu Ganti, Supriyo Chakraborty, Zhuoran Liu; Meta: Geeta Chauhan, Hamid Shojanazeri | November 7, 2023

High-Performance Llama 2 Training and Inference with PyTorch/XLA on Cloud TPUs

In a landscape where AI innovation is accelerating at an unprecedented pace, Meta’s Llama family of open…

Jiewen Tan, Jon Bolin, Yeounoh Chung, Liyang Lu, Siyuan Liu, Wonjoo Lee, Manfei Bai, Meghan Cowan, Jack Cao, Milad Mohammadi, Shauheen Zahirazami, Alex Spiridonov | November 6, 2023

Accelerating Inference on x86-64 Machines with oneDNN Graph

Supported in PyTorch 2.0 as a beta feature, oneDNN Graph leverages aggressive fusion patterns to…

Intel | November 2, 2023
