Systems

Tenstorrent’s pre-configured, rack-mounted Galaxy systems are engineered to deliver dense, scalable, high-performance AI compute built on an Ethernet-based mesh of 32 Tenstorrent Wormhole processors.

HIGH DENSITY, SCALABLE COMPUTE

Galaxy leverages the high-bandwidth Ethernet and switch built into each Wormhole processor, allowing users to scale computing resources without re-programming the model or infrastructure. Each chip has sixteen 200 Gbps Ethernet ports around its edge (totaling 3.2 Tbps of chip-to-chip bandwidth), allowing for the extension of our Network-on-Chip to as many compute nodes as required. Tenstorrent’s TT-Buda™ SDK automatically recognizes these additional devices to take full advantage of the available resources.
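As a quick check on the per-chip Ethernet figure quoted above, the arithmetic can be sketched in a few lines of Python. The function names are illustrative only and are not part of any Tenstorrent SDK; the byte conversion is a plain unit change and ignores protocol overhead.

```python
# Per-chip Ethernet figures as quoted in the text.
PORTS_PER_CHIP = 16    # Ethernet ports around the chip edge
GBPS_PER_PORT = 200    # gigabits per second per port


def chip_bandwidth_tbps(ports: int = PORTS_PER_CHIP,
                        gbps_per_port: int = GBPS_PER_PORT) -> float:
    """Aggregate chip-to-chip bandwidth in terabits per second."""
    return ports * gbps_per_port / 1000


def chip_bandwidth_gbytes_per_s(ports: int = PORTS_PER_CHIP,
                                gbps_per_port: int = GBPS_PER_PORT) -> float:
    """Same aggregate expressed in gigabytes per second (8 bits per byte)."""
    return ports * gbps_per_port / 8


print(chip_bandwidth_tbps())          # terabits per second
print(chip_bandwidth_gbytes_per_s())  # gigabytes per second
```

Sixteen ports at 200 Gbps each give the 3.2 Tbps stated for a single Wormhole chip.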

Galaxy Module
  • AI Processor(s): Tenstorrent Wormhole
  • Galaxy Modules: 1
  • Tensix Cores: 80
  • TeraFLOPs (FP8): 292
  • SRAM: 120MB (1.5MB per Tensix Core)
  • Memory: 12GB GDDR6 (192-bit memory bus, 12 GT/sec)
  • Power: 200W
  • System Interface: 3.2 Tbps Ethernet (16 x 200 Gbps)

Galaxy Server
  • Form Factor: 4U
  • AI Processor(s): 32x Tenstorrent Wormhole
  • Galaxy Modules: 32
  • Tensix Cores: 2,560
  • TeraFLOPs (FP8): 9,322 (9.3 PetaFLOPs)
  • SRAM: 3.8GB (120MB per Module)
  • Memory: 384GB GDDR6, globally addressable
  • Power: 7.5kW
  • System Interface: 41.6 Tbps Ethernet Internal Connectivity

Full Galaxy Server Rack
  • Form Factor: 48U
  • AI Processor(s): 256x Tenstorrent Wormhole
  • Galaxy Modules: 256
  • Tensix Cores: 20,480
  • TeraFLOPs (FP8): 74,576 (74.6 PetaFLOPs)
  • SRAM: 30.7GB (3.8GB per Server)
  • Memory: 3TB GDDR6, globally addressable
  • Power: 60kW
  • System Interface: I/O up to 76.8 Tbps
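
The server and rack figures listed above follow closely from the per-module figures. The sketch below scales them linearly as a sanity check. The `ModuleSpec` class and helper names are hypothetical, linear scaling is an assumption, and the peak memory bandwidth is derived here from the listed bus width and transfer rate rather than quoted from the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleSpec:
    """Per-module figures as listed for the Galaxy Module."""
    tensix_cores: int = 80
    fp8_tflops: int = 292
    sram_mb: int = 120
    gddr6_gb: int = 12
    bus_width_bits: int = 192
    transfer_rate_gt_s: int = 12


MODULES_PER_SERVER = 32
MODULES_PER_RACK = 256


def scale(spec: ModuleSpec, modules: int) -> dict:
    """Multiply per-module figures by a module count (naive linear scaling)."""
    return {
        "tensix_cores": spec.tensix_cores * modules,
        "fp8_tflops": spec.fp8_tflops * modules,
        "sram_mb": spec.sram_mb * modules,
        "gddr6_gb": spec.gddr6_gb * modules,
    }


def peak_memory_bandwidth_gb_s(spec: ModuleSpec) -> int:
    """Theoretical peak: bus width in bytes times transfers per second."""
    return spec.bus_width_bits // 8 * spec.transfer_rate_gt_s


server = scale(ModuleSpec(), MODULES_PER_SERVER)
rack = scale(ModuleSpec(), MODULES_PER_RACK)
print(server)
print(rack)
print(peak_memory_bandwidth_gb_s(ModuleSpec()), "GB/s per module (derived)")
```

Cores, SRAM, and memory match the listed values exactly (2,560 cores, 3,840MB, and 384GB per server). The scaled FP8 figure comes out at 9,344 TFLOPs per server, slightly above the listed 9,322, which suggests the 292 per-module figure is rounded; that reading is an inference, not something the page states.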
SUPPORTED DATATYPES

FP8, FP16, FP32

BFP2, BFP4, BFP8

INT8, INT16, INT32

UINT8

TF32

EASE OF CODE AND APPLICATION PORTABILITY

Tenstorrent’s TT-Buda™ SDK lets users compile code directly from common ML frameworks like PyTorch or TensorFlow and abstracts the underlying hardware, speeding implementation of existing models. Native support for the onboard Ethernet of the Wormhole chips means adding compute is as easy as installing another device, with no special networking or configuration required.

Users who want to get as close to the silicon as possible will appreciate the open-source TT-Metalium SDK, which provides low-level hardware access and enables use of Python and C++ for AI and non-AI workloads alike.
