Tenstorrent’s pre-configured, rack-mounted Galaxy systems are engineered to deliver dense, scalable, high-performance AI compute, built on an Ethernet-based mesh of 32 Tenstorrent “Wormhole” processors.
Galaxy leverages the high-bandwidth Ethernet and switch built into the “Wormhole” processor, allowing users to scale compute resources arbitrarily without re-programming the model or the infrastructure. Each chip has sixteen 200Gbps Ethernet ports around its edge (3.2 Tbps of chip-to-chip bandwidth in total), extending our Network-on-Chip across as many compute nodes as required. Tenstorrent’s TT-Buda SDK automatically recognizes these additional devices and takes full advantage of the available resources.
| Form Factor | Galaxy Module | Galaxy Server | Galaxy Rack |
| --- | --- | --- | --- |
| AI Processor(s) | Tenstorrent Wormhole | 32x Tenstorrent Wormhole | 256x Tenstorrent Wormhole |
| Tensix Cores (TOPs, INT8) | | 2,624 (2.6 PetaOPs) | 20,992 (20.9 PetaOPs) |
| SRAM | 120MB (1.5MB per Tensix Core) | 3.8GB (120MB per Module) | 30.7GB (3.8GB per Server) |
| DRAM | 12GB GDDR6 (192-bit memory bus, 12 GT/sec) | 384GB GDDR6, globally addressable | 3TB GDDR6, globally addressable |
| System Interface | 3.2 Tbps Ethernet (16 x 200Gbps) | 41.6 Tbps internal Ethernet connectivity | I/O up to 76.8 Tbps |
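As a rough cross-check, the aggregate figures above follow directly from the per-chip numbers. This is an unofficial sketch: all variable names are illustrative, and the eight-servers-per-rack count is inferred from 256 chips ÷ 32 chips per server rather than quoted.

```python
# Unofficial arithmetic cross-check of the Galaxy specs against the
# per-chip Wormhole figures quoted above.

PORTS_PER_CHIP = 16      # Ethernet ports per Wormhole chip
PORT_GBPS = 200          # bandwidth per port
SRAM_MB_PER_CHIP = 120   # on-chip SRAM per Wormhole
DRAM_GB_PER_CHIP = 12    # GDDR6 per Wormhole
CHIPS_PER_SERVER = 32
CHIPS_PER_RACK = 256

chip_eth_tbps = PORTS_PER_CHIP * PORT_GBPS / 1000            # 3.2 Tbps per chip
server_sram_gb = CHIPS_PER_SERVER * SRAM_MB_PER_CHIP / 1000  # 3.84 GB (quoted as 3.8GB)
server_dram_gb = CHIPS_PER_SERVER * DRAM_GB_PER_CHIP         # 384 GB
servers_per_rack = CHIPS_PER_RACK // CHIPS_PER_SERVER        # 8 (inferred, not quoted)
rack_dram_tb = servers_per_rack * server_dram_gb / 1000      # 3.072 TB (quoted as 3TB)

print(chip_eth_tbps, server_sram_gb, server_dram_gb, servers_per_rack, rack_dram_tb)
```

The small gaps (3.84 GB vs the quoted 3.8GB, 3.072 TB vs 3TB) are just rounding in the spec sheet.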
Supported data formats:
- FP8, FP16, FP32
- BFP2, BFP4, BFP8 (block floating point)
- INT8, INT16, INT32
Tenstorrent’s TT-Buda SDK lets users compile models from common ML frameworks such as PyTorch and TensorFlow directly, abstracting the underlying hardware and speeding up deployment of existing models. Native support for the onboard Ethernet of the “Wormhole” chips means adding compute is as easy as installing another device, with no special networking or configuration required.
Users who want to get as close to the silicon as possible will appreciate the open-source TT-Metal SDK, which provides low-level hardware access and supports Python and C++ for AI and non-AI workloads alike.
Want to learn more about Galaxy?