Tenstorrent’s pre-configured, rack-mounted Galaxy systems are engineered to deliver dense, scalable, high-performance AI compute built on an Ethernet-based mesh of 32 Tenstorrent Wormhole processors.
Galaxy leverages the high-bandwidth Ethernet and switching built into each Wormhole processor, allowing users to scale compute resources arbitrarily without re-programming the model or the infrastructure. Each chip has sixteen 200 Gb/s Ethernet ports around its edge (totaling 3.2 Tb/s of chip-to-chip bandwidth), extending the Network-on-Chip across as many compute nodes as required. Tenstorrent’s TT-Buda SDK automatically recognizes additional devices and takes full advantage of the available resources.
SUPPORTED DATA FORMATS
FP8, FP16, FP32, TF32
BFP2, BFP4, BFP8
INT8, INT16, INT32, UINT8
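The BFP (block floating point) formats trade precision for density by sharing one exponent across a block of values, each of which keeps only a sign and a short mantissa. The sketch below illustrates the idea in plain Python; it is a simplified model for intuition, not Tenstorrent's exact BFP8 bit layout (block size, rounding mode, and field widths are assumptions).

```python
import math

def bfp_encode(values, mantissa_bits=7):
    """Encode a block of floats with one shared exponent (block floating point).

    Each value keeps a sign and a `mantissa_bits`-bit magnitude; the whole
    block shares the exponent of its largest-magnitude member. Illustrative
    only -- not the exact hardware bit layout.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # max_abs = m * 2**shared_exp with 0.5 <= m < 1
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - shared_exp)
    # Quantize each value to an integer mantissa in [-(2**mantissa_bits), 2**mantissa_bits].
    mantissas = [round(v * scale) for v in values]
    return shared_exp, mantissas

def bfp_decode(shared_exp, mantissas, mantissa_bits=7):
    """Reconstruct approximate floats from a shared exponent and mantissas."""
    scale = 2.0 ** (mantissa_bits - shared_exp)
    return [m / scale for m in mantissas]

block = [1.0, 0.5, -0.25, 0.0078125]
exp, mants = bfp_encode(block)
approx = bfp_decode(exp, mants)
# Values near the block maximum round-trip exactly; values far below it
# lose precision to the shared exponent (the smallest one collapses to 0.0).
```

The payoff is storage: with a 16-element block, BFP8-style encoding needs roughly 8 bits per element plus one shared exponent, versus 16 bits per element for FP16.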
APPLICATION PORTABILITY
Tenstorrent’s TT-Buda SDK enables users to compile models from common ML frameworks such as PyTorch and TensorFlow directly, abstracting the underlying hardware and speeding deployment of existing models. Native support for the Wormhole chips’ onboard Ethernet means adding compute is as easy as installing another device; no special networking or configuration is required.
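A minimal sketch of the compile-and-run flow, based on TT-Buda's published examples: an unmodified PyTorch module is wrapped and dispatched to the device. The API names (`pybuda.PyTorchModule`, `pybuda.run_inference`) and an installed Tenstorrent device are assumptions; consult the TT-Buda documentation for the current interface.

```python
# Hypothetical sketch: compiling a stock PyTorch module with TT-Buda.
# Assumes the `pybuda` package and a Tenstorrent device are available.
import pybuda
import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(32, 32)

    def forward(self, x):
        return torch.relu(self.linear(x))

# Wrap the unmodified PyTorch module; TT-Buda compiles it for the device.
module = pybuda.PyTorchModule("tiny_model", TinyModel())

# Run inference. With additional Wormhole chips present, the same call can
# target the larger Ethernet mesh without changes to the model code.
output_q = pybuda.run_inference(module, inputs=[torch.rand(1, 32)])
print(output_q.get())
```

The point of the flow is that the model definition stays pure PyTorch; scaling from one chip to a Galaxy mesh is a deployment change, not a code change.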
Users who want to get as close to the silicon as possible will appreciate the open-source TT-Metalium SDK, which provides low-level hardware access and supports both Python and C++ for AI and non-AI workloads alike.