
Tenstorrent’s pre-configured, rack-mounted Galaxy systems are engineered to deliver dense, scalable, high-performance AI compute built on an Ethernet-based mesh of 32 Tenstorrent “Wormhole” processors.


Galaxy leverages the high-bandwidth Ethernet and on-chip switch within the “Wormhole” processor, allowing users to scale computing resources arbitrarily without re-programming the model or infrastructure. Each chip has sixteen 200 Gb/s Ethernet ports around its edge (totaling 3.2 Tb/s of chip-to-chip bandwidth), allowing Tenstorrent’s Network-on-Chip to extend across as many compute nodes as required. Tenstorrent’s TT-Buda SDK automatically recognizes the additional devices and takes full advantage of the available resources.
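The bandwidth figures above follow directly from the port count; a quick sketch of the arithmetic (the 32-chip aggregate is derived here from the numbers quoted above, not a separately published spec):

```python
# Per-chip Ethernet bandwidth, from the figures quoted above.
PORTS_PER_CHIP = 16      # 200 Gb/s Ethernet ports around each chip's edge
PORT_SPEED_GBPS = 200    # per-port line rate, Gb/s
CHIPS_PER_GALAXY = 32    # "Wormhole" processors in one Galaxy system

chip_bandwidth_gbps = PORTS_PER_CHIP * PORT_SPEED_GBPS
print(chip_bandwidth_gbps)       # 3200 Gb/s = 3.2 Tb/s per chip

# Aggregate edge bandwidth across the whole mesh (illustrative derivation).
system_bandwidth_tbps = CHIPS_PER_GALAXY * chip_bandwidth_gbps / 1000
print(system_bandwidth_tbps)     # 102.4 Tb/s
```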
SUPPORTED DATA FORMATS
Floating point: FP8, FP16, FP32, TF32
Block floating point: BFP2, BFP4, BFP8
Integer: INT8, INT16, INT32, UINT8
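The BFP entries above are block floating-point formats, in which a group of values shares a single exponent and each value keeps only a short mantissa. Tenstorrent’s exact bit layouts are not documented here; the sketch below illustrates only the shared-exponent idea, with block size, rounding, and mantissa width as illustrative assumptions:

```python
import math

def bfp_quantize(block, mantissa_bits):
    """Generic block floating-point sketch: the whole block shares one
    exponent (taken from its largest magnitude) and each value is rounded
    to a small signed integer mantissa. Not Tenstorrent's actual format."""
    max_mag = max(abs(v) for v in block)
    shared_exp = math.frexp(max_mag)[1] if max_mag != 0 else 0
    scale = 2.0 ** (shared_exp - mantissa_bits)
    qmax = 2 ** (mantissa_bits - 1) - 1   # signed mantissa range
    # Round each value to a mantissa, clamp, then reconstruct.
    mantissas = [max(-qmax - 1, min(qmax, round(v / scale))) for v in block]
    return [m * scale for m in mantissas]

# Values near the block's maximum keep the most precision.
print(bfp_quantize([1.0, 0.5, -0.25], 7))   # [0.984375, 0.5, -0.25]
```

Sharing one exponent per block is what lets these formats pack mantissas so tightly (down to 2 bits for BFP2) while keeping dynamic range.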
APPLICATION PORTABILITY
Tenstorrent’s TT-Buda SDK compiles models from common ML frameworks such as PyTorch and TensorFlow directly, abstracting the underlying hardware and speeding the implementation of existing models. Native support for the onboard Ethernet of the “Wormhole” chips means adding compute is as easy as installing another device; no special networking or configuration is required.
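The claim that added devices are picked up without reworking the model can be shown schematically. The sketch below is not TT-Buda code (the real SDK discovers devices itself); it is a plain-Python illustration of the pattern, where work is partitioned across however many devices are present so the calling code never changes when a device is added:

```python
def split_batch(batch, devices):
    """Partition a batch of work items round-robin across a device list.
    Purely illustrative: stands in for a runtime that discovers extra
    devices and spreads work over them automatically."""
    shards = {dev: [] for dev in devices}
    for i, item in enumerate(batch):
        shards[devices[i % len(devices)]].append(item)
    return shards

# Adding a device changes the sharding, not the model or the call site.
batch = list(range(8))
print(split_batch(batch, ["wh0", "wh1"]))          # 4 items per device
print(split_batch(batch, ["wh0", "wh1", "wh2"]))   # same code, 3 devices
```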
Users who want to get as close to the silicon as possible will appreciate the open-source TT-Metal SDK, which provides low-level hardware access and supports both Python and C++ for AI and non-AI workloads alike.
Want to learn more about Galaxy?