Run Anything, Fast: Video, speech, code.

Tenstorrent Galaxy™ runs any workload, both training and inference. It leads the industry in video generation, decode, and prefill benchmarks, and TT-Forge, Tenstorrent's open-source MLIR-based compiler, supports more models than any competitor. Video, language, code — pick your model, run it fast.

Buy Tenstorrent Galaxy™ Today

Contact us for custom configurations and pricing

Tenstorrent Galaxy™ Blackhole

Starting at $110,000

Run anything with our scalable, ultra-dense AI server.

Blackhole Supercluster

Starting at $440,000

A supercluster of four Tenstorrent Galaxy Blackhole servers that scales up for fast, affordable AI solutions.

Tenstorrent Galaxy™ Wormhole

Starting at $70,000

The Tenstorrent Galaxy server built on our previous-generation chip, Wormhole. Still scalable, ultra-dense, and performant.

Deploy Solutions at Scale

Tenstorrent Galaxy Blackhole is also available as a four-server supercluster and extends into larger multi-server clusters. Configurations from 4 to 36 or more Tenstorrent Galaxy systems are optimized for workloads including AI video generation, large-scale LLM inference, and private AI infrastructure.

10x Faster Real-Time High-Quality Video

Run state-of-the-art video models and generate high-quality video faster on Tenstorrent Galaxy superclusters. Generate a 720p, 81-frame video in 2.4 seconds.
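
To put the 2.4-second figure in terms of frame rate, the arithmetic below divides frame count by generation time. It is only a rough sketch: the playback rates in the loop are illustrative assumptions, since the page does not state one. Generation keeps pace with playback whenever the frames generated per second meet or exceed the playback rate.

```python
FRAMES = 81          # frames per clip, from the text above
GEN_SECONDS = 2.4    # generation time, from the text above

# Frames produced per second of compute.
generated_fps = FRAMES / GEN_SECONDS

# Hypothetical playback rates; the source does not specify one.
for playback_fps in (16, 24, 30):
    clip_seconds = FRAMES / playback_fps
    keeps_up = generated_fps >= playback_fps
    print(f"{playback_fps} fps playback: {clip_seconds:.2f} s clip, "
          f"generated in {GEN_SECONDS} s, keeps up: {keeps_up}")
```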

Deploy AI Video Gen

Fastest Large-Context LLM Inference

Tenstorrent Galaxy is optimized for premium, latency-sensitive AI workloads. Run superclusters for high-margin AI use cases including agentic workflows, real-time systems, and long-context reasoning. Use the same general-purpose Tenstorrent systems for both decode and prefill.

Explore LLM inference

Tenstorrent Galaxy™ Specs

Accelerator Compute, Memory, and Connectivity

Accelerators

32× Blackhole® ASICs 

Performance

23 PFLOPS Block FP8

Accelerator SRAM

6.2 GB @ 2.9 PB/s 

Accelerator DRAM

1 TB GDDR6 @ 16 TB/s

Accelerator Fabric

10× 400 GbE links per ASIC for 32 TB/s

Cluster Scale-out

Up to 56× 800 GbE QSFP‑DD ports for 11.2 TB/s

Host Compute, Memory, and Connectivity

Host CPU

1× AMD EPYC 9004 (Zen 4), up to 32 cores, ≤280 W TDP

Host Memory

Up to 576 GB (6× 96 GB) DDR5-4800 ECC RDIMM (6 slots, 0 free)

Networking

1× OCP NIC 3.0 PCIe Gen5 x16 SFF (2× 200 GbE default configuration)

Management Network

1× Dedicated RJ45 1 GbE with baseboard management controller (BMC)

Storage OS

2× 960 GB M.2 2280 PCIe Gen4 x4 NVMe SSD

Storage Internal

Up to 4× E1.S PCIe Gen5 x4 NVMe SSD (9.5/15 mm)

Software

Ubuntu 22.04

Deployment & Operations

Form Factor

6U rackmount, air‑cooled chassis

System Dimensions

Height: 17.6 in (446.8 mm), Width: 10.4 in (263.4 mm), Length: 34.8 in (884.5 mm)

System Weight

262 lbs (119 kg)

System Power Usage

8 – 10 kW avg, 12 kW max (Max system power configurable up to 14.5 kW)

Operating Temperature

50 – 95 °F (10 – 35 °C)

Pricing

$110,000 list
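
The aggregate bandwidth figures listed for the accelerator fabric and cluster scale-out can be reproduced from the per-link numbers. The sketch below is a back-of-the-envelope check, not an official derivation. It assumes decimal units and that each link is counted once per direction, which is the reading that matches the listed totals. The function name `aggregate_tb_per_s` is made up for illustration.

```python
def aggregate_tb_per_s(links: int, gbit_per_link: float,
                       bidirectional: bool = True) -> float:
    """Aggregate bandwidth in TB/s for `links` Ethernet links.

    Assumptions (not stated in the spec sheet): decimal units, and
    each link counted once per direction when `bidirectional` is True.
    """
    gbit_total = links * gbit_per_link
    if bidirectional:
        gbit_total *= 2
    return gbit_total / 8 / 1000  # Gb/s -> GB/s -> TB/s


ASICS = 32

# Accelerator fabric: 10x 400 GbE links per ASIC across 32 ASICs.
fabric = aggregate_tb_per_s(ASICS * 10, 400)

# Cluster scale-out: up to 56x 800 GbE QSFP-DD ports.
scale_out = aggregate_tb_per_s(56, 800)

print(f"fabric: {fabric} TB/s, scale-out: {scale_out} TB/s")
```

Under these assumptions the two results come out to the 32 TB/s and 11.2 TB/s shown in the specs. Counting one direction only would halve both.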

More Models, Deploy Fast

90% of models from HuggingFace just work, and coverage is growing every day across LLMs, image generation, speech, vision, embeddings, encoders, and more. Our hardware supports rapid model bring-up, so customers can move to production AI systems quickly.

Simple Scale

The underlying Tensix Neo™ architecture is designed to scale from one chip to thousands under one programming model. It’s the same mesh of cores communicating the same way, whether they’re on the same die or across a rack connected by Ethernet. Scale to fit your needs, big or small.
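
As a rough sizing aid for the 4-to-36-system configurations mentioned earlier, the sketch below multiplies the per-system figures from the spec sheet. It assumes peak figures add linearly across systems, which ignores interconnect and software overheads, so the totals are upper bounds and not measured results. `GalaxySpec` and `cluster_totals` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GalaxySpec:
    """Per-system figures copied from the spec sheet."""
    asics: int = 32
    pflops_block_fp8: int = 23
    dram_tb: int = 1
    avg_kw_low: int = 8
    avg_kw_high: int = 10


def cluster_totals(systems: int, spec: GalaxySpec = GalaxySpec()) -> dict:
    """Naive linear totals for `systems` servers (an upper bound, not a benchmark)."""
    return {
        "systems": systems,
        "asics": systems * spec.asics,
        "peak_pflops": systems * spec.pflops_block_fp8,
        "dram_tb": systems * spec.dram_tb,
        "avg_kw": (systems * spec.avg_kw_low, systems * spec.avg_kw_high),
    }


for n in (1, 4, 36):
    print(cluster_totals(n))
```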

Yours, End to End

Tenstorrent's full software stack, compiler to kernel, is open source. Compile a model and run it out of the box, or go deeper and tune the kernels directly. No black boxes at any layer.