Fastest Large-Context LLM Inference

Tenstorrent Galaxy is optimized for premium, latency-sensitive AI workloads. Run superclusters for high-margin AI use cases, including agentic workflows, real-time systems, and long-context reasoning. Use the same general-purpose Tenstorrent AI systems for both prefill and decode.

Half the Time to First Token, 4x Output Speed

Half the Time to First Token

In Blitz mode, which is optimized for speed, a Tenstorrent Galaxy supercluster parallelizes prefill across servers, overlapping data placement with data flow to keep compute utilization high.
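
To make "parallelizing prefill across servers" concrete, here is a minimal Python sketch of one generic approach, sequence parallelism, in which each server processes a contiguous slice of the prompt. The function `partition_prefill` and the 4-server count are hypothetical illustrations; this is not a description of how Blitz mode actually partitions work.

```python
"""Hypothetical sketch: split a long prompt's prefill across servers.

Assumption: each server takes one contiguous slice of the prompt
(generic sequence parallelism). This is NOT Tenstorrent's documented
Blitz-mode algorithm; it only illustrates spreading prefill work.
"""


def partition_prefill(num_tokens: int, num_servers: int) -> list[tuple[int, int]]:
    """Return [start, end) token ranges, one per server, as even as possible."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    base, extra = divmod(num_tokens, num_servers)
    ranges = []
    start = 0
    for server in range(num_servers):
        # The first `extra` servers take one additional token each.
        size = base + (1 if server < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # A 100k-token context split over a hypothetical 4 servers.
    for server, (lo, hi) in enumerate(partition_prefill(100_000, 4)):
        print(f"server {server}: tokens {lo}..{hi - 1} ({hi - lo} tokens)")
```

An even split of tokens is not an even split of work: under causal attention, tokens in later slices attend to every earlier token, so the server holding the last slice does the most attention compute. Sequence-parallel schemes often rebalance for this rather than splitting tokens evenly.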

Time to First Token (sec), DeepSeek V3.2, 100k Context

GPUs: 7.5 sec
Tenstorrent: 4.0 sec
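
As a quick check of the headline, the two chart values can be compared directly: their ratio gives the speedup, and one minus the inverse ratio gives the share of waiting time removed. This is plain arithmetic on the numbers above; the variable names are illustrative.

```python
# Chart values from this page: time to first token (seconds),
# DeepSeek V3.2 with a 100k-token context.
GPU_TTFT_S = 7.5
TENSTORRENT_TTFT_S = 4.0

speedup = GPU_TTFT_S / TENSTORRENT_TTFT_S        # how many times faster
reduction = 1 - TENSTORRENT_TTFT_S / GPU_TTFT_S  # fraction of the wait removed

print(f"TTFT speedup: {speedup:.3f}x")     # TTFT speedup: 1.875x
print(f"TTFT reduction: {reduction:.1%}")  # TTFT reduction: 46.7%
```

So "half the time" is a rounded figure: the charted time to first token is about 47% shorter.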

4x Output Speed

Decode on Tenstorrent Galaxy superclusters uses on-chip SRAM together with DRAM, pipelined across servers, so large models with long contexts can scale out for agentic workloads.

Output Speed (tokens/sec), DeepSeek V3.2, 100k Context

GPUs: 86 tokens/sec
Tenstorrent: 350 tokens/sec
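
The two charts can be combined into a rough end-to-end estimate. The output-speed ratio is 350 / 86 ≈ 4.07, in line with the 4x headline. The sketch below assumes, as a simplification not stated on this page, that decode runs at a constant rate once generation starts, so total response time is approximately time to first token plus output tokens divided by tokens per second. The 1,000-token response length and the `response_time_s` helper are hypothetical.

```python
"""Rough end-to-end response time from the two charts on this page.

Simplifying assumption (not from the source): decode runs at a constant
rate, so total time ~= TTFT + output_tokens / tokens_per_s.
"""

CHARTS = {
    # system: (time to first token in s, output speed in tokens/s)
    "GPUs": (7.5, 86.0),
    "Tenstorrent": (4.0, 350.0),
}


def response_time_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimate seconds from request submission until the last output token."""
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for system, (ttft, rate) in CHARTS.items():
        total = response_time_s(ttft, rate, output_tokens=1_000)
        print(f"{system}: {total:.1f} s for 1,000 output tokens")
```

Under these assumptions, a 1,000-token answer takes roughly 19.1 s with the GPU figures and about 6.9 s with the Tenstorrent figures. For long outputs the decode term dominates, so the output-speed gap matters more than the time-to-first-token gap.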

Benefits

Fast

Efficient parallelization across a large number of chips lets us deliver the fastest large-context LLM inference.

Networked AI

Use the same hardware for both prefill and decode. Our Networked AI architecture unifies compute, memory (SRAM and DRAM), and networking for general-purpose AI.

Scalable

Built to scale in supercluster configurations. GPU architectures are constrained by the box; Tenstorrent Galaxy scales past it.

Open

No proprietary interconnects, switches, or HBM. Fully open-source end-to-end software stack. Deploy state-of-the-art models for your AI solutions.

Technologies

Blackhole Architecture for Production LLM Inference

Tenstorrent Galaxy superclusters

Run anything – fast, affordable, simple. High-density, scalable compute. Add systems, add speed.

Tensix Cores

Purpose-built for parallel, continuous workloads. With 91x SRAM capacity per dollar and 12x SRAM bandwidth per dollar, Tensix delivers where it counts.

Model Support

90% of models from Hugging Face just work, and coverage is growing every day across LLMs, image generation, speech, vision, embeddings, encoders, and more.

4 x Tenstorrent Galaxy™ Blackhole superclusters

Tenstorrent Galaxy™ Blackhole can be deployed in superclusters, extending into multi-server topologies that scale out to any size. Four Tenstorrent Galaxy™ superclusters lead the industry in performance and cost for large-context LLM inference.
