Llama-3.1 Announcement
We are happy to announce support for Llama-3.1-70B inference on Tenstorrent’s 8-chip systems, the TT-QuietBox and the TT-LoudBox.

The source code for Llama-3.1-70B and our other supported models is on our GitHub. We have also merged support for Llama-3.1-8B, which runs on our single-chip n150 card.
Implementation highlights:
- Fractured with 8-way tensor parallelism
- Uses FlashAttention and FlashDecode
- Uses mixed BF16, BFP8, and BFP4 precision
- Performance was measured in eager mode with tracing disabled
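To illustrate what 8-way tensor parallelism means in general, here is a minimal sketch in plain Python. It is not Tenstorrent's implementation: the function names (`tensor_parallel_mlp`, `split_columns`, `split_rows`) and the toy two-layer block are hypothetical, chosen only to show the idea. The first weight matrix is split by columns and the second by rows, each "device" computes a partial result on its own shard, and the partial results are summed, which stands in for the all-reduce step across chips.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Elementwise ReLU, so it can be applied to each column shard independently."""
    return [[max(0, v) for v in row] for row in m]


def split_columns(w, parts):
    """Split matrix w into `parts` equal column shards (hypothetical helper)."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across devices"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def split_rows(w, parts):
    """Split matrix w into `parts` equal row shards (hypothetical helper)."""
    rows = len(w)
    assert rows % parts == 0, "row count must divide evenly across devices"
    step = rows // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]


def tensor_parallel_mlp(x, w1, w2, num_devices=8):
    """Compute relu(x @ w1) @ w2 with w1 and w2 sharded across num_devices."""
    w1_shards = split_columns(w1, num_devices)  # each device holds some columns of w1
    w2_shards = split_rows(w2, num_devices)     # and the matching rows of w2
    # Each device works only on its own shards; no communication is needed here.
    partials = [matmul(relu(matmul(x, a)), b) for a, b in zip(w1_shards, w2_shards)]
    # Summing the partial outputs stands in for an all-reduce across devices.
    n_rows, n_cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[r][c] for p in partials) for c in range(n_cols)] for r in range(n_rows)]
```

Calling `tensor_parallel_mlp(x, w1, w2)` returns the same matrix as the unsharded `matmul(relu(matmul(x, w1)), w2)`, provided the hidden dimension is divisible by the device count. Each device stores only one eighth of each weight matrix, which is the reason for sharding a 70B-parameter model across 8 chips.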
We are working on optimizations to reach our target of 20 tokens/second/user. Buy our 8-chip systems (TT-QuietBox and TT-LoudBox) to try Llama-3.1-70B at home on Tenstorrent hardware!