Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

MarkTechPost | May 14, 2026 at 5:46 AM

Nous Research releases Token Superposition Training (TST), a two-phase pre-training method that cuts wall-clock training time by up to 2.5x at matched FLOPs. In Phase 1, contiguous token embeddings are averaged into bags. In Phase 2, training reverts to standard next-token prediction. The model architecture, tokenizer, optimizer, and inference-time behavior are all left unchanged. The method was validated at the 270M, 600M, and 3B dense scales and on a 10B-A1B MoE model.
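To make the Phase 1 bagging step concrete, here is a minimal sketch that averages each run of contiguous token embeddings into a single bag vector. It is an illustration under stated assumptions, not Nous Research's implementation. The bag size, dropping leftover tokens that do not fill a whole bag, and the hypothetical `bag_size_for_step` schedule (which models Phase 2 as a bag size of 1, i.e. ordinary per-token inputs) are all assumptions. The summary also does not say what the Phase 1 training target is, so the sketch covers only the input averaging.

```python
from typing import List, Sequence

Vector = List[float]


def bag_embeddings(embeddings: Sequence[Vector], bag_size: int) -> List[Vector]:
    """Average each run of `bag_size` contiguous embeddings into one bag vector.

    Assumption (not from the source): trailing tokens that do not fill a
    complete bag are dropped.
    """
    if bag_size < 1:
        raise ValueError("bag_size must be >= 1")
    # Largest prefix length that splits evenly into full bags.
    n_full = len(embeddings) // bag_size * bag_size
    bags: List[Vector] = []
    for start in range(0, n_full, bag_size):
        window = embeddings[start:start + bag_size]
        dim = len(window[0])
        # Element-wise mean over the embeddings in this bag.
        bags.append([sum(vec[d] for vec in window) / bag_size for d in range(dim)])
    return bags


def bag_size_for_step(step: int, phase1_steps: int, phase1_bag_size: int) -> int:
    """Hypothetical schedule: bagged inputs in Phase 1, single tokens in Phase 2."""
    return phase1_bag_size if step < phase1_steps else 1
```

For example, with four 2-dimensional embeddings `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]` and a bag size of 2, `bag_embeddings` returns `[[2.0, 3.0], [6.0, 7.0]]`. With a bag size of 1, it returns the embeddings unchanged. This is why the sketch models the switch to Phase 2 as a bag size of 1.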

This is a summary. For the full story, read the original article at MarkTechPost.

Original source: MarkTechPost

Anthropic blames dystopian sci-fi for training AI models to act “evil”

Ars Technica AI | May 13, 2026 at 4:31 PM

Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size

MarkTechPost | May 13, 2026 at 8:41 PM

Our response to the TanStack npm supply chain attack

OpenAI | May 13, 2026 at 12:00 AM
