Inception
Unlocking faster and cheaper LLMs with diffusion


Happy Monday.
Heading to Boston or NYC at the end of this month? We're hosting eight events - be sure to register if you'll be in town.

Tues. May 26 - Founding GPs VC Omakase (Boston)
Wed. May 27 - AI Champions Dinner (Boston)
Tues. June 2 - Founding GPs VC Omakase (New York)
Wed. June 3 - AI Day (New York)
Tues. June 23 - AI Champions Dinner (San Francisco)



When you ask a traditional LLM like ChatGPT to write something, it builds its response one piece at a time using an approach called autoregression: text is broken into tokens (words or pieces of words), and the model predicts each new token based on everything it has written so far. Image and video generation models, meanwhile, rely on a process called diffusion, which starts from noise and iteratively refines it into the final output. Inception brings this second approach to language. Its flagship model, Mercury, generates text by starting from a noisy representation of the desired output and iteratively denoising it, producing the complete response in parallel.
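To make the contrast concrete, here is a minimal Python sketch of the two decoding loops. The function and method names are illustrative stand-ins, not Inception's actual implementation; real discrete diffusion models use learned schedules rather than a fixed step count.

```python
# Illustrative sketch of the two decoding strategies. `model` is a
# stand-in for any trained network; the method names are hypothetical.

def autoregressive_decode(model, prompt_tokens, max_new_tokens):
    """Generate one token at a time; each step depends on all prior tokens."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):        # N tokens -> N sequential passes
        next_token = model.predict_next(tokens)
        tokens.append(next_token)
    return tokens

def diffusion_decode(model, prompt_tokens, output_length, num_steps=8):
    """Start from noise over the whole output and refine it in parallel."""
    draft = ["<noise>"] * output_length    # every position starts as noise
    for _ in range(num_steps):             # N tokens -> only num_steps passes
        # Each pass updates ALL positions at once, conditioned on the prompt.
        draft = model.denoise(prompt_tokens, draft)
    return draft
```

The loop bound is the key difference: the autoregressive loop runs once per output token, while the diffusion loop runs a small number of refinement steps regardless of output length, which is where the speed advantage comes from.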

An example diffusion process. (Structures Blog/Peter Sorrenson)
By generating multiple tokens at once, this method reduces latency and computational cost compared with token-by-token generation. The platform also supports multimodal integration, allowing language to be combined with audio, images, and video. Mercury is currently deployed for enterprise customers.
Check it out: inceptionlabs.ai


Your competitors just shipped something. Probably with Bolt.new.
Founders are using prompt coding to launch products in hours, not sprints. Describe your app, Bolt.new builds it. Landing pages, dashboards, and full-stack tools.
No hiring. No handoffs. No six-week timelines.
The founders moving fastest right now aren't writing code. They're writing prompts.
Start building at bolt.new.
This is sponsored content.


Inception’s business model centers on enterprise API sales. Its core offering is an API product tailored for enterprises running LLMs in production, targeting application-layer companies building AI features. Key use cases include applications with strict latency requirements, such as voice agents, search and retrieval workflows (query writing, ranking, and summarization), and autocomplete and edit suggestions for coding tools.
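For a sense of what integration might look like, here is a hedged sketch of a latency-sensitive autocomplete call. It assumes an OpenAI-compatible chat-completions endpoint, a common convention for LLM APIs; the base URL and model name below are placeholders, not Inception's documented values.

```python
from openai import OpenAI

# Placeholder endpoint and model id, not Inception's documented values.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key="YOUR_API_KEY",
)

def suggest_completion(code_prefix: str) -> str:
    """Request a short code completion; a small max_tokens cap keeps latency low."""
    response = client.chat.completions.create(
        model="placeholder-diffusion-model",  # hypothetical model id
        messages=[
            {"role": "system", "content": "Complete the user's code. Reply with code only."},
            {"role": "user", "content": code_prefix},
        ],
        max_tokens=64,    # autocomplete needs only a short suggestion
        temperature=0.2,  # low randomness for predictable edits
    )
    return response.choices[0].message.content
```

For use cases like autocomplete, the application fires a request like this on every pause in typing, which is why per-request latency, not just aggregate throughput, drives the choice of model.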

Raised a $50 million seed round in November 2025, led by Menlo Ventures with participation from Mayfield, Innovation Endeavors, Microsoft’s M12 fund, and Nvidia’s venture arm
Deployed at major companies including Quora, Microsoft, and AWS



CEO Stefano Ermon’s lab at Stanford has been pioneering research on diffusion models since 2019. Motivated by the question of why diffusion worked so well for images and video but not yet for text or code, he and his team achieved a breakthrough in 2024 with discrete diffusion models for language, showing performance comparable to autoregressive models while running 5 to 10x faster. The work was published as “Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution”, which won a Best Paper Award at the International Conference on Machine Learning (ICML) in 2024.
Ermon began exploring whether this approach could scale to production-level language models. He then reached out to investors and, together with two former students, UCLA professor Aditya Grover and Cornell Tech professor Volodymyr Kuleshov, founded Inception to commercialize diffusion-based LLMs.

Inception Labs’ team is undoubtedly exceptional. It’s hard to imagine someone better suited to start a diffusion-based LLM company than one of the pioneers of diffusion models himself. But as Mercury 2 rolls out, two questions remain: do customers care enough to pay, and can diffusion-based models support a durable business?
On the demand side, Mercury’s main advantage is speed. Inception claims Mercury 2 can reach 1,009 tokens per second on NVIDIA Blackwell GPUs, with pricing set at $0.25 per million input tokens and $0.75 per million output tokens. Launch coverage reports around 1,000 tokens per second, compared with just 89 for Claude 4.5 Haiku Reasoning and 71 for GPT-5 Mini, making Mercury 2 roughly 11 to 14 times faster in these comparisons. Independent benchmarks show slightly lower but still competitive numbers, with Mercury 2 at about 873 tokens per second on leaderboards and 672 tokens per second in other tests, depending on hardware and setup.
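To put those throughput figures in perspective, here is a quick back-of-the-envelope calculation using the numbers quoted above. It treats throughput as the only bottleneck, ignoring network overhead and time-to-first-token.

```python
# Rough latency and cost math using the figures quoted above.

RESPONSE_TOKENS = 500  # a typical medium-length reply

throughputs = {
    "Mercury 2 (claimed)": 1009,
    "Claude 4.5 Haiku Reasoning": 89,
    "GPT-5 Mini": 71,
}

for name, tokens_per_second in throughputs.items():
    seconds = RESPONSE_TOKENS / tokens_per_second
    print(f"{name}: {seconds:.2f}s for {RESPONSE_TOKENS} tokens")
# Mercury 2 (claimed): 0.50s for 500 tokens
# Claude 4.5 Haiku Reasoning: 5.62s for 500 tokens
# GPT-5 Mini: 7.04s for 500 tokens

# Cost of one such request at Mercury 2's quoted pricing,
# assuming a 1,000-token prompt.
cost = (1000 / 1e6) * 0.25 + (RESPONSE_TOKENS / 1e6) * 0.75
print(f"Cost per request: ${cost:.6f}")  # $0.000625
```

For a voice agent or an autocomplete feature, the difference between half a second and five-plus seconds per response is the difference between usable and unusable.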
Speed alone wouldn't matter if quality suffered, but it doesn't. Mercury 2 posts 91.1 on AIME 2025 (competitive math), 73.6 on GPQA (graduate-level science), and 71.3 on IFBench (instruction following), which is in the same tier as Haiku and GPT-5 Mini.
That combination is what makes the value proposition real for latency-sensitive products, and early traction supports it. Mercury is live on Azure AI Foundry and is powering Microsoft's NLWeb. Thus, the path to market victory for Inception hinges on selling to customers who value speed at Haiku-tier quality over the intelligence gains of frontier models like Opus 4.7, and on competing primarily with speed-optimized peers like Haiku rather than frontier reasoners.
Durability is the tougher challenge for Inception. Diffusion offers architectural benefits, including parallel generation, built-in error correction, and strong support for multimodal tasks, and the founding team consists of world-class experts in the field. While major players like OpenAI, Anthropic, and Google (which has already published its own research on diffusion language models) will quickly adopt diffusion if it proves successful, Inception currently leads.
The growing appetite for low-latency inference is apparent across the industry: companies such as Groq and Cerebras are tackling latency from a different angle, optimizing hardware for autoregressive models.
Whether diffusion becomes a lasting category and a true alternative to autoregressive models, or simply a feature absorbed by incumbents, remains to be seen. Recently, only a few dominant players have shaped the AI landscape, but investors are now betting that Inception could join their ranks. If Inception can demonstrate market viability and drive adoption of its diffusion LLM, it could become a household name.


ICYMI: We recently announced our official investor syndicate. Check out the announcement post for more details on how to invest in our deals.
Is this week's company a future unicorn?






