Volume Explosion and "Dilution" — The 2025 Macro Story
In 2025, the total author-contribution score for arXiv papers grew +14.7% year over year. Over the same period, the over_70 score — which counts only high-attention papers (score ≥ 70) — fell 3.0%. These two numbers tell one story: volume growth is outrunning impact growth.
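As a quick sanity check on the arithmetic, here is a minimal sketch in Python; the 2024 baselines are hypothetical, chosen only so the two reported growth rates fall out:

```python
def yoy_growth(prev: float, curr: float) -> float:
    """Year-over-year growth as a percentage."""
    return (curr - prev) / prev * 100

# Hypothetical 2024 baselines, chosen only to reproduce the reported rates.
full_2024, full_2025 = 100_000, 114_700   # total author-contribution score
high_2024, high_2025 = 10_000, 9_700      # over_70 subset (score >= 70)

print(f"full corpus: {yoy_growth(full_2024, full_2025):+.1f}%")  # +14.7%
print(f"over_70:     {yoy_growth(high_2024, high_2025):+.1f}%")  # -3.0%
```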
"More papers" does not mean "more influential papers." 2025 was the year that submission volume clearly outpaced the growth of globally noticed work.
NeurIPS grew +17.6%, reflecting a dramatic expansion of the conference itself. Read together, the three metrics show an AI research ecosystem that is scaling rapidly — but producing globally noticed, high-impact work within that larger pool is getting harder.
How to Read the Three Metrics
arXiv full corpus is a leading indicator of research activity. over_70 (high-attention score) measures published output quality and acts as a leading indicator for conference acceptance. NeurIPS is a lagging indicator — peer-reviewed recognition that follows arXiv by 6–18 months. Together they reveal institutions that publish a lot but attract little attention, or that perform well at conferences but share less openly.
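One way to operationalize that reading is a small classifier over the three signals. The sketch below is illustrative: the field names and thresholds are assumptions, not part of the dataset.

```python
from dataclasses import dataclass

@dataclass
class InstitutionSignals:
    name: str
    arxiv_full: float   # full-corpus score: activity (leading)
    over_70: float      # high-attention score: quality (leading)
    neurips: int        # accepted papers: recognition (lagging, 6-18 months)

def read_together(s: InstitutionSignals) -> str:
    """Flag the divergences the three metrics reveal (illustrative thresholds)."""
    conversion = s.over_70 / s.arxiv_full if s.arxiv_full else 0.0
    if s.arxiv_full > 20_000 and conversion < 0.15:
        return "publishes a lot, attracts little attention"
    if s.neurips > 50 and s.over_70 < 1_000:
        return "strong at conferences, shares less openly"
    return "balanced profile"
```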
Full Corpus vs. High-Impact: Why the Rankings Flip
The 2025 arXiv full-corpus leader is Tsinghua University (43,701 pts), followed by Shanghai Jiao Tong, Zhejiang, Peking, and the Chinese Academy of Sciences. The top five are entirely Chinese universities and research institutes.
Switch the lens to high-impact papers only, and the picture changes completely.
| Full Rank | Institution | Full Score | | over_70 Rank | Institution | over_70 Score |
|---|---|---|---|---|---|---|
| 1 | Tsinghua University (Univ) | 43,701 | → | 2 | Tsinghua University (Univ) | 7,881 |
| 2 | Shanghai Jiao Tong University (Univ) | 37,718 | → | 3 | Shanghai Jiao Tong University (Univ) | 7,271 |
| 3 | Zhejiang University (Univ) | 29,317 | → | — | Outside top 10 | — |
| 4 | Peking University (Univ) | 27,477 | → | — | Outside top 10 | — |
| 5 | Chinese Academy of Sciences (Lab) | 27,231 | → | — | Outside top 10 | — |
| — | — | — | → | 1 | Microsoft (Corp) | 7,882 |
| — | — | — | → | 4 | Shanghai AI Lab (Lab) | 7,235 |
| — | — | — | → | 5 | NVIDIA (Corp) | 7,120 |
| 7 | Google (Corp) | 24,185 | → | 6 | Google (Corp) | 6,916 |
Zhejiang, Peking, and CAS are 3rd through 5th in raw volume but outside the high-impact top 10. Microsoft ranks 12th by volume but leads the world by high-impact score. Universities win on breadth; companies and labs win on conversion rate — turning research activity into globally noticed results.
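That conversion-rate framing can be made concrete. A minimal sketch using the scores from the table above; the ratio itself is a construction for this analysis, not a metric in the dataset:

```python
# (full-corpus score, over_70 score) pairs from the table above
institutions = {
    "Tsinghua University": (43_701, 7_881),
    "Shanghai Jiao Tong":  (37_718, 7_271),
    "Google":              (24_185, 6_916),
}

# Sort by high-impact conversion rate, highest first.
for name, (full, high) in sorted(institutions.items(),
                                 key=lambda kv: kv[1][1] / kv[1][0],
                                 reverse=True):
    print(f"{name:22s} conversion = {high / full:.1%}")
# Google converts ~28.6% of its volume score into high-impact score,
# versus ~19.3% for SJTU and ~18.0% for Tsinghua.
```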
NVIDIA's Reversal and the Rise of Chinese Startups
From 2024 to 2025, NVIDIA posted the largest gain in high-impact score, and by a wide margin.
The top volume grower tells a different story: its full-corpus score jumped from 7,001 to 14,511, dominant volume growth, yet its high-impact gain was only +1,710 (3rd place).
NVIDIA's own over_70 score rose from 2,949 to 7,120. Hybrid Mamba-Transformer architectures, 1M-token context, and inference efficiency pushed NVIDIA into the elite tier of high-impact publishers.
The top volume grower and the top quality grower are different institutions. This matters practically: your priority list for competitive tracking will look entirely different depending on which signal you lead with.
A second trend deserves equal attention: institutions with near-zero presence in 2024 that emerged dramatically in 2025 through high-impact arXiv output.
- StepFun: 2024 near zero → 2025: 1,183. A simultaneous push across LLM, efficiency, and video generation; the startup most comparable to incumbent leaders.
- DeepSeek: all-in on RL, attention, and reasoning. LLM 53% + RL 37%: a portfolio of just three themes with maximum concentration.
- HiDream: GenAI/Video 63%, the most extreme specialization of any major institution tracked. Image editing and video-generation infrastructure.
- Inclusion AI: no NeurIPS presence yet, but already visible on arXiv. Trillion-parameter-scale MoE and system co-design are the signature themes.
Waiting for NeurIPS would have meant missing all of these. arXiv over_70 signals emerged months before any conference reflection.
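A minimal detector for this emergence pattern, assuming per-institution over_70 scores are available for consecutive years (the floor and bar thresholds are illustrative assumptions):

```python
def emerging(prev: dict[str, float], curr: dict[str, float],
             floor: float = 100.0, bar: float = 1_000.0) -> list[str]:
    """Institutions near zero last year that cleared a high-impact bar this year."""
    return [name for name, score in curr.items()
            if score >= bar and prev.get(name, 0.0) <= floor]

# Scores from the text; StepFun had near-zero presence in 2024.
over70_2024 = {"NVIDIA": 2_949}
over70_2025 = {"NVIDIA": 7_120, "StepFun": 1_183}
print(emerging(over70_2024, over70_2025))  # ['StepFun']
```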
Universities, Companies, Startups: How Each Sector Competes
Companies: LLM as the Core, Differentiated at the Edges
Microsoft, NVIDIA, Google, Alibaba, and Tencent all allocate 30–36% of their high-impact portfolio to LLM/Foundation Models. The strategic divergence lies in what surrounds that core. Microsoft stacks Efficiency (15%) and Evaluation (21%) into a practical model-family triad. NVIDIA layers Efficiency (14%) and Scaling (18%) to push inference infrastructure. Alibaba bets heavily on RL/Agents (18%) to extend Qwen toward agentic workflows.
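Shares like these reduce to normalizing per-theme scores. A sketch, with hypothetical raw numbers chosen only to reproduce the shares quoted for Microsoft:

```python
def portfolio_shares(theme_scores: dict[str, float]) -> dict[str, float]:
    """Normalize per-theme over_70 scores into portfolio shares."""
    total = sum(theme_scores.values())
    return {theme: score / total for theme, score in theme_scores.items()}

# Hypothetical raw scores, consistent with Microsoft's quoted shares.
microsoft = {"LLM/Foundation": 720, "Eval/Benchmark": 420,
             "Efficiency": 300, "Other": 560}
for theme, share in portfolio_shares(microsoft).items():
    print(f"{theme:15s} {share:.0%}")   # 36%, 21%, 15%, 28%
```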
Universities: Evaluation and Benchmarking as the Universal Pillar
Looking across the top six universities — Tsinghua, SJTU, NUS, UC Berkeley, Stanford, and Fudan — one pattern holds: Eval/Benchmark is the largest theme at every institution, ranging from 31% to 44% of the portfolio. The university sector is providing the evaluation infrastructure the whole ecosystem depends on. Differences emerge at the edges. Berkeley runs the highest RL/Agents share (11%) plus Scaling (8%), anchoring embodied AI and sim-to-real transfer. Stanford leads on evaluation design and Preference Optimization. Fudan's 44% Eval share is the highest specialization of any university in the set.
Startups: The Strategy Is Already Visible — You Just Have to Look
The research portfolios of leading AI startups are far from monolithic. Two clear poles have emerged.
Generalist type: StepFun deploys simultaneously across LLM, Efficiency, and GenAI/Video — the broadest portfolio of any startup, closest to established incumbents. Kimi/Moonshot emphasizes MoE, long-context, and agentic benchmarking, with a notably high Efficiency share (25%) pointing toward practical deployment focus.
Ultra-specialist type: DeepSeek's entire high-impact output is split across just three themes — LLM, Eval, and RL — with the highest concentration on reasoning of any institution tracked. HiDream commits 63% of its portfolio to GenAI/Video, a level of specialization with no close parallel.
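That notion of concentration can be quantified with a Herfindahl-style index over portfolio shares. The index is a framing for this analysis, not a source metric, and the residual theme mass is folded into a single bucket:

```python
def hhi(shares: dict[str, float]) -> float:
    """Herfindahl-Hirschman index: sum of squared shares; 1.0 = one theme."""
    return sum(s ** 2 for s in shares.values())

# Theme shares quoted in the text; residual mass lumped into 'other'.
deepseek = {"LLM": 0.53, "RL": 0.37, "Eval": 0.10}
hidream  = {"GenAI/Video": 0.63, "other": 0.37}
print(f"DeepSeek: {hhi(deepseek):.2f}")  # ~0.43, highly concentrated
print(f"HiDream:  {hhi(hidream):.2f}")   # ~0.53 (upper bound: 'other' is a lump)
```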
Key Observation
The strategic diversity among startups is invisible in NeurIPS results. The arXiv over_70 category distribution reveals each firm's research bets months in advance — making it the more actionable signal for competitive intelligence.
Takeaways: A New Playbook for Tracking AI Research
Combining three data sources reveals a competitive picture that no single metric can show.
1 — Track Portfolios, Not Headcounts
Raw submission volume shows the scale of activity, not strategic intent. The over_70 category distribution reveals where an institution is genuinely staking its claims at the frontier. The same rank can mean very different things depending on where the score comes from.
2 — Sector Role Determines the Right Comparison
Companies (frontier models, efficiency), universities (evaluation, theory, broad exploration), public labs (open benchmark hubs), and startups (sharp specialization) compete in structurally different ways. Ranking them on the same axis without accounting for role differences produces misleading conclusions.
3 — Don't Wait for the Conference to Find New Players
StepFun, Inclusion AI, DeepSeek — all of these registered strong arXiv over_70 signals before any meaningful NeurIPS presence. The peer-review cycle introduces a structural lag of 6–18 months. Using over_70 as a leading indicator closes that gap.
4 — "Strong at NeurIPS" ≠ "Open Research Culture"
The gap between conference acceptance counts and arXiv high-impact publication reveals differences in disclosure posture. NVIDIA and Microsoft publish far more high-impact work relative to their NeurIPS footprint than many university counterparts. Treating conference counts as a proxy for openness without checking arXiv leads to systematically wrong conclusions.
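A sketch of how disclosure posture could be scored; the over_70 values come from the data above, but the NeurIPS counts here are placeholders, not reported figures:

```python
def disclosure_ratio(over_70: float, neurips_papers: int) -> float:
    """High-impact arXiv score per NeurIPS acceptance; higher = more open posture."""
    return over_70 / max(neurips_papers, 1)

# over_70 scores from the data above; NeurIPS counts are hypothetical.
print(f"NVIDIA:    {disclosure_ratio(7_120, 120):.1f}")
print(f"Microsoft: {disclosure_ratio(7_882, 130):.1f}")
```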
The 2025 AI research landscape is entering a new phase — one where volume alone is no longer a useful signal, portfolio composition is the real competitive differentiator, and the fastest-moving actors announce their bets on arXiv months before any conference confirms them. The map is already there. You just need to know which layer to look at.