Volume Explosion and "Dilution" — The 2025 Macro Story
In 2025, the total author-contribution score for arXiv papers grew +14.7% year over year. Over the same period, the over_70 score — which counts only high-attention papers (score ≥ 70) — fell 3.0%. These two numbers tell one story: volume growth is outrunning impact growth.
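As a quick sanity check on the arithmetic, here is a minimal sketch in Python; the 2024 baselines are hypothetical, chosen only so the two reported growth rates fall out:

```python
def yoy_growth(prev: float, curr: float) -> float:
    """Year-over-year growth as a percentage."""
    return (curr - prev) / prev * 100

# Hypothetical 2024 baselines, chosen only to reproduce the reported rates.
full_2024, full_2025 = 100_000, 114_700   # total author-contribution score
high_2024, high_2025 = 10_000, 9_700      # over_70 subset (score >= 70)

print(f"full corpus: {yoy_growth(full_2024, full_2025):+.1f}%")  # +14.7%
print(f"over_70:     {yoy_growth(high_2024, high_2025):+.1f}%")  # -3.0%
```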
"More papers" does not mean "more influential papers." 2025 was the year that submission volume clearly outpaced the growth of globally noticed work.
NeurIPS grew +17.6%, reflecting a dramatic expansion of the conference itself. Read together, the three metrics show an AI research ecosystem that is scaling rapidly — but producing globally noticed, high-impact work within that larger pool is getting harder.
How to Read the Three Metrics
arXiv full corpus is a leading indicator of research activity. over_70 (high-attention score) measures published output quality and acts as a leading indicator for conference acceptance. NeurIPS is a lagging indicator — peer-reviewed recognition that follows arXiv by 6–18 months. Together they reveal institutions that publish a lot but attract little attention, or that perform well at conferences but share less openly.
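One way to operationalize that reading is a small classifier over the three signals. The sketch below is illustrative: the field names and thresholds are assumptions, not part of the dataset.

```python
from dataclasses import dataclass

@dataclass
class InstitutionSignals:
    name: str
    arxiv_full: float   # full-corpus score: activity (leading)
    over_70: float      # high-attention score: quality (leading)
    neurips: int        # accepted papers: recognition (lagging, 6-18 months)

def read_together(s: InstitutionSignals) -> str:
    """Flag the divergences the three metrics reveal (illustrative thresholds)."""
    conversion = s.over_70 / s.arxiv_full if s.arxiv_full else 0.0
    if s.arxiv_full > 20_000 and conversion < 0.15:
        return "publishes a lot, attracts little attention"
    if s.neurips > 50 and s.over_70 < 1_000:
        return "strong at conferences, shares less openly"
    return "balanced profile"
```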
Full Corpus vs. High-Impact: Why the Rankings Flip
The 2025 arXiv full-corpus leader is Tsinghua University (43,701 pts), followed by Shanghai Jiao Tong, Zhejiang, Peking, and the Chinese Academy of Sciences. The top five are entirely Chinese universities and research institutes.
Switch the lens to high-impact papers only, and the picture changes completely.
| Full Rank | Institution | Full Score | | over_70 Rank | Institution | over_70 Score |
|---|---|---|---|---|---|---|
| 1 | Tsinghua University (Univ) | 43,701 | → | 2 | Tsinghua University (Univ) | 7,881 |
| 2 | Shanghai Jiao Tong University (Univ) | 37,718 | → | 3 | Shanghai Jiao Tong University (Univ) | 7,271 |
| 3 | Zhejiang University (Univ) | 29,317 | → | — | Outside top 10 | — |
| 4 | Peking University (Univ) | 27,477 | → | — | Outside top 10 | — |
| 5 | Chinese Academy of Sciences (Lab) | 27,231 | → | — | Outside top 10 | — |
| — | — | — | → | 1 | Microsoft (Corp) | 7,882 |
| — | — | — | → | 4 | Shanghai AI Lab (Lab) | 7,235 |
| — | — | — | → | 5 | NVIDIA (Corp) | 7,120 |
| 7 | Google (Corp) | 24,185 | → | 6 | Google (Corp) | 6,916 |
Zhejiang, Peking, and CAS are 3rd through 5th in raw volume but outside the high-impact top 10. Microsoft ranks 12th by volume but leads the world by high-impact score. Universities win on breadth; companies and labs win on conversion rate — turning research activity into globally noticed results.
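That conversion-rate framing can be made concrete. A minimal sketch using the scores from the table above; the ratio itself is a construction for this analysis, not a metric in the dataset:

```python
# (full-corpus score, over_70 score) pairs from the table above
institutions = {
    "Tsinghua University": (43_701, 7_881),
    "Shanghai Jiao Tong":  (37_718, 7_271),
    "Google":              (24_185, 6_916),
}

# Sort by high-impact conversion rate, highest first.
for name, (full, high) in sorted(institutions.items(),
                                 key=lambda kv: kv[1][1] / kv[1][0],
                                 reverse=True):
    print(f"{name:22s} conversion = {high / full:.1%}")
# Google converts ~28.6% of its volume score into high-impact score,
# versus ~19.3% for SJTU and ~18.0% for Tsinghua.
```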
NVIDIA's Reversal and the Rise of Chinese Startups
From 2024 to 2025, NVIDIA posted the largest gain in high-impact score, and by a wide margin.
The top volume grower tells a different story: its full-corpus score jumped from 7,001 to 14,511, dominant volume growth, yet its high-impact gain was only +1,710 (3rd place).
NVIDIA's own over_70 score rose from 2,949 to 7,120. Hybrid Mamba-Transformer architectures, 1M-token context, and inference efficiency pushed NVIDIA into the elite tier of high-impact publishers.
The top volume grower and the top quality grower are different institutions. This matters practically: your priority list for competitive tracking will look entirely different depending on which signal you lead with.
A second trend deserves equal attention: institutions with near-zero presence in 2024 that emerged dramatically in 2025 through high-impact arXiv output.
- StepFun: 2024 near zero → 2025: 1,183. A simultaneous push across LLM, efficiency, and video generation; the startup most comparable to incumbent leaders.
- DeepSeek: all-in on RL, attention, and reasoning. LLM 53% + RL 37%: a portfolio of just three themes with maximum concentration.
- HiDream: GenAI/Video 63%, the most extreme specialization of any major institution tracked. Image editing and video-generation infrastructure.
- Inclusion AI: no NeurIPS presence yet, but already visible on arXiv. Trillion-parameter-scale MoE and system co-design are the signature themes.
Waiting for NeurIPS would have meant missing all of these. arXiv over_70 signals emerged months before any conference reflection.
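A minimal detector for this emergence pattern, assuming per-institution over_70 scores are available for consecutive years (the floor and bar thresholds are illustrative assumptions):

```python
def emerging(prev: dict[str, float], curr: dict[str, float],
             floor: float = 100.0, bar: float = 1_000.0) -> list[str]:
    """Institutions near zero last year that cleared a high-impact bar this year."""
    return [name for name, score in curr.items()
            if score >= bar and prev.get(name, 0.0) <= floor]

# Scores from the text; StepFun had near-zero presence in 2024.
over70_2024 = {"NVIDIA": 2_949}
over70_2025 = {"NVIDIA": 7_120, "StepFun": 1_183}
print(emerging(over70_2024, over70_2025))  # ['StepFun']
```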
Universities, Companies, Startups: How Each Sector Competes
Companies: LLM as the Core, Differentiated at the Edges
Microsoft, NVIDIA, Google, Alibaba, and Tencent all allocate 30–36% of their high-impact portfolio to LLM/Foundation Models. The strategic divergence lies in what surrounds that core. Microsoft stacks Efficiency (15%) and Evaluation (21%) into a practical model-family triad. NVIDIA layers Efficiency (14%) and Scaling (18%) to push inference infrastructure. Alibaba bets heavily on RL/Agents (18%) to extend Qwen toward agentic workflows.
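Shares like these reduce to normalizing per-theme scores. A sketch, with hypothetical raw numbers chosen only to reproduce the shares quoted for Microsoft:

```python
def portfolio_shares(theme_scores: dict[str, float]) -> dict[str, float]:
    """Normalize per-theme over_70 scores into portfolio shares."""
    total = sum(theme_scores.values())
    return {theme: score / total for theme, score in theme_scores.items()}

# Hypothetical raw scores, consistent with Microsoft's quoted shares.
microsoft = {"LLM/Foundation": 720, "Eval/Benchmark": 420,
             "Efficiency": 300, "Other": 560}
for theme, share in portfolio_shares(microsoft).items():
    print(f"{theme:15s} {share:.0%}")   # 36%, 21%, 15%, 28%
```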
Universities: Evaluation and Benchmarking as the Universal Pillar
Looking across the top six universities — Tsinghua, SJTU, NUS, UC Berkeley, Stanford, and Fudan — one pattern holds: Eval/Benchmark is the largest theme at every institution, ranging from 31% to 44% of the portfolio. The university sector is providing the evaluation infrastructure the whole ecosystem depends on. Differences emerge at the edges. Berkeley runs the highest RL/Agents share (11%) plus Scaling (8%), anchoring embodied AI and sim-to-real transfer. Stanford leads on evaluation design and Preference Optimization. Fudan's 44% Eval share is the highest specialization of any university in the set.
Startups: The Strategy Is Already Visible — You Just Have to Look
The research portfolios of leading AI startups are far from monolithic. Two clear poles have emerged.
Generalist type: StepFun deploys simultaneously across LLM, Efficiency, and GenAI/Video — the broadest portfolio of any startup, closest to established incumbents. Kimi/Moonshot emphasizes MoE, long-context, and agentic benchmarking, with a notably high Efficiency share (25%) pointing toward practical deployment focus.
Ultra-specialist type: DeepSeek's entire high-impact output is split across just three themes — LLM, Eval, and RL — with the highest concentration on reasoning of any institution tracked. HiDream commits 63% of its portfolio to GenAI/Video, a level of specialization with no close parallel.
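That notion of concentration can be quantified with a Herfindahl-style index over portfolio shares. The index is a framing for this analysis, not a source metric, and the residual theme mass is folded into a single bucket:

```python
def hhi(shares: dict[str, float]) -> float:
    """Herfindahl-Hirschman index: sum of squared shares; 1.0 = one theme."""
    return sum(s ** 2 for s in shares.values())

# Theme shares quoted in the text; residual mass lumped into 'other'.
deepseek = {"LLM": 0.53, "RL": 0.37, "Eval": 0.10}
hidream  = {"GenAI/Video": 0.63, "other": 0.37}
print(f"DeepSeek: {hhi(deepseek):.2f}")  # ~0.43, highly concentrated
print(f"HiDream:  {hhi(hidream):.2f}")   # ~0.53 (upper bound: 'other' is a lump)
```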
Key Observation
The strategic diversity among startups is invisible in NeurIPS results. The arXiv over_70 category distribution reveals each firm's research bets months in advance — making it the more actionable signal for competitive intelligence.
Takeaways: A New Playbook for Tracking AI Research
Combining three data sources reveals a competitive picture that no single metric can show.
1 — Track Portfolios, Not Headcounts
Raw submission volume shows the scale of activity, not strategic intent. The over_70 category distribution reveals where an institution is genuinely staking its claims at the frontier. The same rank can mean very different things depending on where the score comes from.
2 — Sector Role Determines the Right Comparison
Companies (frontier models, efficiency), universities (evaluation, theory, broad exploration), public labs (open benchmark hubs), and startups (sharp specialization) compete in structurally different ways. Ranking them on the same axis without accounting for role differences produces misleading conclusions.
3 — Don't Wait for the Conference to Find New Players
StepFun, Inclusion AI, DeepSeek — all of these registered strong arXiv over_70 signals before any meaningful NeurIPS presence. The peer-review cycle introduces a structural lag of 6–18 months. Using over_70 as a leading indicator closes that gap.
4 — "Strong at NeurIPS" ≠ "Open Research Culture"
The gap between conference acceptance counts and arXiv high-impact publication reveals differences in disclosure posture. NVIDIA and Microsoft publish far more high-impact work relative to their NeurIPS footprint than many university counterparts. Treating conference counts as a proxy for openness without checking arXiv leads to systematically wrong conclusions.
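A sketch of how disclosure posture could be scored; the over_70 values come from the data above, but the NeurIPS counts here are placeholders, not reported figures:

```python
def disclosure_ratio(over_70: float, neurips_papers: int) -> float:
    """High-impact arXiv score per NeurIPS acceptance; higher = more open posture."""
    return over_70 / max(neurips_papers, 1)

# over_70 scores from the data above; NeurIPS counts are hypothetical.
print(f"NVIDIA:    {disclosure_ratio(7_120, 120):.1f}")
print(f"Microsoft: {disclosure_ratio(7_882, 130):.1f}")
```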
The 2025 AI research landscape is entering a new phase — one where volume alone is no longer a useful signal, portfolio composition is the real competitive differentiator, and the fastest-moving actors announce their bets on arXiv months before any conference confirms them. The map is already there. You just need to know which layer to look at.