Radically Better Reasoning: Elicit's Andreas Stuhlmüller & Jungwon Byun on World Models for Research
About this episode
Andreas Stuhlmüller and Jungwon Byun return to discuss how Elicit is building trusted reasoning workflows for scientific research as frontier models grow more powerful but less transparent. They explain process supervision, domain-specific reasoning primitives, and world models that make evidence, causality, and counterfactuals more inspectable. The conversation also covers life sciences use cases, evaluating conflicting evidence, automated software engineering at Elicit, token costs, Gemini, and why legible reasoning may still beat neuralese.
Mercury: Command is Mercury’s new conversational interface, giving you natural-language access to your finances and helping you take actions within your existing permissions and approval policies. Visit https://mercury.com to learn more and apply online in minutes.
LINKS:
- Elicit Research Platform
- Andreas Stuhlmüller Personal Site
- Jungwon Byun X Profile
- Ought Research Organization
- Elicit Founders Previous Episode
- GPT-4 Technical Report
- Monitoring Reasoning Models Paper
- Ought ICE GitHub Repository
- Hard-to-Verify Tasks Essay
- Karpathy LLM Wiki Gist
- Obsidian Knowledge Base App
- Mixpanel Analytics Platform
- Amplitude Analytics Platform
- Anthropic Tracing Thoughts Research
- Claude AI Chat Assistant
- METR Long Tasks Measurement
- Pi Agent Scaffold Repository
- Personal AI Infrastructure Repository
- Elicit Claude Opus Evaluation
- Elicit API Documentation
- https://metr.org/blog/2025-07-10-early-2025-ai-experienced