Import AI 448: AI R&D; Bytedance's CUDA-writing agent; on-device satellite AI

If Ukraine is the first major drone war, when will there be the first major AI war?

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

AI progress is moving faster than even well-regarded forecasters can guess:

…Ajeya Cotra updates her timelines…

“On Jan 14th, I made predictions about AI progress in 2026. My forecasts for software engineering capabilities already feel much too conservative,” writes Ajeya Cotra in a blog post. Ajeya is a longtime AI thinker who has done some great work trying to predict timelines to powerful AI. In this post, she explains that AI systems are moving faster than she thought, given the recent METR results putting Opus 4.6 at a time horizon of 12 hours (in January, Ajeya had predicted ~24 hours for the end of 2026). “It’s no longer very plausible that after ten whole months of additional progress at the recent blistering pace, AI agents would still struggle half the time at 24 hour tasks,” Ajeya writes. “I’d guess that by the end of the year, AI agents will have a time horizon of over 100 hours on the sorts of software tasks in METR’s suite… And once you’re talking about multiple full-time-equivalent weeks of work, I wonder if the whole concept of ‘time horizon’ starts to break down.”

Why this matters - all the lights are flashing yellow for a software explosion: Posts like this, as well as 70% of what I cover in this newsletter, all point in the direction of AI systems getting extremely good, extremely quickly, and quickly colonizing and growing the economy.

Read more: I underestimated AI capabilities (again) (Ajeya Cotra).

***

Want to measure AI R&D? Here are 14 ways to do it:

…Generating metrics about the most significant property of AI…

The biggest thing that could ever happen with artificial intelligence will be when it starts to build itself. This phenomenon, often termed recursive self-improvement, is seen by many as an event horizon beyond which it’ll be increasingly hard to reason about the future. How would we know if we were approaching this point?
Researchers with GovAI and the University of Oxford have written a paper laying out 14 distinct metrics which could be measured to help us figure out the extent to which AI companies are succeeding in building and overseeing AI R&D Automation (AIRDA) - getting AI to build AI, a necessary prerequisite for recursive self-improvement.

Why care about this: “AIRDA could accelerate AI progress, bringing forward AI’s benefits but also hastening the arrival of destructive capabilities, including those related to weapons of mass destruction, or other forms of disruption such as unemployment,” they write.

What are the 14 metrics?

Measure AI performance on AI R&D

Measure AI performance on AI R&D relative to humans and human-AI teams

Measure ‘oversight red teaming’ - how well human teams can effectively supervise AI systems that are building themselves

Measure misalignment in AIRDA

Compute the rate of efficiency improvements on AI R&D tasks

Survey staff on how they use AI and what this means for productivity

Find out if and how often AI is used in high-stakes decisions

Examine where AI researchers spend their time

Meta-measure how effectively companies can oversee AI development (e.g., the rate of bugs or undesired behaviors that make it through to production even with human oversight)

Examine how often AI systems subvert the goals of their human developers

Track the headcount of AI researchers at labs, as well as details of their performance
