Maligned - December 12, 2025
AI news without the BS
Here’s what actually matters in AI today. No fluff, no hype - just 5 developments worth your time.
Today’s Top 5 AI Developments
1. Robots Get Socially Smarter with Language 🤖🗣️
Robots are learning to navigate human spaces better: not just avoiding obstacles, but also reading social cues and following language instructions. A new framework, LISN, uses vision-language models to help robots carry out complex commands like “follow that person in the crowd” while respecting social norms. This is a big step towards practical, human-friendly robot deployment, moving beyond basic path planning to genuine social awareness.
Source: arXiv (https://arxiv.org/abs/2512.09920v1)
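Curious what that looks like mechanically? Here's a minimal Python sketch of the general recipe, not LISN's actual code (the grounding function, cost weights, and scenario are all toy stand-ins): a mock VLM grounds the instruction to a target person, and a greedy planner trades progress toward that target against a personal-space penalty.

```python
import math

def mock_vlm_ground(instruction, people):
    # Toy stand-in for the VLM grounding step: a real model would resolve
    # "that person in the crowd" from pixels; here we just match a keyword.
    return people[0] if "first" in instruction else people[-1]

def social_cost(pos, people, radius=1.5):
    """Quadratic penalty for entering anyone's personal space."""
    return sum((radius - math.dist(pos, p)) ** 2
               for p in people if math.dist(pos, p) < radius)

def step(robot, goal, bystanders, stride=0.5):
    """Greedy planner: minimize distance-to-goal plus a social-norm penalty."""
    candidates = [(robot[0] + dx, robot[1] + dy)
                  for dx in (-stride, 0.0, stride)
                  for dy in (-stride, 0.0, stride)]
    return min(candidates,
               key=lambda c: math.dist(c, goal) + 4.0 * social_cost(c, bystanders))

people = [(2.0, 2.0), (5.0, 1.0)]
robot = (0.0, 0.0)
target = mock_vlm_ground("follow the first person", people)
bystanders = [p for p in people if p != target]
for _ in range(12):
    robot = step(robot, target, bystanders)
print("robot position:", robot)
```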
2. LLMs are Getting Leaner for Action ⚡️
You know the drill: LLMs are powerful but slow and expensive for complex planning. SCOPE tackles this by using an LLM as a “one-time teacher” to pretrain a much lighter model for hierarchical planning in text environments. It slashes inference time from minutes to seconds, making LLM-guided agents significantly more efficient without sacrificing performance. Finally, practical LLM agents might be within reach.
Source: arXiv (https://arxiv.org/abs/2512.09897v1)
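Here's a hedged sketch of the one-time-teacher pattern (the function names and toy text-world are mine, and SCOPE's student is a trained lightweight planner, not a lookup table): the expensive LLM labels states with subgoals once, offline, and only the cheap student runs at inference time.

```python
from collections import defaultdict

def expensive_teacher(state):
    # Stand-in for a costly LLM call that proposes the next subgoal;
    # the toy rule plans over a (room, has_key) text-world state.
    _, has_key = state
    return "open_door" if has_key else "get_key"

# 1) One-time offline distillation: the teacher labels every seen state.
states = [("hall", False), ("hall", True), ("cellar", False),
          ("cellar", True), ("attic", False)]
dataset = {s: expensive_teacher(s) for s in states}

# 2) The "student" here is just a lookup with a majority-vote fallback;
#    a real student would be a small trained policy network.
counts = defaultdict(int)
for label in dataset.values():
    counts[label] += 1
fallback = max(counts, key=counts.get)

def student_plan(state):
    """Cheap inference-time planner: no LLM call on the hot path."""
    return dataset.get(state, fallback)

print(student_plan(("hall", False)))  # -> get_key
print(student_plan(("tower", True)))  # unseen state -> majority fallback
```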
3. Measuring If VLMs Truly “Act” Like Humans 🧠👁️
Vision-Language Models (VLMs) can see and describe, but can they reason and act proactively like humans, without constant prompting? VisualActBench, a new benchmark, tests exactly this, and it finds that even top-tier models like GPT-4o lag well behind human decision-making at generating proactive, high-priority actions. It’s a crucial reality check on VLM capabilities for real-world agent deployment.
Source: arXiv (https://arxiv.org/abs/2512.09907v1)
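To make the evaluation idea concrete, here's a toy Python scoring loop in the spirit of such a benchmark (the scenes, priorities, and metric are illustrative assumptions, not VisualActBench's actual protocol): each scene carries human-annotated actions with priority scores, and the model gets credit only when its unprompted top action matches a high-priority human choice.

```python
# Each scene: human-annotated candidate actions with priority ratings (0-3).
scenes = [
    {"id": "stove_left_on",
     "human_actions": {"turn off stove": 3, "open window": 2, "do nothing": 0}},
    {"id": "child_near_pool",
     "human_actions": {"alert adult": 3, "move toys": 1, "do nothing": 0}},
]

def mock_model(scene_id):
    # Stand-in for a VLM asked "what would you do?" with no extra prompting.
    return {"stove_left_on": "turn off stove",
            "child_near_pool": "move toys"}[scene_id]

def score(scenes, model, high_priority=3):
    # Fraction of scenes where the model's action matches a top-priority one.
    hits = sum(
        scene["human_actions"].get(model(scene["id"]), 0) >= high_priority
        for scene in scenes
    )
    return hits / len(scenes)

print(f"proactive high-priority match rate: {score(scenes, mock_model):.0%}")
# -> 50%: the model handles the stove but under-reacts in the pool scene.
```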
4. Video Editing That Understands Physics, Not Just Pixels 🎥✨
Current video editing AI can swap faces or change styles, but it struggles with physical plausibility. ReViSE introduces a new task, Reason-Informed Video Editing, where models must understand causal dynamics and physics. The authors’ self-reflective framework uses an internal VLM to give feedback on each edit, significantly improving editing accuracy and visual fidelity and making your AI-generated videos look a lot less “fake.”
Source: arXiv (https://arxiv.org/abs/2512.09924v1)
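Here's a deliberately tiny Python sketch of the self-reflection pattern (everything in it, from the free-fall check to the refinement rule, is an illustrative assumption rather than ReViSE's implementation): an editor proposes an edit parameter, an internal critic standing in for the VLM flags physics violations, and the editor refines until the critique passes.

```python
def propose_edit(fall_time, prev_feedback):
    # Editor: shorten the fall if the critic said it looked too floaty.
    return fall_time * 0.8 if prev_feedback == "too slow" else fall_time

def critic(fall_time, drop_height=5.0, g=9.81, tol=0.2):
    """VLM stand-in: checks the edit against free-fall time t = sqrt(2h/g)."""
    expected = (2 * drop_height / g) ** 0.5  # ~1.01 s for a 5 m drop
    return "too slow" if fall_time > expected + tol else "ok"

fall_time, feedback = 2.0, None  # initial edit: a 2 s fall (too floaty)
for step in range(6):
    fall_time = propose_edit(fall_time, feedback)
    feedback = critic(fall_time)
    print(f"step {step}: fall_time={fall_time:.2f}s, critic says {feedback}")
    if feedback == "ok":
        break
```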
5. Better 3D Models from Barely Any Photos 📸🧊
Creating high-quality 3D models usually requires a ton of photos, which is often impractical. GAINS (Gaussian-based Inverse rendering from Sparse multi-view captures) tackles exactly that, achieving impressive material recovery and novel-view synthesis from very limited image sets. That makes it much easier to build accurate digital twins and virtual assets with minimal data.
Source: arXiv (https://arxiv.org/abs/2512.09925v1)
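For intuition, here's a drastically simplified Python sketch of the inverse-rendering idea behind methods like this (a one-parameter toy of mine, nothing like GAINS's actual Gaussian-splat pipeline): recover a material parameter by gradient descent on re-render error against a handful of observed pixels.

```python
# Sparse "captures": (light intensity, observed pixel value) pairs,
# generated here from a ground-truth albedo of 0.6.
observations = [(0.2, 0.12), (0.9, 0.54)]

def render(albedo, light):
    # Toy Lambertian-style forward model: pixel = albedo * light.
    return albedo * light

albedo, lr = 0.0, 0.5
for _ in range(200):
    # Analytic gradient of mean squared re-render error w.r.t. albedo.
    grad = sum(2 * (render(albedo, L) - obs) * L for L, obs in observations)
    albedo -= lr * grad / len(observations)

print(f"recovered albedo ~ {albedo:.3f}")  # converges to the true 0.6
```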
That’s it for today. Stay aligned. 🎯
Maligned - AI news without the BS