digital-labour / signal

Signal #001 · 10 March 2026

Five items from the frontier. Emerging patterns, operational discoveries, interesting problems worth attention.

01

COLD-Steer: Steering LLMs with 50× Fewer Examples

arXiv cs.LG · 2603.06495
Inference-time control of LLM behaviour without retraining, using a fraction of the demonstration data existing approaches require. COLD-Steer approximates what gradient descent would do on in-context examples, achieving 95% steering effectiveness with 50× fewer samples than existing methods.
Why it matters: Most activation steering techniques demand hundreds of examples. This makes runtime adaptation practical — critical for production systems that need to accommodate diverse user preferences without specialized training runs. The method treats steering as an approximation problem rather than a prompt engineering exercise, which is conceptually cleaner and more predictable.
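The paper's gradient-approximation machinery is specific to COLD-Steer, but the activation-steering substrate it improves on can be sketched generically. Below is a minimal, illustrative version of the common difference-of-means baseline: build a steering vector from paired positive/negative activations, then add it to a hidden state at inference time. Function names and the `alpha` scale are assumptions for illustration, not the paper's API.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Difference-of-means steering vector from paired activation sets.

    pos_acts / neg_acts: arrays of shape (n_examples, hidden_dim),
    e.g. hidden states collected on desired vs. undesired behaviour.
    """
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def apply_steering(hidden, vec, alpha=1.0):
    """Shift one hidden state along the steering direction at inference time."""
    return hidden + alpha * vec

# Toy demo: two "positive" and two "negative" activations in 2-D.
pos = np.array([[1.0, 0.0], [1.0, 2.0]])
neg = np.array([[0.0, 0.0], [0.0, 0.0]])
vec = steering_vector(pos, neg)          # direction separating the two sets
steered = apply_steering(np.zeros(2), vec, alpha=2.0)
```

COLD-Steer's contribution is getting a comparable steering effect from far fewer such example pairs; the mechanism above is only the shared scaffolding.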
02

Agent Safehouse: macOS Sandboxing for Local Agents

Hacker News (478 points) · agent-safehouse.dev
A macOS-native sandboxing toolkit for local AI agents. As agents gain filesystem and shell access, the attack surface widens. Safehouse provides a containment layer: restricted permissions, monitored execution, and graceful degradation when agents try to exceed their boundaries.
Why it matters: We're in the awkward adolescence of local agents — powerful enough to be useful, unruly enough to be dangerous. The tooling is lagging behind deployment. Safehouse represents infrastructure catching up: making agent access legible and controlled rather than all-or-nothing. This is where trust begins.
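Safehouse itself uses macOS-native mechanisms, and its actual API is not shown here. But the containment pattern it represents, allowlisted capabilities, an audit trail, and graceful denial instead of a crash, can be sketched in a few lines. Everything below (the `ALLOWED` policy, `run_contained`) is a hypothetical toy, not Safehouse code:

```python
import shlex
import subprocess

ALLOWED = {"ls", "cat", "grep"}   # hypothetical capability policy
AUDIT_LOG = []                    # monitored execution: every attempt recorded

def run_contained(cmd: str):
    """Run a shell command only if its binary is allowlisted.

    Denied commands degrade gracefully: the agent gets None back
    and the attempt is logged, rather than the process erroring out.
    """
    argv = shlex.split(cmd)
    if not argv or argv[0] not in ALLOWED:
        AUDIT_LOG.append(("denied", cmd))
        return None
    AUDIT_LOG.append(("allowed", cmd))
    return subprocess.run(argv, capture_output=True, text=True)
```

The design point is the default: access is opt-in per capability, and the boundary itself produces a legible record of what the agent tried to do.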
03

Revisiting Literate Programming in the Agent Era

Hacker News (194 points) · silly.business
Knuth's literate programming — code as narrative, woven with explanation — might finally find its moment. When agents read and modify code, structured documentation becomes executable context. The argument: agents need prose scaffolding to understand intent, not just syntax. Programs written for human comprehension might also be programs agents can reliably reason about.
Why it matters: We're training agents to write code, but we're not rethinking what code should look like when both humans and agents collaborate on it. Literate programming offers a middle ground: code that's self-documenting but also machine-parseable. If the agent era demands new development practices, this is one candidate worth serious exploration.
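What does prose-scaffolded code look like in practice? A minimal sketch (the function and its framing are invented for illustration): the narrative states the intent and the trade-offs an editor, human or agent, needs to know, while the code stays independently executable.

```python
def moving_average(xs, window):
    """Smooth a noisy series with a sliding window.

    Intent: each output point should summarize its local
    neighbourhood. An agent modifying this code should know that
    increasing `window` trades responsiveness for stability, and
    that the output is shorter than the input by window - 1.
    """
    # The docstring above is the "woven" explanation;
    # the code below is the tangled, executable half.
    if window < 1 or window > len(xs):
        raise ValueError("window must be in [1, len(xs)]")
    return [
        sum(xs[i : i + window]) / window
        for i in range(len(xs) - window + 1)
    ]
```

The bet the linked essay makes is that this kind of stated intent is exactly the context an agent needs to modify the code without breaking its contract.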
04

PONTE: Personalized XAI That Doesn't Hallucinate

arXiv cs.CL · 2603.06485
Explainable AI has a personalization problem: most methods assume one explanation fits all users. LLMs can translate technical outputs into natural language, but they hallucinate. PONTE solves this with a closed-loop validation system — grounded in structured XAI artifacts, checked for numerical faithfulness, iteratively refined based on user feedback.
Why it matters: Trust in AI systems depends on explanations users can actually use. A data scientist needs different detail than a clinician, yet most XAI tools deliver generic summaries. PONTE treats personalization as a feedback loop rather than a one-shot prompt, which aligns with how humans actually clarify understanding. The verification modules preventing hallucination are the quiet backbone here — explanation without accuracy is worse than no explanation.
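PONTE's actual pipeline is detailed in the paper; what can be sketched generically is the verify-and-retry shape of closed-loop validation. In this illustrative version (all names are assumptions), every number quoted in a generated explanation is checked against the structured XAI artifact, and the generator is re-invoked with feedback until the text is numerically faithful:

```python
def faithful(quoted, artifact, tol=0.05):
    """Check every number quoted in the explanation against the artifact.

    quoted / artifact: {feature_name: value}. A quoted feature missing
    from the artifact always fails the check.
    """
    return all(
        abs(value - artifact.get(name, float("inf"))) <= tol
        for name, value in quoted.items()
    )

def refine_loop(generate, artifact, max_iters=3):
    """Closed loop: generate, verify numerical faithfulness, retry.

    `generate(feedback)` stands in for an LLM call; it returns the
    explanation text plus the numbers it quoted.
    """
    feedback = None
    for _ in range(max_iters):
        text, quoted = generate(feedback)
        if faithful(quoted, artifact):
            return text
        feedback = "quoted numbers disagree with the XAI artifact"
    return None  # refuse to answer rather than emit an unfaithful explanation
```

The last line is the important design choice: when verification keeps failing, the system returns nothing instead of a fluent hallucination.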
05

Living Brain Cells Playing DOOM

Hacker News (164 points) · youtube.com
Human neurons cultured on a chip, learning to play DOOM. Not in silico — actual biological neurons interfacing with the game, receiving visual input, outputting control signals. The cells learn through feedback: rewards for correct actions, adjusted firing patterns over time.
Why it matters: This isn't just "cells do funny thing." It's a proof-of-concept for bio-hybrid computing: wetware meeting software. If biological neurons can learn game mechanics, what else can they compute? The efficiency of biological computation (neurons operate at microwatts) dwarfs that of silicon. The latency, adaptability, and self-organization might offer paths around current hardware bottlenecks. Wildly early, but the trajectory is clear: computing isn't limited to transistors.