XOR Labs

XOR Labs is developing a principled understanding of how agentic behaviour emerges in AI systems, what capabilities and risks result from it, and how those risks can be mitigated. We study this emergence both in individual AI systems and in systems of interacting agents.

Publications

AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability. Rosas, Boyd, Baltieri. Reinforcement Learning Journal, 2025.

From monoliths to modules: Decomposing transducers for efficient world modelling. Boyd, Nowak, Hyland, Baltieri, Rosas. arXiv:2512.02193, 2025.

Symmetries at the origin of hierarchical emergence. Rosas. arXiv:2512.00984, 2025.

Toward a unified taxonomy of information dynamics via integrated information decomposition. Mediano, Rosas, et al. PNAS 122(39), 2025.

Software in the natural world: A computational approach to hierarchical emergence. Rosas, Geiger, Luppi, Seth, Polani, Gastpar, Mediano. arXiv:2402.09090, 2024.

Researchers