My New Project: Authorship AI and the Oversight Paradox
Announcing a grant to build tools that strengthen—rather than replace—our capacity to think.
The Problem I Can’t Unsee
AI makes it easier and faster than ever to write. The problem is that today's assistants often incentivize users to offload the intellectual and creative work of writing, from constructing original arguments to making stylistic choices.
As educators take to the op-ed pages with concerns about learning and knowledge production—and early empirical work suggests reduced neural connectivity when people rely on LLMs for writing—many frontier safety folks look away. “Education isn’t civilization-scale,” the reasoning goes; alignment, control, and security are the priority.
I think these are the same problem.
The foundation of most AI oversight mechanisms—reinforcement learning from human feedback (RLHF), red-teaming, interpretability audits, constitutional AI—is independent human judgment. Yet we’re rapidly normalizing an oversight apparatus that leans on AI to do the very thinking we need to keep sharp.
What if the greatest risk from AI isn’t that it becomes too powerful, but that we become too weak to govern it? Cognitive dependency and epistemic erosion are an under-studied pathway to civilization-scale risk.
Why I’m Building Authorship AI
I watched my writing-intensive career get automated almost overnight. My primary output shifted from the long-form article, report, or memo to the prompt. Since then I've heard the same signals everywhere:
Educators say students submit essays with words they can’t define.
Learning & Development leads—professionals who coach thinking—tell me they feel less analytical after months of AI-assisted writing.
Early studies hint at reasoning and expressive atrophy when people outsource drafting.
I’m not an AI doomer; I’m obsessed with this tech. But I believe assistants should augment cognitive autonomy, not replace it—and a humanist-designed product will yield higher-quality outputs.
Authorship AI is my response: a tool that adapts to you, preserves your voice, and nudges deeper reasoning instead of taking the pen away.
Grateful to Cosmos Institute and FIRE for backing this work. I’m also thrilled to have Andy Neuschatz (founding CTO at The Information) as technical advisor.
The Oversight Paradox
AI safety’s marquee methods assume humans can still do a few hard things:
Articulate principles (constitutional approaches).
Decompose tasks and spot failure modes (scalable oversight, debate).
Evaluate arguments adversarially (red-teaming, audits).
Translate values into preferences (reward modeling).
Now imagine six months of heavy AI assistance for the people running those processes. Will they still notice a subtle fallacy? Will they still construct a counter-argument rather than ask the model for one?
The paradox:
Control methods require strong human evaluators.
Using AI to do the evaluation degrades evaluative skill.
The more we rely on oversight, the less capable we become of providing it.
The safety question shifts from “Will the model lie?” to “Will we notice?”
(I’m working on a literature review and essay series on this topic — stay tuned.)
What We’re Building
Authorship AI (MVP in 90 days)
Voice preservation: Models trained on your corpus to spot drift and nudge you back to your style (one way this could work is sketched below).
Socratic scaffolding: Guided prompts that ask you to outline claims, evidence, and counter-arguments before the model completes anything.
Browser extension: Delightful user interface that gives feedback on your prompting and your drafts.
MCP server connection: Contextualized outlining and writing guidance, fed by regular writing-sample submissions so the assistant builds on your own capabilities rather than replacing them.
Goal: make it easier to think for yourself than to offload thinking by default.
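To make the voice-preservation idea concrete, here is a minimal sketch of one way drift detection could work: embed a new draft and a reference corpus of your own writing, then flag the draft when it moves too far from your stylistic centroid. The embedding model, threshold, and function names are illustrative assumptions, not the actual Authorship AI implementation.

```python
# Illustrative sketch: flag stylistic drift by comparing a new draft
# against the author's own reference corpus. Model choice, threshold,
# and structure are assumptions for illustration only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

def corpus_centroid(reference_texts: list[str]) -> np.ndarray:
    """Average embedding of the author's past writing, re-normalized."""
    embeddings = model.encode(reference_texts, normalize_embeddings=True)
    centroid = embeddings.mean(axis=0)
    return centroid / np.linalg.norm(centroid)

def voice_drift(draft: str, centroid: np.ndarray) -> float:
    """1 - cosine similarity between the draft and the author's centroid."""
    draft_vec = model.encode([draft], normalize_embeddings=True)[0]
    return 1.0 - float(np.dot(draft_vec, centroid))

if __name__ == "__main__":
    past_writing = ["...an earlier essay by the author...", "...another sample..."]
    centroid = corpus_centroid(past_writing)
    score = voice_drift("A newly drafted paragraph.", centroid)
    if score > 0.35:  # arbitrary threshold; would need calibration per writer
        print(f"Possible voice drift (score={score:.2f}); nudge the writer.")
    else:
        print(f"Draft stays close to your voice (score={score:.2f}).")
```

A production version would need richer stylistic features (syntax, rhythm, lexical choice) and calibrated thresholds, but the corpus-centroid-plus-nudge shape shows where the feedback loop could hook in.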
The Literature Review I’m Writing
I. Theoretical Foundations
AI safety/control (alignment, scalable oversight, constitutional methods). Assumption: stable human judgment.
Gradual disempowerment (e.g., Christiano, Carlsmith): focuses on political/economic power loss; I extend this to cognitive power loss.
Epistemic agency & autonomy: extended mind / distributed cognition are useful lenses, but they often assume offloading is voluntary and reversible.
Skill formation & neuroplasticity: use-dependent pathways suggest that repeated delegation reshapes what we can do unaided.
II. What We Know Empirically (Across Domains)
Direct AI studies: early signs of reasoning and expression weakening under LLM assistance; homogenization of outputs; uplift with coding assistants plus reliance effects.
Automation analogies: aviation skill decay under autopilot; diagnostic complacency with medical decision support; the “Google effect” on memory. These aren't perfect analogies to writing, but they rhyme in important ways.
Organizational effects: when institutions routinize tool-first workflows, rare skills atrophy fastest—the exact ones we need during incidents.
Gap: no cross-domain synthesis focused on evaluator capacity for AI oversight.
III. Measurement (Hard, but Possible)
Constructs: independent argument construction, adversarial evaluation, principle articulation, anomaly detection.
Possible paradigms: blinded “find the flaw” tasks, writing-to-learn assessments, longitudinal drift tracking, and AI-free dark runs inside oversight teams.
A simple, high-stakes test: after six months of AI-assisted evaluation, can teams still detect deliberately flawed reasoning without AI?
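As a rough illustration of how such a dark run might be scored, here is a minimal sketch that tracks the share of deliberately planted flaws evaluators catch without AI assistance, and the drift in that rate between rounds. The item format, field names, and alarm threshold are assumptions, not a validated protocol.

```python
# Illustrative sketch of scoring a blinded "find the flaw" dark run:
# evaluators review items with no AI assistance; some items contain a
# deliberately planted flaw. Detection rate is tracked across rounds
# to look for drift. Field names and threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Response:
    item_id: str
    has_planted_flaw: bool   # ground truth, hidden from the evaluator
    flagged_as_flawed: bool  # what the evaluator reported

def detection_rate(responses: list[Response]) -> float:
    """Share of planted flaws the evaluators caught (true-positive rate)."""
    flawed = [r for r in responses if r.has_planted_flaw]
    if not flawed:
        return float("nan")
    return sum(r.flagged_as_flawed for r in flawed) / len(flawed)

def drift(baseline: list[Response], latest: list[Response]) -> float:
    """Positive value means evaluators catch fewer planted flaws than at baseline."""
    return detection_rate(baseline) - detection_rate(latest)

if __name__ == "__main__":
    month_0 = [Response("a", True, True), Response("b", True, True), Response("c", False, False)]
    month_6 = [Response("d", True, True), Response("e", True, False), Response("f", False, False)]
    delta = drift(month_0, month_6)
    print(f"Detection-rate drop since baseline: {delta:.0%}")
    if delta > 0.1:  # arbitrary alarm threshold
        print("Evaluator capacity may be eroding; schedule more AI-off practice.")
```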
IV. Governance & Institutional Resilience
Procurement and policy that treat epistemic independence like uptime—monitored, budgeted, reported.
Separation of concerns: distinct roles for authorship aids vs. evaluative aids; rotation periods that mandate AI-off critique.
Audit readiness: require “human-in-charge” evidence trails showing who reasoned, when, and with what support.
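For illustration, here is one possible shape for such an evidence-trail record. The field names and JSON layout are assumptions chosen for the sketch, not an existing standard or the project's actual schema.

```python
# Illustrative sketch of a "human-in-charge" evidence-trail record.
# Fields and format are assumptions for illustration only.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ReasoningRecord:
    decision_id: str
    reasoner: str                  # the accountable human
    timestamp: str                 # ISO 8601
    ai_assistance: list[str] = field(default_factory=list)  # tools consulted, if any
    independent_summary: str = ""  # the human's own one-paragraph rationale

if __name__ == "__main__":
    record = ReasoningRecord(
        decision_id="eval-2025-041",
        reasoner="j.doe",
        timestamp="2025-06-30T14:02:00Z",
        ai_assistance=["draft critique from an assistant model"],
        independent_summary="Flagged the benchmark claim; sample size too small.",
    )
    print(json.dumps(asdict(record), indent=2))  # audit-ready JSON trail
```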
What This Means for AI Safety
Every modern control proposal—reward modeling, debate, constitutional AI, scalable oversight—quietly treats human cognitive capacity as a constant. The literature across education, cognitive science, and automation says it isn’t. If we want real control, we must engineer for human robustness the way we engineer for model robustness.
How You Can Help
Pilot: Are you an educator, editor, policy analyst, or technical leader who wants to test Authorship AI? Reply or comment.
Share data (safely): If you run writing programs or eval teams and can support measurement, let’s talk.
Advising/reading list: If I missed research you love—send it. I’m compiling a bibliography.
Thanks again to Cosmos Institute and FIRE for the support—and to Andy Neuschatz for advising. We’ve got 90 days to ship a meaningful MVP. Let’s build tools that keep humans in the loop by keeping humans sharp.
— Ross