February 09

The myth shattered: when using AI isn't enough

By alex dantart

For months we repeated a calming mantra: "AI will not replace you, but someone who uses it will." Today I have to say something uncomfortable: that phrase is now obsolete.

And it becomes even clearer after analyzing an empirical study with 2,700 real cases that demonstrates something many of us suspected but no one had measured: simply using AI is not enough. We need to know what kind of AI to use, or the cure will be worse than the disease.

The experiment that changes everything

Twelve of the best models on the market (GPT-5, Claude, Gemini, etc.) were evaluated on drafting 75 real legal tasks: appeals, objections to precautionary measures, and jurisprudential reasoning. Three different scenarios were used:

  1. Pure AI: asking one of these generic models directly.
  2. AI with basic sources: retrieving from a private corpus beforehand.
  3. AI with advanced verification: a private corpus plus verification techniques.
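The three setups can be sketched as a small pipeline. This is a toy illustration, not the study's actual harness: `call_model`, the two-entry corpus, the keyword retriever, and the citation check are all hypothetical stand-ins.

```python
# Toy stand-in for a private legal corpus (references are illustrative).
CORPUS = {
    "STS 459/2019": "Doctrine on objections to precautionary measures.",
    "STC 31/2018": "Doctrine on the right to effective judicial protection.",
}

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query GPT-5, Claude, etc."""
    return f"draft based on: {prompt}"

def retrieve(task: str) -> list:
    """Naive keyword retrieval over the private corpus (words > 3 chars)."""
    words = [w for w in task.lower().split() if len(w) > 3]
    return [f"{ref} — {text}" for ref, text in CORPUS.items()
            if any(w in text.lower() for w in words)]

def scenario_1(task: str) -> str:
    # Pure AI: the model drafts "from memory", no sources attached.
    return call_model(task)

def scenario_2(task: str) -> str:
    # AI with basic sources: retrieved passages are attached to the prompt.
    return call_model(task + "\nSources:\n" + "\n".join(retrieve(task)))

def scenario_3(task: str) -> str:
    # AI with advanced verification: every cited reference must exist in
    # the corpus, otherwise the draft is flagged for human review.
    draft = scenario_2(task)
    cited = [ref for ref in CORPUS if ref in draft]
    return draft if cited else "FLAGGED: no verifiable citation found"
```

The point of the third function is the contract, not the code: a draft that cannot point back to a real source never reaches the lawyer unflagged.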

The results are devastating, and could give more than one person sleepless nights. When a lawyer asks ChatGPT or Claude to draft an appeal "from memory," almost 3 out of every 10 citations it generates are fake: non-existent rulings, incorrect attributions, fabricated legal doctrine. A 26.8% error rate in citations and a 15.6% rate of fabricated facts.

Does that sound like an exaggeration? Perhaps not to the lawyer sanctioned by the Constitutional Court for submitting 19 non-existent citations. Or to the Italian and Argentinian colleagues recently reprimanded for the same thing.

But there's more: each document generated this way requires an average of 35 minutes of review to correct, so it's not a useful draft but an informational liability that wastes more time than it saves.

The solution is not to stop using AI

Here's the twist: when those same AIs work from verified sources (which is what RAG technology achieves), the error rate drops to 8.3%. And with advanced verification systems it practically disappears: 0.046%. Moreover, review time falls from 35 minutes to 1.2 minutes.

The difference isn't in "using more AI." It's in understanding that there are two radically different types of legal AI:

The creative oracle (pure generative AI), which:

  • Gives fluent, convincing answers
  • Makes things up when it doesn't know
  • Optimizes for coherence, not truthfulness
  • Is like hiring someone brilliant... but a pathological liar

The expert archivist (consultative AI), which:

  • Searches first, synthesizes later
  • Cites verifiable sources
  • Admits when it can't find something
  • Is like having a meticulous collaborator who takes notes on everything
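The archivist's behavior fits in a few lines: answer only from what was found, cite it, and abstain otherwise. A minimal sketch, assuming a hypothetical one-entry source list and naive keyword matching:

```python
# Illustrative sources; the reference and text are made up for the sketch.
SOURCES = {
    "STS 459/2019": "Objections to precautionary measures require a proportionality analysis.",
}

def archivist_answer(question: str) -> str:
    """Search first, cite the source, admit failure rather than invent."""
    words = [w for w in question.lower().split() if len(w) > 3]
    hits = [(ref, text) for ref, text in SOURCES.items()
            if any(w in text.lower() for w in words)]
    if not hits:
        # Admitting ignorance beats fabricating doctrine.
        return "No supporting source found; cannot answer."
    ref, text = hits[0]
    return f"{text} [source: {ref}]"
```

The design choice that matters is the early return: the system never produces an uncited answer, which is exactly what makes each claim verifiable in seconds.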

Why this changes the professional paradigm

Let's return to the initial idea: "You'll be replaced by someone using AI." Well, no. Because by the end of the year we'll probably all be using it: Word will incorporate AI as standard, legal databases will integrate it, courts will implement it, and legaltech solutions thrive on it. The real divide won't be between those who use AI and those who don't. It will be between:

  • Strategy architects: professionals who understand what type of AI is needed for each task, and who know how to audit, verify, and manage it. They use consultative technology to accelerate research, but keep human judgment at the center.
  • Task operators: professionals who delegate thinking to the machine, assuming that if it sounds good, it must be correct. They become trapped in a "generate-review-correct" loop that destroys the very efficiency they were seeking.

Three questions we should ask ourselves today

  1. Does the tool I'm using tell me where it gets its information from? If you can't verify each claim in 10 seconds, you're using an oracle, not an archivist.
  2. Do I spend more time correcting than I save generating? If so, the paradigm you're using is broken.
  3. Is my job becoming a to-do list or a chain of decisions? Because tasks are getting cheaper, but decisions are getting more expensive.

The only valid professional insurance

This study demonstrates something revealing: digital competence is no longer enough. We need architectural competence: knowing when we need creativity and when we need absolute rigor. It's not about being afraid of AI. It's about being afraid of misusing it.

Because when the Constitutional Court issues a ruling, it doesn't ask what model you used. It asks why it wasn't verified. And "it was generated by AI" is not a defense but an aggravating factor.

The profession will not be decided by who uses AI, but by who understands what type of AI is needed at any given time.

And by who preserves something no machine can replicate: the ability to put their name and responsibility behind every decision.

Note: The complete "Reliability by Design" study with the 2,700 analyzed cases is available at arxiv.org. It includes the JURIDICO-FCR dataset for replication.

(Link: https://arxiv.org/abs/2601.15476)
