Vibe Coding, GPL and Copyright: The Legal Bomb Inside AI Code

Thomas de Grivel posted an ext4 implementation to the openbsd-tech mailing list — built entirely with ChatGPT and Claude, without reading a single line of Linux source code.

The driver seemed to work within its declared scope: it provided read-write access and passed e2fsck, but did not support journaling. Interesting enough to discuss. Not enough to survive the problem OpenBSD actually cared about: provenance.

A few days later, Theo de Raadt ended the discussion with a blunt sentence: “the chances of us accepting such new code with such a suspicious Copyright situation is zero.”

De Raadt was taking a position on the question the industry keeps trying to avoid: what happens when vibe coding meets GPL code, copyright, and open source licensing? The quality of the driver was secondary.

What De Grivel Actually Built

De Grivel was explicit in his blog post about how the driver had been produced: “It’s pure AI (ChatGPT and Claude-code) with careful code reviews and error checking and building kernel and rebooting/testing. No Linux source files were ever read.”

That last sentence is meant to establish a clean chain of innocence: no Linux source files read, only AI, manual review, kernel builds, real testing. But for OpenBSD, that was not enough.

The issue was not only what de Grivel had read. It was what the tools he used may have absorbed.

Both ChatGPT and Claude were trained on enormous corpora of source code and technical text, very likely including the Linux kernel's ext4 code and documentation, which is GPLv2 material. When an LLM generates an ext4 driver, it is not simply reinventing the filesystem from first principles. It is producing code from a statistical model built from existing examples, documentation, idioms, and implementation patterns.

Whether that reconstruction is transformative enough to avoid derivative-work status is the question no court has answered. The OpenBSD team did not have an answer either — and that absence was already enough to refuse the code. Bryan Miller summarized the position plainly: “We don’t know. The law hasn’t caught up to the technology yet and we can’t take the risk.”

Why AI-Generated Code Creates a Copyright Problem

The standard vibe coding defense goes like this: I did not read the GPL source, so I cannot have created a derivative work.

That logic works reasonably well for a human developer who studies public documentation, understands the problem, and then writes original code. It does not clearly map to a model trained on billions of tokens of licensed software.

Brian Kernighan could write an ext4 implementation without ever reading the Linux source, and nobody would seriously claim that the result was automatically contaminated. He understands filesystems. He applies his own knowledge.

An LLM does not bring personal expertise to the problem. It brings a probability distribution built from examples, documentation, code, and patterns seen during training. That is exactly why the legal boundary is hard to draw.

De Grivel himself used a phrase that exposes how immature the language around vibe coding still is: “We can freely steal each other in a new original way without copyright infringement.”

The word steal is doing more work than intended. In everyday developer culture, it usually means learning, borrowing, adapting, riffing. In a copyright dispute, it lands differently. The problem is not that vibe coding is theft by definition. The problem is that the workflow makes it harder to prove where influence ends and derivation begins.

What This Means for Developers Shipping Open Source

If you use AI to write code that will ship under BSD, MIT, or Apache, this case is about you. Not abstractly. Operationally.

The copyright question around AI-generated code is not the first warning signal. The security problems with AI-generated kernel contributions have been visible for a while. But copyright adds a dimension that code review does not catch.

A patch can be correct, pass every test, and still carry a provenance problem that will not surface until a lawyer, a customer, or a compliance team starts asking where the code came from.

Every BSD-licensed project that accepts AI-generated contributions without a provenance policy exposes itself to legal risk that is difficult to quantify. Organizations with strict compliance requirements — governments, publicly traded companies, software embedded in regulated environments — cannot treat “AI wrote it” as a sufficient answer to a licensing question.
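What a provenance policy looks like in practice is up to each project. As a minimal sketch, assume a project asks every commit to declare AI involvement in its trailer block. The trailer names `AI-Assisted` and `AI-Tool` below are hypothetical, invented for this illustration; they are not an OpenBSD or Linux convention.

```python
import re
from typing import Dict, Optional

# Hypothetical trailer names, chosen for illustration only.
TRAILER_RE = re.compile(r"^(AI-Assisted|AI-Tool):\s*(.+)$", re.IGNORECASE)


def provenance_trailers(message: str) -> Dict[str, str]:
    """Collect provenance trailers from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return {}
    found: Dict[str, str] = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            found[match.group(1).lower()] = match.group(2).strip()
    return found


def policy_violation(message: str) -> Optional[str]:
    """Return a human-readable problem, or None if the declaration is complete."""
    trailers = provenance_trailers(message)
    if "ai-assisted" not in trailers:
        return "missing AI-Assisted trailer (expected 'yes' or 'no')"
    value = trailers["ai-assisted"].lower()
    if value not in ("yes", "no"):
        return "AI-Assisted must be 'yes' or 'no'"
    if value == "yes" and "ai-tool" not in trailers:
        return "AI-Assisted: yes requires an AI-Tool trailer naming the tool"
    return None
```

A check like this only records what the contributor declares. It does not establish what the generated code derives from, which is the part no tool can currently answer.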

The model provider may offer contractual protection, but that does not automatically solve the provenance problem. Microsoft, for example, introduced its Copilot Copyright Commitment for some commercial services and under specific conditions. That is contractual coverage. It is not a court ruling on the legal nature of every generated output.

The risk does not disappear because the prompt was clean.

TechMonk’s Take

Theo de Raadt is right.

OpenBSD’s position is not paranoia. It is the only coherent answer available under current legal uncertainty. The project’s entire value proposition depends on license clarity. Accepting code with ambiguous provenance would weaken that guarantee in exchange for a driver that is useful, but not valuable enough to justify a crack in the project’s chain of trust.

GPL May Become the First Real Legal Test for AI Code

Copyleft was built to propagate: a work derived from GPL code must, when distributed, be licensed under the GPL as well. That is the mechanism, and it is deliberate. The FSF did not anticipate language models in 1989, but the principle has not changed.

If an LLM absorbed the Linux kernel’s ext4 implementation and related materials to build a statistical distribution over code, and that distribution later generated a working ext4 driver, the derivation question starts to look structurally familiar: how transformative does a work have to be before it escapes derivative status?

No court has answered that for LLM-generated code.

But GPL could become the instrument that forces the question. Not because it was written for AI. Not because the FSF designed a trap for language models. But because GPL was designed to resist the silent removal of reciprocity from free software.

If a court ever recognizes certain AI outputs as derivative of GPL code, that old legal architecture suddenly becomes very current. The models did not walk into a trap built for them. They walked into a perimeter that already existed.

The Real Bomb Is Permissive Licenses

Everyone focuses on GPL. The more insidious risk may be MIT and Apache.

Those licenses allow broad commercial use, but they still impose conditions. MIT requires preservation of copyright notices and permission notices. Apache 2.0 also carries notice and patent-related obligations. These are lighter than copyleft, but they are not nothing.

Language models have absorbed enormous amounts of permissively licensed code. When a model reproduces a pattern, a helper function, a parser, a compatibility routine, or a test structure inspired by that code, attribution does not automatically appear anywhere: not in the commit, not in the documentation, not in the API response.

GPL protects contributors through reciprocity. MIT and Apache rely much more heavily on attribution and notice preservation. But if the intermediary is a statistical model that does not track sources, the attribution chain disappears inside the process.

That does not mean every AI-generated snippet is infringing. It means permissive-license risk is harder to see. A copied GPL file is visible. A missing MIT notice buried inside generated glue code is not.
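The part that can be checked mechanically is narrow. As a rough sketch, the function below flags source files whose header carries neither an `SPDX-License-Identifier` tag nor a copyright line. The function name, the 30-line header window, and the sample file names are assumptions made for this example.

```python
import re
from typing import Dict, List

# SPDX-License-Identifier is the standard SPDX short-form tag.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*\S+")
# Loose match for lines such as "Copyright (c) 2024 Name" or "Copyright 2024 Name".
COPYRIGHT_RE = re.compile(r"Copyright\s+(\(c\)|©)?\s*\d{4}", re.IGNORECASE)


def files_without_notice(sources: Dict[str, str], header_lines: int = 30) -> List[str]:
    """Return paths whose first `header_lines` lines contain no license notice."""
    missing = []
    for path, text in sorted(sources.items()):
        header = "\n".join(text.splitlines()[:header_lines])
        if not (SPDX_RE.search(header) or COPYRIGHT_RE.search(header)):
            missing.append(path)
    return missing


if __name__ == "__main__":
    sample = {
        "a.c": "/* SPDX-License-Identifier: MIT */\nint x;",
        "b.c": "int helper(void) { return 1; }",
        "c.c": "/* Copyright (c) 2024 Example Author */\nint y;",
    }
    print(files_without_notice(sample))
```

This finds files that lack any notice at all. It cannot tell whether a generated helper reproduces someone else's MIT or Apache code without its notice, and that is exactly the case that stays invisible.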

The OpenBSD case is the visible tip of a problem that spans not only famous copyleft projects but also billions of lines of permissively licensed code, whose authors may never be credited when their work reappears through model output.

The First Court Case Probably Will Not Be Random

This ambiguity is unlikely to be resolved by a neutral, accidental lawsuit.

The first truly decisive case will probably be selected, funded, and framed carefully: by a model provider looking for a favorable precedent before legislation hardens, or by an open source foundation trying to define GPL’s reach before others define it for them.

The industry is not waiting passively. It is watching for the most advantageous case. Model providers have legal teams monitoring every relevant dispute. Open source foundations are gathering evidence and arguments. The developer who uses AI today to write code shipped under a permissive license may be, unknowingly, part of someone else’s future legal strategy.

“AI wrote it” is not a defense. At best, it is an explanation. At worst, it is a provenance gap with a vendor logo attached.

Conclusion

The Linux kernel took decades to build a review process strong enough to resist buggy, malicious, and legally ambiguous contributions. Vibe coding does not necessarily bypass that process out of malice. It bypasses it through misplaced confidence that “AI wrote it” is a statement of origin rather than a hole in the origin story.

The open question is how many BSD, MIT, and Apache codebases already contain AI-generated code that would become uncomfortable under serious legal scrutiny: not because every output is infringing, but because nobody can prove what it is derived from.

Nobody is auditing for this at scale. Not yet.

When the first serious case reaches a courtroom, the industry may discover that the problem was never theoretical. It was merely deferred.