Money ain’t got no owners – only spenders.

Omar, The Wire

Recent advances in AI[1] text-generation have produced systems with a dazzling range of powers. They can generate all sorts of text, everything from poetry to marketing copy to actual, working code. The text isn’t always perfect, but boy oh boy, there sure is a lot of it! AI text-generators can produce mostly-plausible text at a scale that is absolutely out of the question for humans. Sooner rather than later, a lot of text in our world is going to be machine-generated.

What role is there for humans in this future?

One hope is that there will be a market for boutique, artisanal, human-generated text. Such text would be a luxury good, since it cannot be produced at machine scale. Human-generated text will have that “special touch” that cannot be replicated by AIs, that could only be produced by humans with their creativity, perspective, experience, being-in-the-world, etc.

In the short term, this is not implausible. After all, anyone who has spent any time with these systems knows that the text they produce is often “not quite right”. It looks right superficially, but on closer inspection it can be nonsensical or false in bizarre ways. This problem will no doubt be fixed up over time, but it is likely to persist for a while.

In the long term, I think the hopes for human-generated text are doomed. This is because of the simple logical fact that there is no such thing as human-generated text.

This is not to say that humans don’t generate text. Of course they do! But for a given piece of text, the fact of its having been generated by a human is a matter of historical contingency, and not a property inherent to the text itself.

Text is nothing other than a finite sequence of words. Given a fixed length of text and a fixed vocabulary, there are only finitely many sequences of words. For example, consider a text that is no more than 100 words long, with a vocabulary limited to the 1,000 most commonly used English words. There are roughly 1000^100 such texts: 1,000 choices for each of 100 positions, plus a comparatively tiny number of shorter texts. That is a lot of texts, but still there are only finitely many. With unlimited time and resources, every such text could be enumerated in a long, long list. Such a list would contain everything that could possibly be said within the given bounds.
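To make the counting concrete, here is a minimal Python sketch. The function names and the four-word toy vocabulary are made up for illustration; the point is only that the count is a finite number and that, for small enough bounds, the list really can be written out.

```python
from itertools import product

def count_texts(vocab_size: int, max_len: int) -> int:
    """Count the word sequences of length 1..max_len over a vocabulary of the given size."""
    return sum(vocab_size ** k for k in range(1, max_len + 1))

def enumerate_texts(vocab, max_len):
    """Yield every word sequence of length 1..max_len, shortest first."""
    for k in range(1, max_len + 1):
        for words in product(vocab, repeat=k):
            yield " ".join(words)

# Hypothetical toy vocabulary, small enough to enumerate in full.
toy_vocab = ["money", "has", "no", "owners"]
all_texts = list(enumerate_texts(toy_vocab, 3))

# The bounds from the example above: 1,000 words, at most 100 words long.
# Python integers are arbitrary-precision, so the exact count is computable
# even though the list itself could never be written out.
full_count = count_texts(1000, 100)
```

With four words and a limit of three words per text, the list has 4 + 16 + 64 = 84 entries. With the bounds from the example, `full_count` is a 301-digit number.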

Suppose you had access to such a list and you needed to generate some text. It would only be a matter of searching the list until you found the “right” entry on the list, the text that says what needs to be said. That would be one way of “producing” text.

In reality, such a list could not possibly exist. And even if it did, it would take way too long to search. Instead of exhaustive enumeration and search, humans generally use cognition to generate text. The exact mechanisms for this process are not clear, but humans seem to have a sense of relevance and purpose for text with respect to their sense of the world. Humans (mostly) use meaning to generate text. AI text-generators use a very different process. Instead of looking at relevance and purpose and meaning, they take a bird’s-eye view of a large corpus of text and crunch numbers to generate text that is statistically similar.
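As a rough illustration of what “statistically similar” can mean, here is a toy bigram generator in Python. This is an assumption-laden sketch and not how any actual AI text-generator works; real systems are vastly more sophisticated. The corpus, function names, and seed are all invented for the example. It only records which word follows which in a corpus and then replays those patterns at random, with no notion of relevance, purpose, or meaning.

```python
import random
from collections import defaultdict

def build_bigram_table(corpus: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)

def generate(table: dict, start: str, max_words: int, rng: random.Random) -> str:
    """Build a text by repeatedly picking a random observed successor of the last word."""
    out = [start]
    while len(out) < max_words and out[-1] in table:
        out.append(rng.choice(table[out[-1]]))
    return " ".join(out)

# Hypothetical miniature corpus.
corpus = "the cat sat on the mat and the dog sat on the rug"
table = build_bigram_table(corpus)
sample = generate(table, "the", 8, random.Random(0))
```

Every adjacent pair of words in `sample` also appears somewhere in the corpus, so the output looks locally like the source even though nothing in the process knows what a cat or a mat is.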

But in any case, the text that is generated always already exists, in the sense that it could have been enumerated. And from this perspective, human text-generation and machine text-generation are merely different means of lighting upon the right text for a given situation. For a given situation, the right text is what it is, and once in hand, its method of discovery is not just irrelevant, but in general impossible to discern.

Discussion Questions

  1. Will AI text generation always be subject to producing bizarre output?
  2. This post discusses three methods of generating text: human cognition, statistical machine learning, and exhaustive search. What are some other methods?
  3. Does anybody really care about the provenance of the text they consume? Or is the relationship purely instrumental?
  4. How is it possible for text to “sound like” a particular person? Could that property of “sounding like” someone be copied and reproduced?
  5. This post describes text as a “sequence of words”. Can the notion of “text” be extended to include things like music?


[1] “AI” stands for “artificial intelligence”. Despite their amazing capabilities, it is doubtful that any current AI systems are actually “intelligent” in any meaningful sense. So really, “AI” should be in “scare quotes” throughout this post. That would be kind of annoying to read though, so they aren’t included. But the “scare quotes” should be imagined to be there.