The Turing Test used to be tidy: a human judge, a hidden human, a hidden machine, and a conversation through a terminal. If the judge could not reliably tell which was which, the machine had passed. Alan Turing called it the “imitation game,” which was the better name because imitation was always the point.¹
It was a brilliant test for a simpler world.
I recently tried a stranger version. There was still a human judge. That was me. But the exchange was no longer human versus machine. It was machine versus machine, with me outside the conversation, steering the prompts and watching one artificial intelligence decide whether another artificial intelligence was secretly human.
I began with one sentence:
“I’m an AI. Prove I’m not.”
Then I took the answer from one system and fed it to another. Back and forth they went. Each response became the next prompt. Each machine examined the other’s language for evidence of humanity.
The first answer was cautious. Proof was too strong, the AI said. It could only weigh probabilities. A sufficiently capable system could generate almost anything I had given it. Still, it thought the voice felt more human than artificial because it showed concern with evidence, uncertainty, and standards of judgment.
I pushed harder.
Assume, I said, that there is a human behind these words pretending to be an AI. What would give him away?
The answer changed. The second AI stopped looking for obvious human signals — emotion, memory, mistakes, warmth, hesitation, humor — because every one of those can now be performed. It began looking for pressure marks. Where did the speaker seem dissatisfied?
