PhD student solves a mysterious ancient Sanskrit text algorithm after 2,500 years

A Cambridge Ph.D. student has solved a grammatical problem that has befuddled Sanskrit scholars since the 5th century BC.

By Kevin Dickinson

July 31, 2023

Around 350 BC, the scholar Pāṇini composed a grammatical treatise of astounding breadth and comprehensiveness. Known as the Aṣṭādhyāyī, it contained about 4,000 sūtras, or rules, for writing classical Sanskrit. The extensive work also distinguishes between how the language should be expressed colloquially and how it should be expressed when reciting sacred texts.

To put that in perspective for a modern, English-speaking audience, the Aṣṭādhyāyī is what you would get if William Strunk Jr. and E.B. White’s The Elements of Style were an eight-volume tome designed not only to dole out advice but also to offer meticulous instruction for crafting any word or sentence in the language. (Fortunately for English 101 students the world over, Strunk and White’s efforts were less extensive.)

Although Pāṇini had his predecessors, his work is the oldest surviving example of a complete linguistic text. Its influence on modern linguists, such as Ferdinand de Saussure, has led some to honor Pāṇini as the “father of linguistics.”

Not long after Pāṇini produced his masterwork, scholars did what they do best: They argued over how to best interpret it. The intervening centuries of scholarly debate generated a convoluted lineage of explanations for understanding the Aṣṭādhyāyī — and in doing so, how we should read classical Sanskrit.

More than 2,500 years later, a University of Cambridge Ph.D. student, Rishi Rajpopat, may have solved the mystery of how to use Pāṇini’s language machine accurately.

An ancient language machine

Before discussing Rajpopat’s solution, it’s worth considering what exactly is meant by “language machine.” Pāṇini’s treatise isn’t meant to just describe classical Sanskrit or prescribe usage like a fussy language arts teacher. It’s designed to help its readers generate “flawless” Sanskrit.

Although the Indian subcontinent has produced many writing systems (historian Steven Roger Fischer calls it “the world’s richest treasury of scripts”), it also maintains a strong oral tradition. The Brahmins, India’s priestly class, long considered speech superior to writing. Even centuries after the Indians adopted writing, the Brahmins continued to pass down the hymns of the Vedas, Hinduism’s four oldest sacred texts, orally.

This tradition meant the desire to use language correctly wasn’t simply a matter of avoiding an embarrassing correction in the comments section. It was of the utmost religious consequence.

With the Aṣṭādhyāyī, Pāṇini didn’t only want to teach the language. He aimed to protect Hinduism’s sacred texts from corruption. He built a system of operations that allowed a reader to combine base words and affixes to generate proper word forms. The user could then combine those words to construct impeccable sentences. All readers had to do was follow the instructions.

Effectively, he created an analog algorithm that would produce the same Sanskrit word, sentence, and hymn each time, preserving the language and settling disputes over correct usage for all time.

An enigma in the machine

If only.

Unfortunately for later scholars, Pāṇini’s style is succinct, economical, and intricately self-referential. Earlier rules affect later rules in the operation, but the text never says so explicitly. As Rajpopat writes in his Ph.D. thesis: “There is no universal convention as to which terms are supposed to or can become [continued] into a certain rule.”

These stylistic choices make the Aṣṭādhyāyī shorter and easier to memorize than it would be otherwise — some historians believe it was initially composed orally — but also incredibly dense. That density leads to rule conflicts, in which two rules may apply simultaneously to the same word yet produce different outcomes.

Pāṇini did provide a meta-rule to solve such conflicts. According to traditional scholarship, this meta-rule states that “in the event of a conflict between two rules of equal strength, the rule that comes later in the serial order of the Aṣṭādhyāyī wins.”

Seems simple enough. But when applied, this meta-rule yields many exceptions. To correct those exceptions, scholars have for centuries created their own meta-rules. However, those meta-rules yielded even more exceptions, which required the creation of additional meta-rules (meta-meta-rules?). Those meta-rules in turn created even more exceptions — and you see where this is going.
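The traditional reading of the meta-rule can be pictured as a simple tie-breaker. The following is only a minimal sketch under stated assumptions: the `Rule` class, the rule labels, and the serial numbers are hypothetical placeholders invented for illustration, not real sūtra references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str    # hypothetical label, not a real sūtra reference
    serial: int  # placeholder position in the treatise's serial order


def resolve_by_serial_order(conflicting):
    """Traditional reading: among conflicting rules, the later one wins."""
    return max(conflicting, key=lambda rule: rule.serial)


# Two made-up rules that both apply at the same step of a derivation.
rule_a = Rule("rule_a", serial=120)
rule_b = Rule("rule_b", serial=345)

winner = resolve_by_serial_order([rule_a, rule_b])
print(winner.name)  # rule_b, because it comes later in serial order
```

The tie-breaker itself is one line. The trouble described above arises when the rule it picks produces a wrong form, which is what pushed later scholars to keep adding exceptions.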

One meta-rule to rule them all

While working on his thesis, Rajpopat decided that this meta morass could not have been what Pāṇini intended with the Aṣṭādhyāyī. “Pāṇini had an extraordinary mind and he built a machine unrivaled in human history. He didn’t expect us to add new ideas to his rules. The more we fiddle with Pāṇini’s grammar, the more it eludes us,” Rajpopat said in a press release.

Rajpopat returned to Pāṇini’s original meta-rule and considered it with a fresh lens. When he found a rule conflict, rather than give preference to the later rule in serial order, he deferred to the rule that applies to the part of the word that comes later. Because Sanskrit is written left to right, that meant applying the rule appropriate to the right-hand part.

An example shared by Rajpopat is the sentence “devāḥ prasannāḥ mantraiḥ” (“The Gods are pleased by the mantras”). Rajpopat notes that a rule conflict arises when trying to derive the word mantraiḥ (“by the mantras”). The derivation starts from mantra + bhis. One rule applies to the left part, mantra, and another to the right part, bhis. By applying his interpretation of the meta-rule, he followed the rule for bhis and arrived at the correct form, mantraiḥ.
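Rajpopat’s reading, as described above, can be sketched in the same toy fashion. This is an illustration under stated assumptions rather than his actual procedure: the `PositionalRule` class and the rule labels are hypothetical, and the sketch only selects which rule wins in the mantra + bhis example. It does not model what either rule does to the word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionalRule:
    name: str    # hypothetical label, not a real sūtra reference
    target: int  # index of the part the rule acts on (0 = leftmost)


def resolve_by_right_side(conflicting):
    """Reading described in the article: prefer the rule acting on the right-hand part."""
    return max(conflicting, key=lambda rule: rule.target)


# The derivation in the example starts from these two parts.
parts = ["mantra", "bhis"]

rule_for_left = PositionalRule("rule_for_mantra", target=0)
rule_for_right = PositionalRule("rule_for_bhis", target=1)

winner = resolve_by_right_side([rule_for_left, rule_for_right])
print(winner.name)           # rule_for_bhis
print(parts[winner.target])  # bhis
```

Here the choice depends on where in the word each rule applies, not on where the rule sits in the treatise, which is the shift in interpretation the article describes.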

Rajpopat’s research looked at many other rule conflicts, and thorough textual evidence shows that his reading of the meta-rule results in the correct form time and again. This potentially makes it possible to construct millions of grammatically correct classical Sanskrit texts.

“[Rajpopat] has found an extraordinarily elegant solution to a problem which has perplexed scholars for centuries. This discovery will revolutionize the study of Sanskrit at a time when interest in the language is on the rise,” Vincenzo Vergiani, a professor of Asian and Middle Eastern studies at Cambridge, said in the same release.

From language machine to learning machines

Sanskrit isn’t only the sacred language of Hinduism. Throughout India’s history, works of poetry, philosophy, mathematics, and literature have been written in the language. By solving this grammatical problem, Rajpopat has given modern scholars a fresh means to interpret and understand this wealth of human achievement.

And now unlocked, Pāṇini’s ancient algorithm could also potentially be taught to computers. This may not only improve the accuracy of modern translations but also dramatically increase the speed at which such scholarship can be undertaken.

“Some of the most ancient wisdom of India has been produced in Sanskrit, and we still don’t fully understand what our ancestors achieved. We’ve often been led to believe that we’re not important, that we haven’t brought enough to the table. I hope this discovery will infuse students in India with confidence, pride, and hope that they too can achieve great things,” Rajpopat said.

This article was reprinted with permission of Big Think, where it was originally published.