
Information theory, cataloging systems, and the mathematics of lost knowledge.
The Collector
Around the year 290 BCE, the pharaoh Ptolemy I Soter — one of Alexander the Great's generals who had inherited Egypt after Alexander's death — had an ambition that no ruler before him had attempted. He wanted to collect every book in the world.
Not some books. Not the important books. Every book. Every scroll of papyrus, every clay tablet, every piece of writing in every language from every civilization. He wanted to gather all of human knowledge in one place, in one building, in one city: Alexandria.
He built the Mouseion — the "Temple of the Muses" — a vast complex of lecture halls, laboratories, gardens, a zoo, and at its heart, the Great Library. He sent agents to every port, every market, every kingdom in the known world with orders to buy, borrow, or steal any written work they could find.
Ships arriving in Alexandria's harbour were searched. Not for weapons or contraband, but for books. Any scroll found on board was confiscated and copied by the Library's scribes. The copy was returned to the owner; the original stayed in the Library.
At its peak, the Library held an estimated 400,000 to 700,000 scrolls — the largest collection of human knowledge that had ever existed. It contained works of literature, mathematics, astronomy, medicine, philosophy, history, geography, and engineering from Greek, Egyptian, Persian, Indian, and Hebrew traditions.
The First Librarian
The third head of the Library was a man named Callimachus of Cyrene, and he faced a problem that would not be solved again for two thousand years: how do you organize all of human knowledge so that a person can find what they need?
Four hundred thousand scrolls, stored in cubbyholes along endless corridors. No computers. No search engines. No alphabetical filing system — because alphabetical filing hadn't been invented yet.
Callimachus invented it.
He created the Pinakes — a 120-scroll catalogue that organized every work in the Library into categories: rhetoric, law, epic poetry, tragedy, comedy, lyric poetry, history, medicine, mathematics, natural science, and miscellany. Within each category, authors were listed alphabetically by the first letter of their name. For each author, the catalogue listed their birthplace, teacher, a brief biography, and a list of their works with the first line of each work (so you could verify you had the right scroll).
The Pinakes is the first known library catalogue of its kind. It was also an early search index: a system that allowed you to look up any author or subject and locate the physical scroll in the Library's collection. The same fundamental principle (organize information into categories, index it, and create a mapping from search query to storage location) underlies every database, every search engine, and every file system used today.
The Slow Death
The Library of Alexandria did not die in a single dramatic fire. That is a myth — a convenient story that compresses centuries of decline into a single cinematic moment.
The reality was slower and sadder.
In 48 BCE, Julius Caesar accidentally set fire to the harbour district during his siege of Alexandria. Some of the Library's overflow storage — scrolls warehoused near the docks — was destroyed. But the main Library survived.
In 272 CE, the Emperor Aurelian destroyed the district of Bruchion during a military campaign to retake Alexandria from the rebel queen Zenobia. The Mouseion — the main Library complex — was likely damaged or destroyed in this fighting.
In 391 CE, the Christian Patriarch Theophilus ordered the destruction of the Serapeum — a temple of Serapis that housed a significant portion of the Library's collection. A Christian mob tore the building apart, and the scrolls inside were burned.
By the time the Arabs conquered Alexandria in 642 CE, there was no great library left to destroy.
What Was Lost
The mathematics of lost knowledge is staggering.
Of the estimated 400,000 to 700,000 scrolls in the Library, fewer than 1% survive in any form today. We know the titles of hundreds of lost works, listed in the Pinakes and in references by later authors, but the works themselves are gone.
We know that Aristarchus of Samos proposed, in the 3rd century BCE, that the Earth orbits the Sun — 1,800 years before Copernicus. His full argument is lost. We have only a summary by Archimedes.
We know that Eratosthenes, working at the Library, calculated the circumference of the Earth, by some reconstructions to within about 2% of the modern value, using nothing but a well, a stick, the angle of a shadow, and the distance between two cities. The exact accuracy depends on the length of the unit he used, which is still debated. His method survives, but his detailed measurements and calculations do not.
We know that Hero of Alexandria built a working steam engine — the aeolipile — in the 1st century CE. He described it as a toy, a curiosity. No one thought to scale it up. The industrial revolution might have started 1,700 years earlier if the Library had survived, if Hero's work had been read by the right person.
This is the mathematics of knowledge loss: it's not just about the books that were destroyed. It's about the connections that were never made. Every lost scroll is a node removed from the network of human knowledge. And when you remove enough nodes, the network fragments — and ideas that might have connected across centuries are lost in the gaps.
The Modern Parallel
Today, the Internet Archive in San Francisco stores 99 petabytes of data, the equivalent of roughly 99 billion books if you assume about one megabyte per book. It is the closest thing we have to a modern Library of Alexandria.
But digital knowledge faces its own threats. Link rot is the phenomenon of web pages disappearing: a 2024 Pew Research Center analysis found that about 25% of web pages that existed at some point between 2013 and 2023 were no longer accessible. Bit rot is the gradual degradation of digital storage: data stored on hard drives, CDs, or cloud servers will eventually become unreadable unless actively maintained.
The lesson of Alexandria is not just about fire and conquest. It is about the fragility of knowledge, the cost of failing to maintain it, and the fact that the most important job in any civilization is not creating knowledge — it is preserving it.
The end.
Choose your level. Everyone starts with the story — the code gets deeper as you go.
Here is a taste of what Level 1 looks like for this lesson:
import numpy as np
import matplotlib.pyplot as plt
# Your first data analysis with Python
data = [45, 52, 38, 67, 41, 55, 48] # measurements
mean = np.mean(data)
plt.bar(range(len(data)), data)
plt.axhline(mean, color='red', linestyle='--', label=f'Mean: {mean:.1f}')
plt.xlabel("Sample")
plt.ylabel("Value")
plt.title("Information Theory & Data Systems — Sample Data")
plt.legend()
plt.show()
This is just the first of 6 coding exercises in Level 1. By Level 4, you will take on the project "Build a Knowledge Graph".
How do you organize all human knowledge? Cataloging, indexing, information entropy, and the mathematics of loss.
The big idea: "The Library of Alexandria" teaches us about Information Theory & Data Systems — and you don't need to write a single line of code to understand it.
Imagine you own 20 books. They're on a shelf. If someone asks you for "the blue one about sharks," you can scan the shelf in a few seconds and find it. No system needed.
Now imagine you own 400,000 books — the approximate size of the Library of Alexandria at its peak. They're stored in cubbyholes along kilometres of corridors. Someone asks for "that scroll by Aristarchus about the Sun." How do you find it?
Without a system, the answer is: you can't. You would need to check every single cubbyhole, one by one. With 400,000 scrolls, even at 10 seconds per scroll, a linear search could take up to 46 days of non-stop searching (4,000,000 seconds in total). The library might as well not exist.
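For readers curious how that estimate comes out, here is an optional sketch of the arithmetic in Python. It uses the figures from the text (400,000 scrolls, 10 seconds each); the function name and the "average case" line are illustrative assumptions, where the average assumes the scroll exists and is equally likely to be in any cubbyhole.

```python
# Time to check scrolls one by one, using the figures from the text.
SCROLLS = 400_000
SECONDS_PER_SCROLL = 10
SECONDS_PER_DAY = 24 * 60 * 60


def linear_search_days(n_items: int, seconds_per_item: float) -> float:
    """Days needed to inspect every item once, working non-stop."""
    return n_items * seconds_per_item / SECONDS_PER_DAY


# Worst case: the scroll you want is in the very last cubbyhole.
worst_case = linear_search_days(SCROLLS, SECONDS_PER_SCROLL)

# Assumed average case: the scroll turns up about halfway through.
average_case = worst_case / 2

print(f"Worst case:   {worst_case:.1f} days")
print(f"Average case: {average_case:.1f} days")
```

Doubling the number of scrolls doubles both numbers, which is what "time proportional to the size of the collection" means in practice.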
This is the fundamental problem of information retrieval: as a collection grows, finding any specific item becomes impossible without an index — a separate, organized guide that tells you where things are.
Check yourself: Your phone has thousands of photos. How do you find a specific one? (You either scroll through all of them — slow — or search by date, location, or face recognition — an index.)
Key idea: Without an index, searching a large collection takes time proportional to its size. An index maps search queries to storage locations, making retrieval fast regardless of collection size. This is the fundamental principle behind every search engine, database, and file system.
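The key idea above can be sketched in a few lines of Python for readers who want to see it. This is a minimal illustration, not a description of how the Library worked: the cubbyhole numbers are made up, and the sketch assumes one record per author so that a plain dictionary can serve as the index.

```python
# Hypothetical mini-collection: (author, title, cubbyhole number).
# The cubbyhole numbers are invented for illustration.
scrolls = [
    ("Euclid", "Elements", 17),
    ("Aristarchus", "On the Sizes and Distances of the Sun and Moon", 402),
    ("Eratosthenes", "Geographika", 96),
]


def linear_search(collection, author):
    """Check every record in order; return (location, records_checked)."""
    for checked, (name, _title, location) in enumerate(collection, start=1):
        if name == author:
            return location, checked
    return None, len(collection)


def build_index(collection):
    """Map each author to a cubbyhole so a lookup needs no scanning."""
    return {name: location for name, _title, location in collection}


index = build_index(scrolls)

location, checked = linear_search(scrolls, "Eratosthenes")
print(f"Linear search: cubbyhole {location} after checking {checked} records")
print(f"Index lookup:  cubbyhole {index['Eratosthenes']}")
```

The linear search does more work as the list grows, while the dictionary lookup takes roughly the same time on average however many records there are. The trade-off is that the index has to be built once and kept up to date whenever a scroll is added or moved.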
The librarian Callimachus solved the search problem by creating the Pinakes — a 120-scroll catalogue that organized every work in the Library into categories: poetry, history, law, medicine, mathematics, philosophy, and more.
Within each category, authors were listed alphabetically by the first letter of their name. For each author, the catalogue gave their birthplace, teacher, biography, and a list of works with the first line of each work (so you could verify you had the right scroll).
This is a two-level index: first narrow by category, then narrow by author name within that category. A printed phone book works the same way: turn to the section for the first letter of the last name, then scan alphabetically within that section. Modern databases apply the same principle when they index records on more than one field.
Callimachus's system was the first known library catalogue and the ancestor of every classification system since: the Dewey Decimal System, the Library of Congress system, your computer's file folders, Google's search index.
Think about it: A supermarket is organized like a catalogue. Dairy, produce, canned goods — these are categories. Within "canned goods," items are grouped by type (soups, beans, vegetables). You don't search the whole store for tomato soup. You go to "canned goods → soups." That's a two-level index.
Key idea: Callimachus invented the first library catalogue — organizing knowledge into categories, then alphabetically within categories. This two-level indexing is the same principle used in every database, search engine, and file system today.
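As an optional illustration, a two-level index of this kind can be sketched with nested dictionaries in Python. The entries and their category assignments below are assumptions chosen for the example, not a reconstruction of the Pinakes, and the helper names are hypothetical. The sketch also sorts authors by full name, whereas the text says the Pinakes ordered them by first letter only.

```python
from collections import defaultdict

# Illustrative entries: (category, author, title).
# Category assignments are assumptions made for this example.
entries = [
    ("mathematics", "Euclid", "Elements"),
    ("mathematics", "Archimedes", "The Sand Reckoner"),
    ("history", "Herodotus", "Histories"),
    ("epic poetry", "Homer", "Iliad"),
    ("epic poetry", "Homer", "Odyssey"),
    ("epic poetry", "Apollonius of Rhodes", "Argonautica"),
]


def build_catalogue(entries):
    """Level 1 groups works by category, level 2 by author within it."""
    catalogue = defaultdict(lambda: defaultdict(list))
    for category, author, title in entries:
        catalogue[category][author].append(title)
    return catalogue


def authors_in(catalogue, category):
    """Authors in one category, in alphabetical order."""
    return sorted(catalogue.get(category, {}))


def look_up(catalogue, category, author):
    """Narrow by category first, then by author; empty list if not found."""
    return catalogue.get(category, {}).get(author, [])


catalogue = build_catalogue(entries)
print(authors_in(catalogue, "epic poetry"))
print(look_up(catalogue, "epic poetry", "Homer"))
```

Each lookup touches only one category and one author, so the rest of the catalogue never has to be read. That is the same shortcut as walking straight to "canned goods, then soups" instead of searching the whole store.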
Think of human knowledge not as a list of books but as a **network** — a web of ideas connected to each other. Aristarchus's idea that the Earth orbit...
You might think: "We're safe now. Everything is on the Internet. Nothing can be lost." But consider this: **25% of all web pages created before 2013 a...