Archaeology as a Data Science

My former workplace.

I remember having a disagreement with a friend when I was getting my Master’s degree in archaeology. I had mentioned something about the scientific process as it applies to archaeology, and he interrupted me to say that archaeology was a form of history, not a science. And while the purpose of archaeological research is to gather information on historical occurrences, the modern archaeological method is a highly scientific process, relying on careful data collection and interpretation. 

What do you think of when you hear the word “archaeologist”? If you’re like most people, an image of Indiana Jones probably comes to mind: someone who travels around the world collecting objects from ancient places. And in the abstract, that’s not too far off from what archaeology used to be like. Massive excavations from a hundred or more years ago uncovered such important sites as Babylon, King Tut’s tomb, and Pompeii. What has changed, however, is the method.

This old-school approach, where teams of hundreds of workers would dig down over an entire area until they revealed wide swaths of the site, was really effective at finding objects. So if your goal is to answer the question “What kinds of objects did the Babylonians use?”, you’re going to get an answer. However, this approach ignores one of the most important tools an archaeologist has at their disposal: context.

Imagine that you are an archaeologist, and someone brings you an ancient Roman statue that they dug up somewhere, wanting to know more about it. “How old is this?” they might ask. That’s actually a really difficult question to answer, and likely an impossible one in this situation. Carbon dating only works on organic material, so you can’t use it here.

So how do archaeologists determine things like the age of an object, or the date it was buried? By studying the context. Think of the Grand Canyon: one of its most stunning features is the different bands of color running through the canyon walls. As we know from geology, these layers are called strata, and studying them, which is called stratigraphy, helps us to understand the history of the site. The layers at the bottom were laid down the longest ago, and each layer on top of them represents a more recent period of geologic activity. Well, this applies to archaeological research, too: as we dig further down into the dirt, the quality of the soil changes from one layer to the next.

The difference between stratigraphy in archaeology and stratigraphy in geology is that archaeologists actively destroy their stratigraphic context as they dig deeper down. Because of this, modern excavation has moved strongly towards careful data collection, and away from the old-school approach of “dig a massive hole in the ground.”

As you dig down, you start to notice the quality of the soil is changing. This is an important sign to the archaeologist that the time period is changing. Go back to the above example of the Roman statue. If you discover this in the field, you can use the data that you have been collecting to help answer questions about its age, and how long it has been buried. 

Maybe you found the statue in the same layer of dirt in which you found a few coins. Coins are really helpful in Roman imperial archaeology, because they almost always feature an image of the emperor at the time, which is something that we have a strong historical record of. So let’s say that the coins you found in this layer feature the portrait of Domitian, who was emperor from 81 to 96 CE. This tells us something: the statue cannot have been buried before 81, the earliest year those coins could have been made.
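This kind of reasoning, which archaeologists call a terminus post quem ("date after which"), can be sketched in a few lines of Python. Treat this as a minimal illustration rather than a real field-recording system: the `Find` record, its field names, the layer label, and the extra Vespasian coin are all hypothetical, invented for the example. The sketch also assumes each datable find has already been assigned the earliest year it could have been made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Find:
    """One excavated object (hypothetical record layout)."""
    description: str
    layer: str
    # Earliest year (CE) the object could have been made; None if undatable.
    earliest_year: Optional[int]


def terminus_post_quem(finds: List[Find], layer: str) -> Optional[int]:
    """Earliest year the given layer could have been deposited.

    A layer cannot be older than the newest datable object inside it,
    so we take the maximum of the earliest-possible years.
    Returns None when the layer has no datable finds.
    """
    years = [
        f.earliest_year
        for f in finds
        if f.layer == layer and f.earliest_year is not None
    ]
    return max(years) if years else None


# Hypothetical finds from a single layer.
finds = [
    Find("coin, portrait of Vespasian", "layer_3", 69),
    Find("coin, portrait of Domitian", "layer_3", 81),
    Find("statue", "layer_3", None),  # cannot be dated on its own
]

tpq = terminus_post_quem(finds, "layer_3")
print(f"layer_3 was deposited in or after {tpq} CE")
```

Note that the older Vespasian coin does not pull the date back: old coins can stay in circulation for a long time, so it is the newest datable object that sets the limit.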

As you dig down, you’re effectively going backwards in time. And because you don’t know what you’re going to find when you start digging, you have to be really diligent with your data collection from the very beginning. It’s only at the end of a dig season, when you’ve found all the objects that you’re going to excavate, that you take all the data you’ve collected and do this interpretation. You have no idea what today’s data might reveal about tomorrow’s discovery.
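The idea that deeper means older can also be written down as a small sketch. The layer names and years below are hypothetical, and the function assumes undisturbed stratigraphy, where nothing (a later pit, a burrowing animal) has cut through the layers; real sites are rarely this tidy. Given the earliest possible date for each layer that has datable finds, the sketch carries that date upward, because a layer cannot have been deposited before the layers beneath it.

```python
from typing import Dict, List, Optional, Tuple


def propagate_earliest_dates(
    layers: List[Tuple[str, Optional[int]]]
) -> Dict[str, Optional[int]]:
    """Carry earliest-possible deposit years upward through the layers.

    `layers` is ordered from deepest to shallowest. Each entry is
    (layer name, earliest year from that layer's own finds, or None).
    Assumes undisturbed strata: every layer is at least as recent as
    everything below it.
    """
    effective: Dict[str, Optional[int]] = {}
    running: Optional[int] = None  # latest bound seen so far, from below
    for name, own_year in layers:
        if own_year is not None and (running is None or own_year > running):
            running = own_year
        effective[name] = running
    return effective


# Hypothetical trench, listed from the bottom up.
trench = [
    ("layer_4", None),  # no datable finds
    ("layer_3", 81),    # e.g. a layer dated by its coins
    ("layer_2", None),  # no datable finds of its own
    ("layer_1", 117),
]

bounds = propagate_earliest_dates(trench)
for name, year in bounds.items():
    print(name, "deposited in or after", year)
```

Here `layer_2` has no datable finds of its own, yet it inherits a bound from `layer_3` beneath it. This is one small way in which data recorded early in a dig can end up informing a later discovery.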

The flip side of this, however, is that you collect massive amounts of data that you never really use. When I was last in the field, eight years ago, we were still doing all of this data collection with a pencil on a bunch of standardized forms. I got really good at looking at these datasets and getting a feeling for the story the data was telling, but I realized that there had to be a way to use computers to interpret all of the data, and not just the small subset that I would end up using.

I never got a chance to do this, as I decided to leave the Academy after finishing my Master’s. But people are starting to develop these archaeological tools. Here’s a great NYT article about this from a few months ago.

After I pivoted my career into tech, I noticed that the skills I had learned as an archaeologist were helping me in my job. I was doing a bunch of support engineering tasks, trying to figure out just how customers had broken our platform, and I would end up reading the raw JSON logs to get a feeling for how people were using the software. And that’s when I realized that I had been doing a kind of manual data science all along: looking at large datasets and using them to interpret historical patterns.

That got me really excited, and I started to pivot a lot of my daily work to product analytics, which I find really personally satisfying. I may not be doing Roman archaeology, but I am doing a form of data archaeology. Eventually I realized that this is what I want to do with my career, and here I am now, getting my certification in Data Science from General Assembly. 

Data is everywhere, and who knows what stories it can tell us until we look?