You’re a Data Person: Building Data Literacy

A few years ago, I interviewed for my first job in student affairs assessment, and to start the conversation one of the interviewers said to me, “So…you’re a data person.”  At the time, I did not know how to respond. While I was in graduate school I certainly developed a strong background in quantitative research methods and statistics, but before that I earned a theatre degree.  My methods training was not something I saw as a defining characteristic; the methods were simply tools I used to answer questions I found interesting.

As I started to gain professional experience, though, I continued to hear that phrase from colleagues in different departments and at different institutions.  You’re a data person. In nearly every case it was meant as a compliment, but I began to realize that the notion reinforced a false dichotomy. Certainly, as an assessment professional, I spent more time thinking about data and statistics than most of my colleagues.  That said, all of my colleagues, regardless of their backgrounds, wanted to know more about the students they were serving. In a sense, they were all “data people,” whether they knew it or not.

As I began to dig deeper, I saw that in order to advance our assessment practices we needed to build data literacy.  If our stakeholders did not see themselves as having data expertise, and they were not getting clear, actionable insights from their data collection efforts, how could we expect them to see data collection as anything other than a distraction from their day-to-day responsibilities?  This is not to say that everyone needs to know how to build a predictive model or design a database, but we do need to provide our colleagues with simple guidelines and resources so they are comfortable with the basics of data collection and analysis.

It’s one thing to say we need to build data literacy, but how do we go about doing it?  How do we provide our colleagues with a path forward? Ultimately, we need a framework that is not tied to a specific workflow, data tool, or platform.  Instead, we need a framework that shows our stakeholders their role in the bigger picture and gives them actionable information they can use in their day-to-day work.  So how do we begin?

  1. Begin with Data Integration

Too often, data integration is an afterthought.  We are given a directive, we collect exactly what we need, we report the results, and we never consider how the data fit into the bigger picture.  We only begin to think about data integration once manual data collection has become too tedious and we are seeking a technological solution to the problem.  In most instances, though, by the time we’ve reached this point, our data are messy and not structured in a way that we can easily leverage.

One of the essential elements of building data literacy is understanding how our data integrate into the broader data infrastructure.  Again, this does not require that we build a database, but it does require that we store our data so they can easily be manipulated, joined, and transformed for analysis.  This means being intentional about the unique identifiers we collect, understanding the level of aggregation, and working to implement a tidy data structure.
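To make this concrete, here is a minimal sketch of what a tidy, joinable structure looks like in plain Python. The field names (`student_id`, `term`, `gpa`, `event`) and the two record sets are hypothetical examples, not taken from any real system; the point is that each row is one observation and a shared unique identifier makes the join trivial.

```python
# Two tidy record sets: one row per observation, with a shared unique
# identifier (student_id). All names and values here are illustrative.
enrollment = [
    {"student_id": "S001", "term": "Fall 2023", "gpa": 3.4},
    {"student_id": "S002", "term": "Fall 2023", "gpa": 2.9},
]
attendance = [
    {"student_id": "S001", "event": "Leadership Workshop"},
    {"student_id": "S002", "event": "Career Fair"},
]

def join_on_id(left, right, key="student_id"):
    """Join two tidy record lists on a shared unique identifier."""
    lookup = {row[key]: row for row in right}
    return [
        {**row, **lookup[row[key]]}
        for row in left
        if row[key] in lookup
    ]

joined = join_on_id(enrollment, attendance)
# Each joined row now carries enrollment and attendance fields together.
```

If the two offices had collected names instead of a shared identifier, or had stored several events in one cell, this three-line join would turn into a manual matching project, which is exactly the mess intentional integration avoids.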

  2. Data Provenance

Even those with a passing interest in painting or sculpture are familiar with the concept of art provenance.  If we are able to trace a work through all of its past owners and back to the artist who produced it, it helps us verify its authenticity.  The provenance of a piece can also add to its value.

Provenance has the same value when it comes to data.  If we can trace our data back to the original source, it gives us confidence in our analysis and the inferences we draw from it.  Achieving this requires that we understand the data-generating process and know when the data will change. Furthermore, it requires clear standards for when we freeze the data for analysis, so we understand precisely what our analysis means.
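One lightweight way to support this, sketched below with only the standard library, is to bundle a frozen extract with its provenance metadata: where it came from, when it was frozen, and a checksum that lets a later analysis verify the records have not changed. The source name and record fields are hypothetical.

```python
import hashlib
import json
from datetime import date

def freeze_snapshot(records, source, frozen_on=None):
    """Bundle records with provenance metadata: the source system,
    the freeze date, and a content hash for later verification."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "source": source,
        "frozen_on": str(frozen_on or date.today()),
        "checksum": hashlib.sha256(payload).hexdigest(),
        "records": records,
    }

# A hypothetical extract from a tutoring-center system, frozen for analysis.
snapshot = freeze_snapshot(
    [{"student_id": "S001", "visits": 4}],
    source="tutoring-center-export",
    frozen_on="2023-10-01",
)
```

Months later, recomputing the hash over `snapshot["records"]` and comparing it to `snapshot["checksum"]` confirms the analysis still rests on exactly the data that were frozen, which is the practical meaning of being able to trace results back to their source.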

  3. Replicable Analysis

When we conduct a data analysis, it often follows a familiar pattern.  We get the raw data and dive right in: we clean and transform it into the structure we need, run our model, produce our graphs and tables, and report our results. Then a few months go by, a conference presentation comes up or an administrator asks for a deeper dive into the results, and all of a sudden we have to return to the analysis.

It’s at this point that we see the value of conducting a replicable analysis. If we have a clear way of retracing our steps, we gain confidence in our analysis and know exactly how we reached our conclusions.  Furthermore, we can easily update and extend the analysis as necessary. In many ways, replicability goes hand in hand with data provenance: if we can trace the data back to its source and we know every step we took to conduct the analysis, we will always reach the same conclusion.

  4. Understanding the Burden of Proof

When we begin a data collection effort, in many cases we have only a vague sense of what we ultimately want to learn from the data.  Certainly, some information needs to be collected purely for documentation purposes. That said, if we want to learn something from the data, we need to know what question we want to ask and what evidence will convince us.  Knowing the question ahead of time and having a clear burden of proof tells us what data we need to collect, and what other data we need to join with ours, in order to conduct a convincing analysis.

As we grow as assessment professionals, our work naturally becomes more nuanced and methodologically rigorous.  This is exciting because it means that we’re not only developing as a discipline, but gaining deeper insights into our students and the questions we care about.  To continue to progress, though, we need to help build data literacy on our campuses. This requires providing our colleagues with clear standards and practical skills that help them process the data they encounter.  It also requires an emphasis on the broader context, so each person understands how they contribute to the broader data infrastructure.

Most of our colleagues will never construct a complex model or design a relational database.  In today’s world, though, we constantly encounter data. We need to provide our colleagues with the resources to succeed and flourish in this environment.  After all, they’re “data people” too.

Eric Walsh, University at Albany
*Hadley Wickham has done a great deal of work on developing the notion of tidy data.  His article on the concept can be found at:
