Hey! Why does my data look different?
By Luis Oquendo, Manager of Analytics: The University Analytics and Institutional Reporting Office
We’ve all been there… meeting with a colleague, browsing UAIR’s fabulous website, or perhaps consulting with a UAIR analyst when you realize: hey, why does my data look different? You double-check and triple-check your numbers, but the counts still don’t match up. It doesn’t make any sense!
What happens next is understandable: the natural reaction is that there’s been some sort of mistake. Admittedly, since the data we deal with comes from humans and is managed by humans, mistakes will indeed occur from time to time. We understand this can be frustrating and confusing. In such instances, data architects and data analysts strive to address the mistakes and document them so that they can be avoided in the future.
In many other circumstances, however, your data does indeed look different and it’s not a mistake. More importantly, there are many trackable and logical reasons for why your data looks different. The goal is to be aware of those reasons and seek to understand them. Outlined below are many of the common reasons why data and information that is shared across the University of Utah often looks different.
The primary, and perhaps the most important, reason why your data looks different pertains to the source of the data. Where did your data come from? For context, let’s use student data as an example. Some departments may query student data directly from PeopleSoft. Additionally, some departments and offices may even have their own internal databases and sources of data. Although there are many sources, there are two increasingly prominent sources of student data that are making their way around campus: UAIR data and Student Data Warehouse (SDW) data. Moreover, UAIR analysts primarily utilize these two sources of data. Once you understand where your data comes from, it is also important to have a general understanding of the underlying purposes of such data.
While both UAIR data and SDW data are extremely useful, each source is maintained with different overarching goals in mind. One of the primary goals of UAIR is to maintain data for the purposes of budgeting, mandated external reporting, and other public-facing reports. Therefore, UAIR data, processes, and reports are often defined, organized, and presented with respect to the university’s budget process as well as federal/state mandated definitions and guidelines. In contrast, SDW data, processes, and reports are maintained with the following goals in mind: the accessibility of personally identifiable data, university-wide accessibility to the SDW, and day-to-day operational use of data.
Overall, most sources of data can help with planning and data-driven decision-making. Nevertheless, it is crucial to note that sources of data may differ in terms of definitions, availability, and maintenance processes, all of which contribute to why your data looks different.
Hey, what’s a snapshot? A “snapshot” is a specific view of what the data looks like at a particular point in time. We use snapshots as a method to consistently and systematically measure data for reporting and operational purposes. Consider, for example, how many students are on campus on a given day. Students come in and out, especially at the beginning of a semester. In order to be consistent from term to term, we take a snapshot of the data at a determined point in time each semester. When it comes to student data, the two snapshots that you will encounter on UAIR’s website are “Census” and “End-of-Term.”
The census snapshot is a view of what the data looks like after the census deadline or about three weeks after a semester has started. In contrast, the end-of-term snapshot is a view of the data at the end of the semester. Along with snapshots that reflect data at census and the end of the term, the SDW also has several different snapshot types including a “Live” snapshot of the data (which changes daily) and a “Start of Term” snapshot (first day of classes). It is also important to consider that UAIR’s process for capturing a snapshot will differ from the SDW’s process.
Overall, snapshots are a reflection of what the data looks like at a point in time. Much like taking a picture of a landscape, the details you see change as a function of time. Additionally, who took the photo will also affect what the picture looks like. Therefore, if your snapshots were not taken at the same point in time or do not come from the same source, your data will likely look different.
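To make the snapshot idea concrete, here is a minimal sketch in Python. The records, the dates, and the `snapshot_count` helper are all made up for illustration; this is not how UAIR or the SDW actually captures a snapshot. It simply shows how the same underlying records can yield different counts depending on the date the picture is taken.

```python
from datetime import date

# Hypothetical enrollment records: (student_id, enrolled_on, withdrew_on or None).
# All IDs and dates are invented for illustration.
enrollments = [
    ("s1", date(2024, 8, 19), None),               # enrolled all term
    ("s2", date(2024, 8, 19), date(2024, 9, 20)),  # withdrew after census
    ("s3", date(2024, 8, 28), None),               # added late, before census
    ("s4", date(2024, 8, 30), None),               # added late, before census
]

def snapshot_count(records, as_of):
    """Count students who are enrolled on the given date."""
    return sum(
        1
        for _, enrolled_on, withdrew_on in records
        if enrolled_on <= as_of and (withdrew_on is None or withdrew_on > as_of)
    )

# Assumed snapshot dates (first day of classes, about three weeks in, end of term).
start_of_term = snapshot_count(enrollments, date(2024, 8, 19))
census = snapshot_count(enrollments, date(2024, 9, 9))
end_of_term = snapshot_count(enrollments, date(2024, 12, 13))

print(start_of_term, census, end_of_term)
```

With these made-up records, the three snapshots give three different headcounts even though nothing about the records is "wrong."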
Another reason why your data might look different pertains to differences in definitions. In the world of data, there are various definitions for seemingly straightforward concepts. For the sake of example, let’s explore the definition of a “major” and how a student’s major is counted. At first glance the definition of a student’s major seems pretty straightforward: it’s what they’re majoring in, duh! This is true, but it can get complicated as there are several layers to what a major entails.
First, let’s note that students can have multiple active majors. These active majors can also change at any point in the semester. In fact, a student may have one major at the census snapshot and a different major at the end-of-term snapshot. Second, when it comes to major definitions, there are two important terms to be aware of: primary majors and non-primary majors. Regardless of whether it is a pre-major, an intermediate major, or a full bachelor’s-level major, a primary major is defined as a student’s earliest declared major that is still active on their record. As students declare additional majors, they pick up non-primary majors. For example, a student may pick up a non-primary major if they become a double major. Regardless of how many additional majors a student picks up, their primary major remains the same as long as it is still active on their record. When a primary major is dropped, the next earliest declared major that is still active becomes the primary major.
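The primary major rule described above (the earliest declared major that is still active) can be sketched in a few lines of Python. The `MajorRecord` class, its field names, and the sample majors are hypothetical and exist only to illustrate the rule; they do not reflect the actual structure of any university system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class MajorRecord:
    """Hypothetical record of one declared major on a student's record."""
    major: str
    declared_on: date
    active: bool

def primary_major(records: List[MajorRecord]) -> Optional[str]:
    """Return the earliest declared major that is still active, if any."""
    active = [r for r in records if r.active]
    if not active:
        return None
    return min(active, key=lambda r: r.declared_on).major

# A made-up double major: the earlier declaration is the primary major.
student = [
    MajorRecord("Pre-Business", date(2022, 9, 1), active=True),
    MajorRecord("Economics", date(2023, 2, 15), active=True),
]
before_drop = primary_major(student)

# If the primary major is dropped, the next earliest active major takes over.
student[0].active = False
after_drop = primary_major(student)

print(before_drop, after_drop)
```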
Overall, OBIA relies on the primary major definition when allocating total major counts to colleges, departments, or programs. Following the logic of primary majors, a single student will only be allocated to one college (regardless of whether they have non-primary majors with other colleges). In contrast, data from the SDW can account for both primary and non-primary majors when major counts are allocated. This means that one student with several majors (a primary major plus one or more non-primary majors) can be allocated to the total major counts of multiple colleges/departments/programs. In short, with OBIA data processes, one student = one major count, in only one college. In contrast, with SDW data processes, one student can equate to several major counts, for several colleges. This is why even simple questions such as “how many majors does college-x have?” can result in different-looking data (see Figure 1). Additionally, other important data components/concepts can have differing definitions. It is therefore crucial to be aware of how your data is defined.
[Figure 1. Columns: Primary Majors | Non-Primary Majors | OBIA Major Counts | SDW Major Counts]
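As a rough illustration of the two counting approaches described above, the Python sketch below tallies the same made-up major records two ways: once using primary majors only, and once using every active major. The student IDs, college names, and variable names are hypothetical, and the actual OBIA and SDW processes are more involved than this.

```python
from collections import Counter

# Hypothetical active major records: (student_id, college, is_primary).
majors = [
    ("s1", "Science", True),
    ("s1", "Humanities", False),   # s1 is a double major
    ("s2", "Humanities", True),
    ("s3", "Science", True),
    ("s3", "Engineering", False),  # s3 is a double major
]

# Primary-major approach: one student = one major count, in only one college.
primary_only_counts = Counter(
    college for _, college, is_primary in majors if is_primary
)

# All-majors approach: one student can count toward several colleges.
all_major_counts = Counter(college for _, college, _ in majors)

print(dict(primary_only_counts))
print(dict(all_major_counts))
```

In this toy data, the primary-only totals add up to the number of students (three), while the all-majors totals add up to the number of major records (five). Both answers to “how many majors does college-x have?” are defensible; they simply follow different definitions.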
Conclusions and Recommended Best Practices
I know what you’re thinking: great, thanks for the long-winded explanation of why my data looks different, but what am I supposed to do with this? Which data am I supposed to use? The answer depends on what your ultimate goal is. If we stick with the example of student data, both UAIR data and SDW data can be extremely useful for data-driven strategic planning, decision-making, and statistical analyses. However, there are some recommended best practices associated with each source of data. When it comes to data for the University of Utah’s budget process and/or data for public consumption (media, websites, public reports, grants, etc.), we recommend using UAIR data. This is because UAIR data is already utilized for official reporting, it is publicly available, and it can be easily verified by third parties. In contrast, SDW data is only available internally to school officials at the University of Utah. Still, SDW data becomes extremely useful in day-to-day operations, such as when live data snapshots, specific/custom queries, or personally identifiable information are needed. In conclusion, sources of data, snapshots, and definitions all contribute to why your data may look different. Ultimately, however, there are several sources of “good” data; we just have to make sure we are aware of them.
Data Availability Calendar
SDW: Updated End of Term Snapshot for Previous Fall Term available (grades, unofficial end-of-term enrollment)
SDW: Start of Term Snapshot for Active Spring Term available (unofficial enrollment counts, course registration)