Let’s talk about grains
Not your breakfast cereal (that you might be enjoying while you read this).
Not the grains of sand lingering in your shoes, car, and hair after a vacation at the beach (happy summer to y’all)
I am talking about the grain of your data. And it might be the most important step in data modeling and reporting building.
“The grain” of a table is what a single row represents.
A row in a sales table could be:
A single transaction
Sum of sales amount for a single day
Number of items purchased by a single customer on a single day.
Total sales by month by region.
Three recommendations for grains:
1. Get clear on your grain.
Upfront it’s crucial to make a conscious and clear choice on the grain of your data.
Ask this question: What does one row of your data equal?
The total of a transaction? Is it the sum of sales for the week? Is it the average throughput of your department for the month?
2. Realize the trade-offs
Ideally, we would always have data at the lowest possible grain. But with big data sets that can demand a lot of storage.
So sometimes you need to choose a higher grain. That’s OK, but just acknowledge that with each step up in grain, you lose some ability to analyze your data. (You can’t analyze sales by day if your grain is at the monthly level). Be conscious of the trade-offs of your grain choice and own it from the start.
3. Never mix your grains.
If one line of your data is totaled at the month level and the next at the department level, doing analysis across your data will be increasingly painful. This happens more often than you would guess. Don’t do it.
I’m here,
Sawyer