The Data Daily
Less than 2 minutes to read each morning.
Not sure if you want in? Read the archives below.
5 days a week since May 1st, 2023.
The right conditions
About a month ago we planted grass for a new lawn. It had been bare dirt for the first 7 months we lived in the house, and we’re excited to see something green grow.
We worked with a landscaper to help plant our grass lawn. He was excellent to work with, but the process required a ton of patience.
I first met with him last fall and made a plan for planting grass.
“Can we plant this week?” I asked.
“No, it's too late in the year and a freeze is coming this weekend. We will need to wait until spring.”
So we waited through a long Michigan winter and as soon as it hit April and temperatures reached above 60, I called the landscaper.
“Can we plant this week?”
“No, we need the ground and air temps to be consistently warmer.”
So we waited.
By early June it wasn't just warm, it was hot.
“Can we plant now?” I asked him.
“No, we haven't had rain in 6 weeks and there's no precipitation in the forecast. It would be a struggle.”
The last week of June he finally came out and on a Friday afternoon prepped the topsoil and planted grass seed.
He pulled me aside when he wrapped up: “This evening we have rain coming for the first time in 2 months. More rain is expected throughout next week. I know you had to wait a while, but this is the right time to plant.”
The rain came. And within a week we had green grass popping up everywhere across our yard.
We are incredibly happy with the results.
A few different ways this project could have turned out:
I could have planted the grass myself. In spite of all my internet research I likely would have planted too early or too late and failed.
I could have demanded my landscaper plant grass as soon as possible or threatened to find someone else.
I could have hired a different landscaper who would get me a yard in May like I wanted. That also would have failed.
Our landscaper knew that I actually wanted a lush healthy yard, and so he rebuffed my many requests to plant sooner. In the end, it was much better to trust the expert.
What’s the point?
There's a right context for a data project. The temperatures have to be right. You need rain in the forecast.
What could be poor conditions for a data project?
Leadership transitions
Budget cuts
Substantial hiring initiative
Organization and team realignment
A team that’s happy with the status quo.
Pushing ahead with a cloud migration or data platform overhaul when conditions are poor will leave you over budget with underwhelming results.
Finding a data consultant (or landscaper) who will tell you “No, now's not the right time” can save you time, money, the trust of leadership, and maybe the future of your team.
It was good to see you today,
Sawyer
Good medicine
Common ailments for a data team
No source control for SQL
Making changes directly in prod
Thinking "another index will fix the performance"
Relying on the business to tell you about data quality issues
Assuming the issue is with your tech stack
Letting the business drive the vision for the data solutions
Letting the data team drive the vision for the data solutions
Treating data modeling as a luxury
Adding another tool cuz everyone on LinkedIn is talking about it
Stuck in a cycle of building ad hoc and one-off reports
3D pie charts
Every week I talk with customers about their data problems and these are just a few of the items that come up.
These aren’t terminal symptoms. They are often more about the culture of the team and org than technology.
Hiring technically skilled staff won’t solve this. Hiring a technical data consultant won’t fix this.
Building a data culture around delivering business value is good medicine. It relieves most of these symptoms.
And if you ever need help identifying symptoms, finding the medicine, or administering the medicine
I’m here,
Sawyer
A few assumptions
Assumptions will destroy the value your data team delivers.
I’ve been on enough data teams to know we are great at being isolated and insulated. Enjoying talking and thinking about data stuff.
Which is great. Except data teams don’t exist for themselves.
When data people only talk with data people we assume we know what good data looks like.
We assume that
This column name makes sense
This visual communicates what’s important
This data source is the best source for that dataset
Refreshing the data often is what the business needs
Delivering more reports is a metric for success
Speed is better than accuracy (or accuracy better than speed)
The business team doesn’t know what they want
We know what the business team needs
These assumptions (and more) will slowly erode the value of your data team, unless you collect diverse opinions and introduce diverse eyes into your team.
Talk to the business teams. Talk to outside experts. Talk to peers across industries.
You often won’t realize you are assuming things until it’s too late.
It was good to see you today,
Sawyer
It’s a moving target
Your data solution needs to change.
That likely stirs one of two emotions in you: fear or excitement.
If you feel excitement it’s likely because you have dreams (and plans!) for the future of your data tools, data design, and data products.
If it sparks fear or frustration then you are likely slogging through technical debt, rigid systems designs, and a lack of vision about what to do next or how to evolve.
After working with dozens of companies I can say with confidence
Analytics isn’t a one-time build.
Data pipelines are constantly evolving.
An ideal architecture and design is a moving target.
Dashboards require revision to meet changing business questions.
Not to mention your team headcount growing and shrinking, new lines of business launching, employees working from home, company expansion to new markets, a new CEO, macroeconomic shifts in business strategy, employees returning to the office, etc.
etc.
etc.
Build a solution that can move with the moving target. Nimble. Flexible.
Skating to where the puck might be.
So when you hear “your data solution needs to change”
It sparks excitement instead of fear
Because you are confident of where you are headed.
I’m here,
Sawyer
Be careful what you’re good at
Be careful what you’re good at.
I was talking with a customer a few weeks ago when they confessed an all-too-common problem.
It went something like this:
“We started to figure out some of this data stuff. We built out some pipelines and delivered some dashboards to the business teams. It was received well. Too well. We started getting data and report requests from all across the business. Too many requests. We aren’t built to handle this.”
What a problem, huh?
The business is hungry for the value you are offering. This is a huge flashing neon sign that you are doing something right.
Congrats.
Take a minute (or two) and celebrate.
When you’re ready and reality sets in, you realize something - what got you here won’t get you there.
Something(s) needs to change. Adapt and grow. Or discard and rebuild from scratch.
It’s ok.
You’re great at what you do, remember?
You were good enough to create this problem.
You’re good enough to take the next step.
Your business is hungry for data.
It’s time to head back to the kitchen and serve up a great menu.
And if you ever need help,
I’m here,
Sawyer
The question about data lakes
Most companies are still using databases as their primary data store.
But it feels like everybody in the industry is talking about data lakes.
Naturally, I get this question a lot:
“What’s the difference between a database and a data lake? And why should I want a data lake?”
Here’s one way I answer the question:
At its best, a data lake looks a lot like a database.
Organized with thoughtful design.
Built with performant storage formats (Parquet, ORC, Delta, Hudi).
Easy to navigate, understandable naming structure.
Readily available compute to query the data.
Controlled data access so the right people can get what they need.
Governance, quality checks, and gatekeeping are in place. Otherwise, you have a data swamp.
There are plenty of differences as well, but once you embrace the similarities it’s a lot easier to start evaluating whether a data lake makes sense for your organization.
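Here is one way the "quality checks and gatekeeping" idea could look in practice. This is a minimal sketch in plain Python, not a recommendation for a specific tool. The function names, the check names, and the sample `order_id` and `amount` columns are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool


def run_quality_gate(rows, required_columns):
    """Run a few basic checks on a batch of rows (a list of dicts).

    Hypothetical gate: a batch should only land in the lake if every check passes.
    """
    results = [CheckResult("not_empty", len(rows) > 0)]

    # Every row must contain every required column.
    missing = [c for c in required_columns if any(c not in r for r in rows)]
    results.append(CheckResult("required_columns_present", not missing))

    # Required columns must not hold null (None) values.
    has_nulls = any(r.get(c) is None for r in rows for c in required_columns)
    results.append(CheckResult("no_nulls_in_required_columns", not has_nulls))
    return results


def gate_passes(results):
    """True only when every check in the list passed."""
    return all(r.passed for r in results)


good_batch = [{"order_id": 1, "amount": 10.0}]
bad_batch = [{"order_id": 2, "amount": None}]

print(gate_passes(run_quality_gate(good_batch, ["order_id", "amount"])))
print(gate_passes(run_quality_gate(bad_batch, ["order_id", "amount"])))
```

The good batch passes and the bad batch is stopped at the gate. Real checks would be specific to your data, but the shape is the same: nothing gets into the lake without passing through the gate.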
And if you want help or have questions, just hit reply.
It was good to see you today,
Sawyer
How to hire data talent with business skills
Let’s say you need to hire for your data team
You need a new data analyst, BI analyst, or data engineer.
The default pattern is for the candidate to interview with the data team members and technical leadership.
That’s great at landing skilled technical data people on your team.
But how do you assess if the candidate can talk with business users and understand their goals?
Try this. Replace one of the technical interviews with a conversation with someone from the business teams.
Maybe a marketing manager. Or supply chain team lead.
No technical questions allowed. Just a 30 or 60-minute conversation about the business.
What could you learn from that?
Can the candidate understand the core function of the business? Are they curious enough to ask probing questions? Are they capable of taking apart business problems? Can they carry on a conversation about a non-technical area of the company?
Overall, you will learn if they can communicate and connect with the business leader.
Find data people who understand business problems. It removes another layer of friction in their data experience.
I’m here,
Sawyer
The music of data
A few years ago my wife and I attended a symphony concert. We aren’t normally symphony people, but this one was special because my cousin’s husband had composed one of the pieces being performed.
It was a beautiful piece.
I’m awed by the amount of coordination and vision it takes to craft a piece of music for dozens of musicians and numerous instrument types. Somehow all the pieces fit and a remarkable musical experience occurs.
It’s a helpful image to understand data workflows and movement. Let’s break it down this way.
A conductor is the ETL orchestration software. They decide when the music starts, what order the musicians play in, and keep all of the various parts in line.
The music score is the code (SQL, Python, Scala, etc.) that defines what note should be played and when.
Individual musicians are compute nodes. Each is assigned a body of work (code) to execute in its proper sequence.
The composer is the data architect. One (or more) person had the vision for all the pieces and assembled the right collection of instruments and a conductor to bring the work to life.
Most importantly - the audience is the business user.
All the work of the musicians, composers, and conductors is for the joy and satisfaction of the audience.
They all perform in the hopes of a standing ovation.
That’s a great vision for your data team as well.
Creating an amazing data experience for your business teams.
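If it helps to see the analogy in code, here is a tiny sketch of a "conductor" using Python's standard-library `graphlib` module (Python 3.9+). It is a toy, not how any particular orchestration product works. The task names (`extract`, `transform`, `publish`) and the `run_pipeline` function are hypothetical, made up for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task only after the tasks it depends on; return the order used.

    tasks: dict of task name -> zero-argument callable (the "score")
    dependencies: dict of task name -> set of task names that must run first
    """
    order = []
    sorter = TopologicalSorter(dependencies)
    for name in sorter.static_order():  # the "conductor" decides who plays when
        tasks[name]()  # a "musician" (compute node) plays its part
        order.append(name)
    return order


log = []
tasks = {
    "extract": lambda: log.append("pulled raw sales data"),
    "transform": lambda: log.append("cleaned and modeled sales data"),
    "publish": lambda: log.append("refreshed the sales dashboard"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "publish": {"transform"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

The last step, `publish`, is the only part the audience ever sees. Everything before it exists to make that moment work.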
I’m here,
Sawyer
Double the price
We recently built a new home with a contractor.
It took 14 months instead of the estimated 6.
If we had been billed by the hour/day/month, it would have been more than double the price.
Thankfully we paid our contractor based on the value of what they delivered.
Not the length of time it took to complete it.
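The arithmetic behind "more than double" is simple enough to sketch. This assumes a constant monthly rate, which is my simplification and not how our contractor actually quoted anything. The function name is hypothetical.

```python
def time_based_cost_ratio(estimated_months, actual_months):
    """How many times the estimated cost a time-billed project ends up costing,
    assuming (hypothetically) the same rate is billed every month."""
    return actual_months / estimated_months


ratio = time_based_cost_ratio(estimated_months=6, actual_months=14)
print(f"{ratio:.2f}x the estimate")  # prints "2.33x the estimate"
```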
Would you rather buy an IT or Data Project by the hour or by the value?
I’m here,
Sawyer
p.s. Aligned incentives are foundational to trust and success in any project. If you want to talk about what that looks like with a data project, hit reply.
How to make data architecture decisions
If you spent any time in college or grad school you likely read a lot of books. Or were supposed to read a lot of books. Most of the book titles you can’t remember, let alone any of the book content.
However, one book from grad school (about academic writing of all things) stuck with me: They Say/I Say.
It makes a simple point: Don’t just share ideas, respond to others’ ideas.
The temptation of the average college or grad student when they sit down to write a paper is to just launch into opinions. “I think this here, that over there, and especially this up there”. After all, aren’t we supposed to write original ideas?
But nobody cares just about your ideas. The argument of They Say/I Say is that the only way to create compelling arguments is to position your ideas in relation to others’ ideas.
This person says x and y. While I agree with x, I disagree with y. I would add this and that, and the conclusions I draw from these ideas are z.
They Say/I Say
Data architectures and technology choices are often like a college student writing a paper.
Someone picked out a bunch of technologies based on their opinions. Maybe they read a blog once about data lakehouses. Or saw a webinar about a fancy new data tool.
It’s a collection of ideas. It might work. But it’s not in response to anything in the business.
Last week a client showed me their proposed data architecture diagram. I started asking “Why this?” about a few things on the diagram. Their response was “I like that tool” or “I read a blog about it last week.”
Every technology decision and every design decision needs to respond to a business need.
They Say/I Say
The business shares its goals and objectives. The data team responds with technology and design to meet those needs.
I’m here,
Sawyer
A library or moving truck
Your database is organized like a library or a moving truck.
The data nerds call it OLTP or OLAP.
But here’s the key idea behind those acronyms.
You can organize your data like books in a library - according to category, genre, author, and/or title. There are advanced systems and sciences for the organization of books. Dewey Decimal and Library of Congress are the most common classification systems in the US.
These systems are optimized for finding one specific book, and for quickly and properly archiving a new book that was returned. If you walk into a familiar library with a book in mind you can find it in just a couple minutes or less. The system is designed for hyper-specific selection and organization of books.
In the data world, this is an OLTP system - designed for reading and writing a single individual data point with extreme efficiency.
On the other hand, there is the moving truck. Or more specifically, a library moving service. When you are faced with the challenge of moving a library, the goal isn’t individual storage and identification of books. It’s to move thousands (or millions) of books at scale. Yes, the books are still organized, but accessing a particular book when it is packed in a box in a moving truck isn’t efficient. And it’s not what the system was designed for.
In the data world, this is an OLAP system - designed for reading and aggregating bulk information. This is often used to see trends and summary information.
Your business needs will dictate how you store your data.
If you are FedEx and have millions of people a day looking up a tracking number for their package - you want OLTP for quick and effective retrieval of a single data point.
If you are a financial analyst working on forecasting the next quarter or fiscal year, you aren’t worried about a single data point. You want to see trends and aggregates across categories. That’s OLAP.
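Here is a small sketch of those two access patterns, using Python's built-in `sqlite3` module. SQLite is a row-oriented database, so this only shows the difference in the shape of the questions, not the difference in storage engines. The `shipments` table, its columns, and the sample rows are hypothetical, made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (tracking_number TEXT PRIMARY KEY, region TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [
        ("TRK001", "Midwest", 12.50),
        ("TRK002", "Midwest", 7.25),
        ("TRK003", "West", 20.00),
    ],
)

# OLTP-style question: fetch one specific row by its key (finding one book).
one_shipment = conn.execute(
    "SELECT tracking_number, region, cost FROM shipments WHERE tracking_number = ?",
    ("TRK002",),
).fetchone()

# OLAP-style question: scan many rows and aggregate them (moving the whole library).
cost_by_region = conn.execute(
    "SELECT region, COUNT(*), SUM(cost) FROM shipments GROUP BY region ORDER BY region"
).fetchall()

print(one_shipment)    # ('TRK002', 'Midwest', 7.25)
print(cost_by_region)  # [('Midwest', 2, 19.75), ('West', 1, 20.0)]
```

With three rows either query is instant. The design choice starts to matter when the table holds millions of rows and one of these two questions is asked far more often than the other.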
It was good to see you today,
Sawyer
p.s. If you need help sorting through how to store your data - or how to do an appropriate mixture of OLTP and OLAP to meet your business needs, hit reply and tell me about it.
Small data
People love talking about “Big Data”.
But most data engineers or analysts will never need to think about datasets larger than a terabyte.
It’s easy to get infatuated with reading blogs about how Uber, Meta, or LinkedIn build scalable data systems and machine learning models on petabytes of data.
Design best practices for big tech companies rarely translate well to the middle and small markets. How does Stripe’s streaming pipeline with millions of events a second relate to your nightly batch data jobs?
It doesn’t.
If you are in the middle or small market, try this instead:
Talk with peers at similar-sized companies doing it well.
Go to conferences where similar-sized companies are sharing their use cases and successes.
Engage consultants and specialists who have designed numerous solutions at your size and scale.
Your company can be remarkably successful without ever needing to scale a data system like TikTok.
I’m here,
Sawyer
How many clicks?
How many clicks does it take for your business teams to get to the data they need?
Last week I had to access a database inside a customer’s secure network.
I had to log in to a desktop virtualization platform (Citrix), sign in to a VM, launch a database client, log in to the server, and then issue my one simple query.
It took about 5 minutes, three multi-factor authentication alerts on my phone, three times typing in a long password, and a few uncertain moments when I looked at the long list of obscure server names and tried to remember which one I was supposed to use.
I didn’t count clicks and keystrokes but it was a lot.
All to find one piece of data.
That’s probably a bit extreme, but not as rare as most think. It makes for a terrible data experience.
Your business team will quit before getting to the data they need to do their job.
Building better data experiences means listening and watching how many clicks, links, logins, filters, or tables a user has to navigate through in order to answer their question.
Remove the friction points so the data can serve the business.
Thanks for being here,
Sawyer
p.s. Want help identifying friction for your data experiences? Hit reply or schedule a call.
Things that shouldn’t be automated
Automation and scalability aren’t ideal for every situation.
Things that should be automated and scalable:
Code deployments
Data pipelines
Business Intelligence reports
Version control for your codebase
Data quality checks
You benefit from not having to think about these tasks. It’s a problem if you have to remember to do these things or they require tons of effort. Invest in automation and scalability. Automating them improves the quality of the output.
Things that shouldn’t be automated and scalable:
Talking to business users about their use cases and data needs
Architecture design sessions for data solutions
1-1 meetings with your team and/or manager
Hiring team members
Gaining executive buy-in for data projects and initiatives
These activities benefit from your direct emotional and mental engagement. Automating them degrades the quality of the output dramatically.
I regularly talk with customers who spend so much manual time on the top list that the bottom list suffers.
Be intentional about what you want manual and what you want automated.
I’m here,
Sawyer
p.s. This email is a manual and personal task. Very little automation here. Just me sitting down each morning to write you an email and hitting send.
Incentives
If you hired a data person for your company and you paid them by the…
...lines of code, they could be incentivized to write lots of code, regardless of efficiency or readability.
…number of meetings they attended, they would be incentivized to schedule and attend lots of meetings, regardless of the purpose.
...dashboards delivered, they could be incentivized to use bulk generic templates to crank out as many as possible. Regardless of value or usefulness.
…emails sent, they would be incentivized to find dozens of people to email every hour, regardless of the business value.
That sounds silly, you say?
Of course. You hired them to deliver business value to your company.
If a data consulting project was priced by the...
...number of phone calls, one could be incentivized to schedule lots of phone calls, regardless of their purpose.
...number of story points delivered, one could be incentivized to build out extra backlog items and inflate point estimates.
...project status reports sent, one could be incentivized to use bulk generic templates to crank out as many as possible. Regardless of value or usefulness.
...hour, one could be incentivized to find ways of extending the project: issuing change requests, maintaining inefficient systems, scheduling lots of meetings, etc.
Sounds silly? Of course. You hired them to deliver business value to your company.
There are always incentives baked into the price.
Why not buy something to ensure incentives are aligned for everyone?
It was good to see you today,
Sawyer
p.s. What does a data project look like where we align incentives? Hit reply or schedule a call. I’d love to talk more about it.
The first decision
Define your analytics upfront.
Everyone can say they want "analytics" for their org, but analytics is a broad category that needs to be narrowed down.
It could be descriptive, diagnostic, predictive, or prescriptive analytics.
Why narrow it down?
Because everything about your data platform design depends on it.
Storage (file type, columnar vs row compression, partitioning and distribution, relational DB vs data warehouse vs data lake)
Compute (SQL engine, multi-language compute, single-threaded or parallel)
Modeling (dimensional, relational, grain of the data, feature engineering, etc)
Latency (daily/weekly/monthly batch, streaming or near real-time latency)
Defining core elements up front makes dozens of later decisions easier.
A great data team knows how to ask these questions.
A great business team knows how important defining their data goals is.
I’m here,
Sawyer
The first step
For many things in life, the first step is easy to figure out.
If you want to…
Buy a house then the first step is to find a realtor.
Get married then the first step is to go out on a date.
Get a job then the first step is to prepare your resume.
Start a business then the first step is to register an LLC.
Make dinner at home then the first step is to go grocery shopping.
With other things, the first step is less clear.
What’s the first step for reducing your company’s carbon footprint?
…raising kids who will be successful adults?
…knowing when or how to plan for retirement?
…or building a cloud data platform?
These types of questions require a detailed analysis of your situation, dreams for the future, and the constraints of your context, and the answer often ends up being “it depends”.
These are the types of questions that we often talk through with experts: an environmental consultant, a parenting coach or therapist, a financial planner, or an experienced data expert.
Often, over the course of a few conversations, some ideas start to emerge. It might even look like a plan.
At the very least, it could get you to your first step toward what you want.
Just a little closer and more clear about how you can build the data platform of your dreams*.
I’m here,
Sawyer
*or raise kids, plan for retirement, and reduce your carbon footprint if that’s your thing.
p.s. I love helping data leaders identify and craft their first step toward a cloud data platform. If that sounds even remotely interesting, hit reply or schedule a free call.
Ways to ruin a wedding
There are lots of ways to ruin a wedding.
This weekend I’m officiating a wedding for some family friends.
It’s the 4th one I’ve officiated (with a 5th one coming up in August). So far, I’ve learned a few things about weddings and being an officiant.
There are lots of ways to ruin a wedding, but an important one for the officiant to watch out for…making it about you.
Imagine if I stood up before the couple and began to share about the new suit I bought for this wedding, random facts from my marriage, how great my wedding was, and compare the size of the crowd, venue, flowers, or catering choices to those we made at our wedding. There are so many things I could talk about that are really interesting and important to me.
No one cares.
Everyone in the room is there to focus on and celebrate the bride and groom.
Stay out of the way.
Say “I” as little as possible.
Let the bride and groom have their moment.
There are lots of ways to ruin data and business team collaboration.
For a data professional, it’s making it about you.
Imagine (or more likely - remember) a meeting with the business team where the data leader waxes eloquent about their data tools, cool data architecture, the complicated data quality solution they built, and nerdy data modeling terminology. There are so many data things a data leader could talk about that are really interesting and important to them.
No one cares.
Everyone in the room is there to focus on the business goals.
Stay out of the way.
Say “I” as little as possible.
This moment is about the business team and their needs.
Don’t ruin a wedding by talking about yourself.
Don’t ruin your relationship with the business team by talking about your data stack.
It was good to see you today,
Sawyer
Better lives
How does good data make lives better?
If you are still a bit skeptical, here are a few real-life examples:
A child safety organization gains greater insight into the well-being of children entrusted to its care.
An overburdened analyst trapped in manual data entry and spreadsheet reconciliation is freed to think creatively about strategy again.
A local credit union increases the equity and diversity of its small business lending based on demographic analytics.
An HR department reviews the latest research and data on 4-day work weeks and shifts company policies to reduce stress and improve employee well-being.
etc.
Data serves our humanity and our collective goals (i.e. business goals).
Data is collected for the sake of humanity, not humanity for the sake of data.
I’m here,
Sawyer
p.s. How have you seen data make lives better?
More humane data
Good data makes lives better. It lives up to the standard.
Alongside being easy to use, good data should make us more human.
Rather than more mechanical, more manipulated, or more overwhelmed.
That might look like:
Less data volume
Social media sending too many notifications that will overwhelm and manipulate our attention.
Adjusted data frequency
A health app or device checking and logging your body weight, heart rate, movement, etc. every 30 minutes may not produce meaningful insights. Instead it can create tremendous noise and reduce our understanding of what's important - sustainable and long-term healthy practices.
Meaningful data privacy
My bank or credit union knows an intimate level of information about my life and where I spend money. Will they use that to enhance my human experience, sell me services, or sell my behavior to someone who could exploit it? My insecurities or limited understanding of the complex financial world are ripe for tampering.
Certain things don't need to be tracked, monitored, measured, or optimized. Doing so wrings out our humanity.
Good data enhances and increases our human experience.
And this is just a bit of what humane and good data looks like.
It could make lives better.
I’m here,
Sawyer