What is a Data Model?
Longtime reader (since day 1!) Kyle Gibson wrote in and asked some great questions about data modeling.
Here’s an excerpt and my response (lightly edited for clarity and shared with permission):
“I’ve always associated “modeling data” with building Fact and Dimension tables to bring into Power BI for a star schema. So my main usage of model is just how I plan to model it in Power BI.
But I see the usage of “model” for modeling data for Data Scientists and AI usage, which is more for training their AI algorithms.
What is the best way to think of what people mean in building a data “model”?”
I love this question because it cuts through the noise and points out how sloppy our vocabulary is sometimes as data professionals.
So what exactly is a data model?
At the most basic level, to model data is to shape it, to put it into some kind of form. Anytime you "save" data to a file or table, you have modeled it.
It may not be modeled well or in a usable way.
Most of the time, when we talk about data modeling, we mean putting the data into a shape that is usable for a specific business or application purpose.
Modeling data for analytical reporting (descriptive and diagnostic analysis) is often best served by a star schema (like in Power BI).
Modeling data for a machine learning model might mean feature engineering and/or flattening the data into one big table.
Modeling data for a web application database probably means highly normalized tables, as in an OLTP system.
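To make those three shapes concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the order records, the table names (fact_sales, dim_customer, and so on) and the build_dimension helper are assumptions, not anything from Kyle's setup or from Power BI itself. The point is only that the same raw data ends up in a different form depending on who will use it.

```python
# Hypothetical order data; every name and value here is invented for illustration.
raw_orders = [
    {"order_id": 1, "customer": "Ada", "city": "Austin", "product": "Widget", "qty": 2, "unit_price": 5.0},
    {"order_id": 2, "customer": "Ada", "city": "Austin", "product": "Gadget", "qty": 1, "unit_price": 12.0},
    {"order_id": 3, "customer": "Bo", "city": "Boise", "product": "Widget", "qty": 4, "unit_price": 5.0},
]


def build_dimension(rows, key_fields):
    """Map each distinct combination of key_fields to a surrogate key (1, 2, ...)."""
    dim = {}
    for row in rows:
        natural_key = tuple(row[f] for f in key_fields)
        if natural_key not in dim:
            dim[natural_key] = len(dim) + 1
    return dim


# 1. Star schema (BI): a fact table of measures that points at dimension tables.
dim_customer = build_dimension(raw_orders, ["customer", "city"])
dim_product = build_dimension(raw_orders, ["product", "unit_price"])
fact_sales = [
    {
        "order_id": r["order_id"],
        "customer_key": dim_customer[(r["customer"], r["city"])],
        "product_key": dim_product[(r["product"], r["unit_price"])],
        "qty": r["qty"],
        "amount": r["qty"] * r["unit_price"],  # precomputed measure for reporting
    }
    for r in raw_orders
]

# 2. One big table (ML): one flat row per customer with engineered features.
features = {}
for r in raw_orders:
    row = features.setdefault(
        r["customer"],
        {"customer": r["customer"], "order_count": 0, "total_spend": 0.0},
    )
    row["order_count"] += 1
    row["total_spend"] += r["qty"] * r["unit_price"]
feature_table = list(features.values())

# 3. Normalized tables (OLTP): each entity stored once, linked by IDs,
#    with no derived values such as "amount" stored on the order rows.
customers = {key: {"name": name, "city": city} for (name, city), key in dim_customer.items()}
products = {key: {"name": name, "unit_price": price} for (name, price), key in dim_product.items()}
orders = [
    {
        "order_id": f["order_id"],
        "customer_id": f["customer_key"],
        "product_id": f["product_key"],
        "qty": f["qty"],
    }
    for f in fact_sales
]
```

At this toy size the star schema and the normalized tables look similar. The difference shows up in intent: the fact table carries precomputed measures for fast reporting, while the application tables store each value once and avoid derived columns.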
There is no single right way to model data. It's a generic term. We are always modeling the data with a specific use case in mind: AI, BI, an application, etc.
Which requires data teams talking with business teams.
I’m here,
Sawyer