The question about data lakes

Most companies are still using databases as their primary data store.

But it feels like everybody in the industry is talking about data lakes.

Naturally, I get this question a lot:

“What’s the difference between a database and a data lake? And why should I want a data lake?”

Here’s one way I answer the question:

At its best, a data lake looks a lot like a database.

  • Organized with thoughtful design.

  • Built with performant data storage formats (Parquet, ORC, Delta Lake, Hudi).

  • Easy to navigate, with an understandable naming structure.

  • Readily available compute to query the data.

  • Controlled data access so the right people can get what they need.

  • Governance, quality checks, and gatekeeping are in place. Otherwise, you have a data swamp.
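To make the naming point concrete, here's a minimal sketch of one possible folder convention. The zone names, the `lake_path` helper, and the `key=value` partition folders are my own assumptions for illustration, not a standard you have to follow.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zones; pick names that fit your organization.
ZONES = {"raw", "curated"}


def lake_path(zone: str, domain: str, dataset: str, day: date) -> str:
    """Build a predictable, date-partitioned folder path for one day of data."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return str(
        PurePosixPath(zone, domain, dataset)
        / f"year={day.year:04d}"
        / f"month={day.month:02d}"
        / f"day={day.day:02d}"
    )


# prints curated/sales/orders/year=2024/month=03/day=07
print(lake_path("curated", "sales", "orders", date(2024, 3, 7)))
```

When every dataset follows the same pattern, someone browsing the lake can guess where data lives, much like guessing a schema and table name in a database.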

There are plenty of differences as well, but once you embrace the similarities, it’s a lot easier to start evaluating whether a data lake makes sense for your organization.

And if you want help or have questions, just hit reply.

It was good to see you today,

Sawyer
