Good keys, what are they like?
A central value add of data warehouses is their ability to restore the sanity that comes from using good keys. Taking a model-agnostic view of keys, they refer to “something” that is uniquely...
An Overview of Source Key Pathologies
I previously made the point that source systems cannot be trusted to generate good keys. In this post, I will explore the pollution you may encounter as you dip your feet into the crude oil leaked...
Transforming Source Keys to Real Keys – Part 1: Introducing Map tables
I have ranted enough about the key quality, or lack thereof, in source systems. It is time to look at how we will practically go about transforming the dirty source to the final DW model. What I shall...
Transforming Source Keys to Real Keys – Part 2: Using Maps To Fix Key Problems
In part 1 of this post, I introduced the idea of map tables. These tables serve as an abstraction between the source systems and the entities in the data warehouse. In this post, I will describe how...
Physically Placing the Maps in the architecture
Before we leave the maps behind, I need to live up to my promise of describing the storage characteristics of tables visited during the journey through the warehouse architecture. This must include the...
Boosting INSERT Speed by Generating Scalable Keys
Throughout history, similar ideas tend to surface at about the same time. Last week, at SQLBits 9, I did some “on stage” tuning of the Paul Randal INSERT challenge. It turns out that at almost the same...
Why Surrogate Keys are not Good Keys
History tracking in warehouses is a controversial discipline. In this post, I will begin to unravel some of the apparent complexities by taking apart the history tracking problem, piece by piece....
Exploring Hash Functions in SQL Server
Hash distributing rows is a wonderful trick that I often apply. It forms one of the foundations for most scale-out architectures. It is therefore natural to ask which hash functions are most efficient,...
Why You Need to Stop Worrying about UPDATE Statements
There seems to be a myth perpetuated in the database community that UPDATE statements are somehow “bad” and should be avoided in data warehouses. Let us have a look at the facts for a moment...
Clustered Index vs. Heap
On Stack Overflow the other day, I once again found myself trying to debunk a lot of the “revealed wisdom” in the SQL Server community. You can find the post here: Indexing a PK GUID in SQL Server 2012...