Unknown Member – which key should they have?

Today I read a rather interesting blog post from Jamie Thomson (Twitter|Blog), the post was about surrogate keys and why they should be used.

The blog post is this Considering surrogate keys for unknown members

He made a point regarding that most developers tend to use -1 as the surrogate key for the unknown member, but why is that? Well I think, as well as Jamie that the reason for this is that the default identity seed on Microsoft SQL Server is 1, even though the INT goes from -2147483648 to 2147483648 and 2147483648 is an pretty high number which will fit in 90% of the datawarehouse solutions we work on.  

 

But as Jamie says using -1 as an identity for our unknown member, tends to give the surrogate key a value, and that is the exact opposite meaning to a surrogate key.

 

The surrogate keys should be meaningless, and not bear any hidden meaning or value in itself. It is just a dummy key.

 

Then I was thinking a bit further, what if we created a table containing all the unknown values for each an every Dimension table in the system. So I started SSMS up and gave it a try, the result was a table with the following design.

 CREATE TABLE [dbo].[Unknown](

       [UnknownID] [int] IDENTITY(-2147483648,1) NOT NULL,

       [DimensionName] [nvarchar](50) NULL,

       [UnknownType] [nvarchar](20) NULL,

 CONSTRAINT [PK_Unknown] PRIMARY KEY CLUSTERED

(

       [UnknownID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

 

And in it I put the following values.

 Insert into Unknown(DimensionName, UnknownType) Values (‘Customer’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Item’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Order’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Zipcode’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Country’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘City’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Branch’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Date’, ‘Unknown’)

  

Then when we have to populate our fact table, we could join this Unkown table to the dataset and then get the Unknown Member ID specifically for the Dimension at hand.

This way we don’t, have some surrogate key that is missing referential integrity, but it introduces new issues. Issues that would make it a little more difficult to manage or develop a Cube.

I’ve put in an unknown type just in case, I have seen several datawarehouses where there was a need for both -2, -1 and 0 as unknown or misused members. (agreed it may be poorly designed, and they are a constant candidate for refactoring)

This was my 2 cents on the topic, I would like to hear from you, and maybe start an discussion about it. Jamie and I started on twitter, but 140 char is not quite enough.