Unknown Member – which key should they have?

Today I read a rather interesting blog post from Jamie Thomson (Twitter|Blog), the post was about surrogate keys and why they should be used.

The blog post is this Considering surrogate keys for unknown members

He made a point regarding that most developers tend to use -1 as the surrogate key for the unknown member, but why is that? Well I think, as well as Jamie that the reason for this is that the default identity seed on Microsoft SQL Server is 1, even though the INT goes from -2147483648 to 2147483648 and 2147483648 is an pretty high number which will fit in 90% of the datawarehouse solutions we work on.  

 

But as Jamie says using -1 as an identity for our unknown member, tends to give the surrogate key a value, and that is the exact opposite meaning to a surrogate key.

 

The surrogate keys should be meaningless, and not bear any hidden meaning or value in itself. It is just a dummy key.

 

Then I was thinking a bit further, what if we created a table containing all the unknown values for each an every Dimension table in the system. So I started SSMS up and gave it a try, the result was a table with the following design.

 CREATE TABLE [dbo].[Unknown](

       [UnknownID] [int] IDENTITY(-2147483648,1) NOT NULL,

       [DimensionName] [nvarchar](50) NULL,

       [UnknownType] [nvarchar](20) NULL,

 CONSTRAINT [PK_Unknown] PRIMARY KEY CLUSTERED

(

       [UnknownID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

 

And in it I put the following values.

 Insert into Unknown(DimensionName, UnknownType) Values (‘Customer’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Item’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Order’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Zipcode’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Country’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘City’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Branch’, ‘Unknown’)

Insert into Unknown(DimensionName, UnknownType) Values (‘Date’, ‘Unknown’)

  

Then when we have to populate our fact table, we could join this Unkown table to the dataset and then get the Unknown Member ID specifically for the Dimension at hand.

This way we don’t, have some surrogate key that is missing referential integrity, but it introduces new issues. Issues that would make it a little more difficult to manage or develop a Cube.

I’ve put in an unknown type just in case, I have seen several datawarehouses where there was a need for both -2, -1 and 0 as unknown or misused members. (agreed it may be poorly designed, and they are a constant candidate for refactoring)

This was my 2 cents on the topic, I would like to hear from you, and maybe start an discussion about it. Jamie and I started on twitter, but 140 char is not quite enough.

Do you commute?

For the past decade or so, I’ve been commuting to and from work in my little spaceshuttle (car!). Every day was the same, drop the kids off at school and then hit the highway, often together with like a million others whom had the same way of commuting. Every single one of us sat in our own little capsule and used diesel or gasoline, all the while there was plenty of room in the car for at least 3 fellow commuters. 

And when in the car, all you can do work wise is having a phonecall, which can be ok, but not that productive…

So as of April 1st this year, I decided to try and take the trains to and from work, and I must say it have been a relief. OK, the travel time is slightly longer, but I can be 100% productive while seated in the train (actually writing this Blogpost in route between Køge and København).

What is the basic needs for a fellow commuter who wish to work? 

Well here’s my list.

  • A lightweight laptop, I’ve got a Lenovo T430s.
  • 3G modem either as a USB stick or built in.
  • Headphones for listening to radio or talking over lync.
  • The ability to focus and not be distracted by fellow commuters.
  • Be diciplined about your work, have a list of things you have to finish on the commute.
  • Order an extra battery for the laptop, danish trains does not have power outlets.
  • Grab a cup of steaming hot coffee, and its even easier to drink it on the train than in the car ;o)

 

Well that was my list, I’m sure you can come up with even more, So feel free to comment, we might gain knowledge together.

Exam prep for SQL server 2012 MCSA

Microsoft have just launched their youtube channel with exam prep videos for Microsoft SQL Server 2012 MCSA.

There are videos for the following exams:

  • 70-461 – Quering Microsoft SQL server 2012
  • 70-462 – Administering Microsoft SQL server 2012
  • 70-463 – Implementing a Data Warehouse with Microsoft SQL server 2012

They are all about 1 hour and 15 minutes long, so grab a coffee mug and a notepad og Onenote, and prepare yourself for your next exam.

http://www.youtube.com/playlist?list=PLahhVEj9XNTct1VslZ_LhNSI8F5_8Ikve

 

SSIS Tip of the day

I’ve just learned a new tip about SSIS that I was not aware of. A tip that I think you all should know about,

When retrieving data via the OLE_DB connection, there are several options to how the data should be retreived. Often we just select “Table or View” option, mostly because we’re to lazy to write up the SQL that is needed for the operation.

 

Just be aware that when you do that, the SSIS engine will use the command OPENROWSET to get the data, but when you’re typing up the correct SQL to get the data and using the “SQL command” the SSIS engine will use the faster and more optimized SP_Executesql.

 

Filtering on databases in Management Studio

Ever since I started working with Microsoft SQL server, there have been more than only one database on each of my servers. It can ba a nightmare to manage a installation with more than a few databases, and it is much like when we are working inside a specific database, consisting of hundreds of tables.

It can be a rather large task to pinpoint the excact table, view, function, stored procedure or what have you, but since a few versions ago, we have had the possibility to filter on these objects.

050213_0643_Filteringon1

Like in the picture above, I have made a filter so only tables belonging to a schema containing the word “Human” appears on the list.

Why is it that this feature is not implemented as an option to filter on databases, in some installations, where there are maybe more than 100 databases it would be a real treat to have this option.

Since the code is already implemented as a filter option on other objects, I would have to say that it would be rather easy to implement.

As you can see I am not the only one suggesting this to the SQL server team :

https://connect.microsoft.com/SQLServer/feedback/details/718007/filter-the-databases-list-in-object-explorer-oe-list-in-management-studio

https://connect.microsoft.com/SQLServer/feedback/details/488488/hide-or-filter-databases-in-ssms

https://connect.microsoft.com/SQLServer/feedback/details/497849/ability-to-filter-database-list

https://connect.microsoft.com/SQLServer/feedback/details/270659/filter-databases-in-object-explorer-management-studio

For some reason they have decided to close this suggestion, and will not implement it in a version, they just say they would consider it.

Do not be a total show off

During an interview with a client, do not behave, as you are the center of the universe, or the smartest man in the room. Often it pays of to be a little bit humble about your skills and really listen to what the client actually tells you.

Latest I have tried this myself, when my colleague and I was at an interview at a large Danish company. We had one goal and that was to land the contract, and one of the many things we did, was to let the client do most of the talking, while we listened and absorbed every little aspect of their story. We were up against some rather well known competitors, and my belief is that we managed to deliver to the client that we actually could do the walk, and not only the talk.

To sum up, I would recommend the following.

  • Listen, listen and listen some more.
  • Take notes (for you own sake, and assures the client that you are serious).
  • Do not go on an endless rant about all the technologies you know about.
  • Talk smooth and slow about the various projects you have worked on.
  • Toss in a funny anecdote, which shows that you are not flawless.
  • Always show the client that you have understood their problems, and can act upon them.

All this actually resulted in contract, and that we landed the client.

Business Intelligence – What was the first word?

If we have to look at a Business Intelligence (BI) project from the sideline, one of the most common pits a project may fall into, is NOT to include the Business or the end users.

Some rules that we could work by in a BI project could be these (the list is not complete, and is only a guideline)

  • Let the Business Users define their needs.
  • Constant challenge them, and ask them questions about their perception of their world (data).
  • Define a Business Model, Information model and a Process model.
  • Explorer data in the area of interest.
  • Don’t try to model their entire world (data), regardless that they “need everything”
  • Always, and I cannot emphasize that enough, start with the easy achievable goals.
  • Instead of working by the Waterfall method, use the much more agile method called Scrum

We know it all beforehand vs. let’s get wiser as we go!

In many projects that I have participated as a developer, some tended to scoped them as a normal application development project. Where there have been a massive analysis and requirement work done beforehand, everything looks dandy and the business might have accepted the scope of the BI project.

But what often happens, is that the world is evolving faster than anyone realizes, and the consequences is that what looked as the best possible solution ½-1 a year ago often tends to be somewhat of. Another risk is that the requirement and specifications of the system is hard to change, or if it is possible – the price for doing so might be high and set the project back.

The above method goes by the name Waterfall.

Some companies that have reached a certain maturity concerning their software development – both their applications and their BI systems, have embraced the agile way of software development.

One of the key advantages of this paradigm is that the project is scoped in much smaller pieces. Instead of the massive specification, that often was the result of a project run by the Waterfall method, the project owner and sponsor agrees on a final product based on a few pages of describing text and or some mockups.

When the project is starting, the project is scoped in smaller pieces that the given team of developers, business consultants, testes and IT can manage to finish in af timeframe of 10 to 15 working days. This way it is cheaper to make changes to the specifications as we go, as every sprint as it is called is in self a mini project.

If you would like to read more about this agile method called SCRUM – you should visit this page