Today Regis Baccaro ( T | B ) and I had the pleasure of speaking at Campus Days in Copenhagen. We started the day by making final changes to the slide decks, going through the slides together and making a plan for the session. Then we headed to the Speakers Lounge, a rather fine room in one of the smaller cinemas in the movie theatre where the event took place, and had some coffee and a light breakfast.
The session started at 10.45 in room 10, a room that could accommodate around 200 people. Unfortunately we didn’t manage to attract that many, but it was still a good number.
I started by talking about what an end-to-end big data solution on Azure is, what is needed to succeed with such a project, and how we set all the elements up. I gave a short walkthrough of each of the elements, and then told the tale of how they are created using the Azure portal:
- Azure Account
- Storage Account
- SQL Server
- SQL Databases
- Firewall rules
- HDInsight Cluster
- Hive Scripts
- Machine Learning
Then I moved on to talk about how tiresome the creation and deletion of these elements really is, and asked whether there isn’t a smarter way to achieve this. It turns out there is another way to do the same tasks, and I chose to use PowerShell for the job. I used a script based on one from Adam Jorgensen ( T | B ) and John Welch ( T | B ), which I have extended a bit so that it also takes care of uploading data to Azure and creating tables to be used in Hive queries.
The demo went well, if we don’t take into account the smaller errors that sometimes occur. This time it was because I by mistake had my PowerShell ISE running in the wrong directory, but I managed.
I think that the demo and the message about automating the entire creation of a big data project on Azure were well received, but I also got the impression that many people have yet to see the use for an HDInsight solution, as some of the questions asked were:
- Why would I put my data into HDInsight instead of ordinary SQL Server?
- SQL Server could return an answer faster than my Hive query, so why?
- Are there any real-world implementations? We’ve only seen demos.
After my demos and talk, Regis talked about another way of automating the process, and his take was to have it all done in the ordinary ETL process using SSIS. Microsoft has not yet made any SSIS components that allow you to work with Azure from SSIS. It would be nice if they did in the future, but I don’t know whether they will.
But someone else thought it would be cool to manage your Azure setup in SSIS: oh22data, a German company, has made a suite of components for SSIS to manage Azure. Regis gave a demo where all the tasks that I did in PowerShell were done in SSIS as part of your regular ETL flow.