Attending the preconf session "Big Data: Deploy, Design, and Manage Like a Pro", where Buck Woody, Adam Jorgensen and John Welch are doing their magic with Azure, HDInsight, PowerShell and everything in between.
Great questions from the attendees, and even greater answers.
Here are some key points from Buck. I think they have always been relevant, but they matter even more now with Big Data.
- Always ask the right questions
- Never select the tech beforehand
- Always select the TECHNOLOGIES after the questions have been asked and answered
- Moving 1 TB of data to Azure in one go: DON'T DO THAT
- Send the data in a trickle instead, as incremental data loads
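The trickle idea can be sketched as a watermark-based incremental load. This is only an illustration under my own assumptions, not something shown in the session: the function `load_increment`, the `id` field used as a watermark, and the `upload` callable are all hypothetical stand-ins for whatever actually pushes data to Azure storage.

```python
def load_increment(records, last_seen, upload, batch_size=2):
    """Upload only records newer than last_seen, in small batches.

    records:   iterable of dicts with a monotonically increasing "id" (assumed)
    last_seen: highest id that was already uploaded (the watermark)
    upload:    hypothetical callable that sends one batch to storage
    Returns the new watermark.
    """
    # Keep only the records we have not sent yet, oldest first.
    new = sorted((r for r in records if r["id"] > last_seen),
                 key=lambda r: r["id"])
    for start in range(0, len(new), batch_size):
        batch = new[start:start + batch_size]
        upload(batch)
        # Advance the watermark only after the batch went out.
        last_seen = batch[-1]["id"]
    return last_seen
```

Calling it again with the returned watermark uploads nothing until new records arrive, so each run only moves the delta.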
PowerShell in a nutshell:
- Scripting language
- Based on cmdlets ("command-lets")
- Cmdlets are named Verb-Noun
- DIR becomes Get-ChildItem
- Variables always start with a $
- Everything is an OBJECT
John Welch is starting to talk about how to load data into your Azure storage. For this task we're loading data from Twitter.com and LinkedIn.com.
John has a tool to download the XML feed from Twitter and LinkedIn. The data needs to be preprocessed one record at a time:
- Text files need to be UTF-8 without a BOM
- Records are delimited by newlines
- Several formats can be used:
  - Delimited text
  - SequenceFile
  - RCFile / ORC (Optimized Row Columnar) file
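The text-file requirements above can be sketched in a few lines of Python. This is not John's tool, just a minimal illustration under my own assumptions: the function names are hypothetical, tab is an arbitrary choice of delimiter, and embedded newlines or tabs are simply replaced with spaces.

```python
def to_delimited_line(fields, delimiter="\t"):
    """Join one record's fields into a single line of delimited text."""
    cleaned = []
    for value in fields:
        text = str(value)
        # Newlines and delimiters inside a field would break the
        # one-record-per-line layout, so replace them with spaces.
        for ch in ("\r\n", "\n", "\r", delimiter):
            text = text.replace(ch, " ")
        cleaned.append(text)
    return delimiter.join(cleaned)


def encode_records(records, delimiter="\t"):
    """Return the records as UTF-8 bytes, one record per line, no BOM."""
    lines = [to_delimited_line(r, delimiter) for r in records]
    if not lines:
        return b""
    # Python's "utf-8" codec never writes a byte order mark.
    return ("\n".join(lines) + "\n").encode("utf-8")


def strip_bom(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return raw.decode("utf-8-sig")
```

For example, `encode_records([["1", "hello\nworld"]])` yields a single line with the embedded newline flattened, and `strip_bom` can be used on files that some other tool saved with a BOM.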
More to come