Pass Summit 14 – Preconf (in progress)

Attending the preconf session Big Data: Deploy, Design, and Manage Like a Pro, where Buck Woody (web), Adam Jorgensen (web | twitter) and John Welch (web | twitter) are doing their magic with Azure, HDInsight, PowerShell and everything in between.


Great questions from the attendees, and even greater answers.

Some key points from Buck. I think they’ve always been relevant, but they matter even more now with Big Data:

  • Always ask the right questions
  • Never select the tech beforehand
  • Always select the TECHNOLOGIES after the questions have been asked and answered
  • Moving 1 TB of data to Azure in one go: DON’T DO THAT
  • Send the data in a trickle instead, as an incremental data load
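
The last two points can be sketched in a few lines of Python. This is only an illustration, not something shown in the session: the record shape, the integer `id` field, and the `upload` callback are hypothetical stand-ins for whatever source and upload mechanism you actually use.

```python
from typing import Callable, Iterable


def incremental_load(
    records: Iterable[dict],
    last_loaded_id: int,
    upload: Callable[[list], None],
    batch_size: int = 1000,
) -> int:
    """Upload only records newer than last_loaded_id, in small batches.

    Returns the new high-water mark to store for the next run.
    Assumes each record carries a monotonically increasing integer 'id'
    (a hypothetical field used only for this sketch).
    """
    batch = []
    high_water = last_loaded_id
    for record in records:
        if record["id"] <= last_loaded_id:
            continue  # already sent in an earlier run
        batch.append(record)
        high_water = max(high_water, record["id"])
        if len(batch) >= batch_size:
            upload(batch)
            batch = []
    if batch:
        upload(batch)  # flush the final partial batch
    return high_water
```

Storing the returned high-water mark between runs is what keeps each transfer small: every run sends only what arrived since the previous one.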

PowerShell in a nutshell:

  • Scripting language
  • Based on cmdlets ("command-lets")
    • Named Verb-Noun
    • DIR becomes Get-ChildItem
  • Variables always start with a $
    • $Datasource
  • Everything is an OBJECT

John Welch is now talking about how to load data into your Azure storage. For this task we’re loading data from Twitter.com and LinkedIn.com.

John has a tool to download the XML feed from Twitter and LinkedIn. The data needs to be preprocessed one record at a time.
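
John's tool itself is not reproduced here. As a rough idea of what record-at-a-time preprocessing of an XML feed can look like, here is a minimal Python sketch using the standard library's streaming parser. The element names (`feed`, `entry`, `user`, `text`) are hypothetical, and the sketch assumes a feed without XML namespaces.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_records(xml_source, record_tag: str = "entry") -> Iterator[dict]:
    """Yield one record at a time from an XML feed without loading it all.

    'entry' is a hypothetical element name; the real feed layout was not
    shown in the session.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            # Flatten the record's direct children into a plain dict.
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # release the subtree we no longer need


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<feed>"
        b"<entry><user>a</user><text>hi</text></entry>"
        b"<entry><user>b</user><text>yo</text></entry>"
        b"</feed>"
    )
    for record in iter_records(sample):
        print(record)
```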


  • Text files need to be in UTF-8 without a BOM
  • Records are delimited by newlines
  • Formats
    • Several formats can be used
    • Delimited text
    • SequenceFile
    • RCFile / ORC (Optimized Row Columnar) file
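
For the delimited text case, the first two requirements can be met with a few lines of Python. This is a sketch under my own assumptions, not code from the session: the tab delimiter and the choice to replace embedded tabs and line breaks with spaces are illustrative, and the function names are made up for the example.

```python
import codecs
from typing import Iterable, Sequence


def to_line(fields: Sequence[str], delimiter: str = "\t") -> str:
    """Join fields into one delimited record, keeping it on a single line."""
    cleaned = []
    for field in fields:
        # Embedded delimiters or line breaks would split the record, so
        # replace them with spaces (a simplification chosen for this sketch).
        for ch in (delimiter, "\r", "\n"):
            field = field.replace(ch, " ")
        cleaned.append(field)
    return delimiter.join(cleaned)


def write_records(path: str, records: Iterable[Sequence[str]]) -> None:
    """Write records as UTF-8 without a BOM, one newline-terminated line each."""
    # encoding="utf-8" never emits a BOM ("utf-8-sig" would);
    # newline="\n" prevents translation to "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for fields in records:
            out.write(to_line(fields) + "\n")


def has_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

`has_bom` is handy as a quick check on files produced by other tools before uploading them.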

 

More to come


One comment

  1. […] Pass Summit 14 – Preconf (in progress) by Kenneth M Nielsen […]