Get started fast with data science tools on Azure

Did you know that Microsoft has made it really easy to get started with a range of data science tools? How, you ask? Microsoft has compiled a few virtual machine images, ready to be provisioned on Azure. All you have to do is select the data science flavor of your choice.

Microsoft Data Science Virtual Machine

The Microsoft Data Science Virtual Machine (DSVM) is pre-configured so that you can start doing data analysis and machine learning modelling right away.

The DSVM is available on Windows Server 2012 or on an OpenLogic 7.2 CentOS-based Linux operating system.

The main tools include:

  • Microsoft R Server Developer Edition
  • Anaconda Python distribution
  • Jupyter notebooks for Python and R
  • Visual Studio Community Edition with Python and R Tools
  • Power BI Desktop
  • SQL Server Express edition

It also includes ML tools like:

  • CNTK (an Open Source Deep Learning toolkit from Microsoft Research)
  • xgboost
  • Vowpal Wabbit
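
Once a DSVM is up and running, it can be handy to confirm which of the Python-side tools are actually present in the Anaconda environment. The snippet below is a minimal sketch of such a check. The module names in CANDIDATE_MODULES are assumptions (the import names may differ between image versions), and Vowpal Wabbit is left out because it may only be installed as a command-line tool.

```python
import importlib.util

# Assumed import names; adjust the list to match what your image actually ships.
CANDIDATE_MODULES = ["numpy", "pandas", "xgboost", "cntk"]


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be located."""
    found = {}
    for name in names:
        try:
            # find_spec locates a top-level module without importing it.
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup failures as "not available".
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in check_modules(CANDIDATE_MODULES).items():
        print(f"{name:10s} {'found' if ok else 'missing'}")
```

Run it from a terminal or paste it into a Jupyter notebook cell on the VM; anything reported as missing can then be looked up in the product documentation.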

Prerequisites:

To get started, you need one of the following: an MSDN subscription with an Azure account, a paid Azure account, or a free evaluation Azure account.

Let's assume that none of this is set up. I will walk you through setting up a free evaluation account, which gives you roughly 1,300 DKK worth of credit.

Free Azure Account:

Follow these steps to get a free account:

  1. Open a browser and navigate to https://azure.microsoft.com/en-gb/free/
  2. Click the Start now button
  3. Sign in with your Live ID or Hotmail account
  4. Fill in the requested information
  5. Add a credit card. It is only used for validation; no money is withdrawn from the card.
  6. Once done, click Submit and you’re ready to explore Azure.

Images:

Windows Version

  • Microsoft R Server (Enterprise R, R Open, MKL)
  • Anaconda Python 2.7, 3.5
  • Jupyter Notebooks (R, Python)
  • SQL Server 2014 Express
  • Visual Studio Community Edition 2015
    • Azure SDKs, HDInsight Tools, Data Lake Tools
    • Python and R Tools for Visual Studio (IDE)
  • Power BI Desktop
  • ML Tools
    • Integration to Azure Machine Learning
    • CNTK (Deep Learning)
    • Xgboost (Popular tool in data science competitions)
    • Vowpal Wabbit (Fast Online Learner)
    • Rattle (Visual quick start data analytics tool)
  • APIs to access Azure and Cortana Intelligence Suite services
  • Tools for transferring data to and from Azure and Big Data storage technologies (Azure Storage Explorer, PowerShell)
  • Git
  • Linux/Unix utilities through Git-Bash and Windows Command Prompt

Linux Version

  • Microsoft R Open (Open Source R + MKL)
  • Anaconda Python 2.7, 3.5
  • Jupyter Notebooks (R, Python)
  • Postgres, Squirrel SQL (Database tool), SQL Server Drivers and Command Line (bcp, sqlcmd)
  • Eclipse with Azure toolkit plugin
  • Emacs (with ESS, auctex)
  • ML Tools
    • Integration to Azure Machine Learning
    • CNTK (Deep Learning)
    • Xgboost (Popular tool in data science competitions)
    • Vowpal Wabbit (Fast Online Learner)
    • Rattle (Visual quick start data analytics tool)
  • APIs to access Azure and Cortana Intelligence Suite services
  • Azure Command Line for administration
  • Azure Storage Explorer
  • Git

How-To Guide:

Windows Platform

Once you have decided which flavor you would like to work with, follow these steps.

  1. Go to https://azure.microsoft.com/en-gb/marketplace/partners/microsoft-ads/standard-data-science-vm/
  2. Click Create and complete the VM creation form, including the login credentials
  3. Log in to the VM via Remote Desktop using the credentials you entered when creating the VM
  4. Click the Start menu to discover and launch many of the tools.
  5. Have a look at the product documentation and the How-To guide for information on how to use the specific tools available on the VM and on the tasks they support in your data science project.
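
If you connect to the Windows VM often, step 3 can be made a little quicker by keeping a small .rdp file around. The sketch below only generates the text of such a file; the host name and user name are hypothetical placeholders, and it assumes the default Remote Desktop port 3389. It relies on just two standard .rdp settings, "full address" and "username".

```python
def rdp_file_text(host, username, port=3389):
    """Build the contents of a minimal .rdp file for a Remote Desktop client."""
    if not host or not username:
        raise ValueError("host and username are required")
    lines = [
        f"full address:s:{host}:{port}",  # machine to connect to
        f"username:s:{username}",         # login name chosen when creating the VM
    ]
    # .rdp files are plain text; Windows-style line endings are used here.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical values; replace with the address and user of your own VM.
    print(rdp_file_text("my-dsvm.example.com", "dsuser"))
```

Save the output as, for example, dsvm.rdp and double-click it; the Remote Desktop client will still ask for the password.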

Linux Platform

  1. Go to https://azure.microsoft.com/en-gb/marketplace/partners/microsoft-ads/linux-data-science-vm/
  2. Click Create and complete the VM creation form, including the login credentials
  3. Log in to the VM from an SSH client such as PuTTY or the ssh command, using the credentials you specified while creating the VM.
  4. At the shell prompt, type “dsvm-more-info”.
  5. For a graphical desktop, download the X2Go client for your client platform and follow the instructions in the Linux DSVM documentation.
  6. Have a look at the product documentation for information on how to use the specific tools available on the VM and on the tasks they support in your data science project.
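
Steps 3 and 4 can also be combined into a single call from your own machine. The following is a rough sketch that only builds the ssh command line; the user and host names are hypothetical, port 22 is assumed, and whether dsvm-more-info resolves in a non-interactive SSH session is also an assumption (if it does not, log in interactively and type it at the prompt).

```python
import shlex


def build_ssh_command(user, host, remote_command="dsvm-more-info", port=22):
    """Return the argument list for an ssh call that runs one command on the VM."""
    if not user or not host:
        raise ValueError("user and host are required")
    return ["ssh", "-p", str(port), f"{user}@{host}", remote_command]


if __name__ == "__main__":
    # Hypothetical values; replace with the user and address of your own VM.
    cmd = build_ssh_command("dsuser", "my-dsvm.example.com")
    print(shlex.join(cmd))
    # To actually run it, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string keeps the arguments separate, which avoids quoting problems if it is later handed to subprocess.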
