A Data Science workplace


In my post about what makes a Data Scientist, I talked about the role of a Data Scientist and the kind of work these people do. Next, I want to talk about the Data Science laboratory and what a typical workplace could look like.

I started by creating a Virtual Machine where I can play and experiment with different tools. So what does my laboratory look like?

  • Virtual Machine with Windows Server 2012 as my OS
  • SQL Server 2012 for relational data sets
  • Hadoop cluster based on HDInsight for unstructured data sets (logs, Twitter, text, sensor data, etc.). It can be downloaded as HDInsight Server or used as the HDInsight Azure Service in the cloud.
  • Excel 2013 with Power Query
  • R environment and RStudio for advanced analytics

Is this platform able to handle large data sets and complex analyses? Of course. SQL Server is a highly scalable database, and the Hadoop cluster in Windows Azure can be extended up to 32 nodes. Excel ships with Power Pivot, an in-memory column store engine that can handle millions of records in a highly compressed format, and R can be scaled through solution providers like Revolution Analytics.
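Why does a column store compress so well? Power Pivot's engine is proprietary, so the following is only a rough illustration under stated assumptions: a minimal Python sketch of two generic columnar techniques, dictionary encoding and run-length encoding, applied to a hypothetical "Country" column. The function names and data are made up for illustration and are not how Power Pivot is actually implemented.

```python
from itertools import groupby

def dictionary_encode(column):
    """Map each distinct value to a small integer ID (dictionary encoding)."""
    dictionary = {}
    ids = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids

def run_length_encode(ids):
    """Collapse consecutive repeated IDs into (id, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(ids)]

def decode(dictionary, runs):
    """Rebuild the original column from the dictionary and the runs."""
    reverse = {i: v for v, i in dictionary.items()}
    return [reverse[i] for i, count in runs for _ in range(count)]

# Hypothetical column, sorted so that equal values sit next to each other.
country = ["DE"] * 4 + ["NL"] * 3 + ["US"] * 5
dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)
print(dictionary)  # {'DE': 0, 'NL': 1, 'US': 2}
print(runs)        # [(0, 4), (1, 3), (2, 5)]
assert decode(dictionary, runs) == country
```

Twelve string values shrink to a three-entry dictionary plus three (id, count) pairs. Because a column holds values of one type that often repeat, this kind of encoding tends to pay off far more on columns than on whole rows.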

So now we have a good laboratory for more advanced analytics. Let’s see what we can do with it in my next post.

3 thoughts on “A Data Science workplace”

  1. Utpal says:

    Nice read. I am keen to know how these tools speak to each other to deliver the ultimate power of big data analytics.
