Sean McElroy's Blog: Hadoop

Saturday, April 7, 2012

Putting 1 Trillion in Context

While reading a recent article about Amazon's S3 cloud service, I recalled a conversation with Mickey McManus up at Maya. We were discussing Big Data last year and the ideas we had about working with ever-increasing volumes of data.

The Read Write Web article I just read was titled "Amazon S3 Showing Signs of Slowing as It Approaches 1 Trillion Objects." This got me thinking about the conversation with Mickey. With the US deficit always being discussed in the trillions these days, it is easy to lose sight of the magnitude of something measured in the trillions. Here is something to consider when discussing trillions of anything:

If you take something that we all deal with, the measurement of time, and put the numbers into that context, I think you will get the picture. Starting with the basics, 60 seconds = 1 minute; we all know that, and it isn't much time. So let's scale that up. What is the amount of time in:

1 Million Seconds = 11.57 days
1 Billion Seconds = 31.7 years
1 Trillion Seconds = 31,709 years

Nearly 32K years: that is a long time, and it gives us a better understanding of what a trillion really means when we start thinking about what the Amazon S3 service is approaching.
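The conversions above can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes a plain 365-day year with no leap days.

```python
SECONDS_PER_DAY = 60 * 60 * 24   # 86,400 seconds in a day
DAYS_PER_YEAR = 365              # assumption: ignore leap years


def seconds_to_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds to 365-day years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"1 million seconds  = {seconds_to_days(10**6):,.2f} days")
    print(f"1 billion seconds  = {seconds_to_years(10**9):,.1f} years")
    print(f"1 trillion seconds = {seconds_to_years(10**12):,.0f} years")
```

Using a 365.25-day year instead would make the year figures come out slightly lower, but the orders of magnitude stay the same.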

The challenge here is the rate at which we are approaching the trillion object mark. It is happening very rapidly, driven almost entirely by the last two decades of data, information, and services exploding across the Internet. The question we should start asking ourselves now is how we handle far more than trillions of anything on the Internet, when a trillion objects is commonplace.

Today we are all using technologies such as Hadoop, or variants of Hadoop, and building massive data centers that deal with complex challenges in cooling, power management, and efficiency. These are examples of our basic building blocks today. Evolution in this area will only take us so far. We as an information society must start thinking about transformational ideas and completely different technologies if we hope to harness the not so distant horizon of 2030.

Monday, November 14, 2011

NetApp and Hadoop

I just came from a meeting over at NetApp. I've been a fan of their products in the past and have deployed numerous systems using them all over the world. The key advantage in the past has always been the replication technology that ships with each system.

I ran across an article from GigaOM yesterday entitled "NetApp does network-attached Hadoop" http://goo.gl/HhjYe. Notable in the article is a simple rationale for why NetApp is partnering with Cloudera on this:

"The ultimate goal of the new NetApp product, Albanese said, is threefold: 1) to separate the compute and storage layers of Hadoop so each can scale independently; 2) to fit with next-generation data center models around efficiency and space savings; and 3) to improve reliability by being able to hot-swap failed drives and otherwise leverage NetApp’s storage expertise."

I would also add a fourth reason: NetApp is tied to the Federal Government sales chain, and the Feds are in love with Hadoop at the moment. This product will give the Feds a simplified way to purchase a Hadoop stack that is easy to understand.

We are pretty excited about this product, so we will take a look at it when we get one and report back on what we find.
