21 May 2012

The Tech Reflector

 
On myNoSQL 8 months ago.

Most of the time, vendor videos emphasize the superiority of their own commercial platform. But this short video gives a fair overview of the similarities and differences between Hadoop and Netezza.

The video is 5 minutes long and well worth watching.

Krishnan Parasuraman (IBM Netezza Chief Architect) also mentions a couple of scenarios where using the two together would deliver an optimal solution:

Hadoop used as a data ingestion layer for large volumes of data

Hadoop used as a system of archive

nosql.mypopescu.com

On myNoSQL 8 months ago.

In the series of big announcements coming out this month, Cloudera and Revolution Analytics, the enterprise provider of R software, have announced a partnership to integrate Cloudera's Hadoop distribution with the Revolution R Enterprise platform, offering R developers direct access to Hadoop data stores and the ability to write MapReduce jobs directly in R.

The integration packages, named RevoConnectR for Apache Hadoop, are already freely available …

nosql.mypopescu.com

By engbl0g of Foursquare Engineering Blog 8 months ago.

The check-in data was provided by our data team, pulled from the Hadoop/Hive infrastructure they've built for analysis purposes. You can learn more about that setup here. From Hive we get a CSV of all the check-in lat/longs and whatever other fields we queried. Even for just one week of check-ins, this is 2.9GB worth of data.

The visualization itself was mostly created with Processing. I'm a huge fan of Processing because of the flexibility it allows. …
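As a rough illustration of the first step such a pipeline might take, the sketch below streams a CSV of check-in coordinates and counts check-ins per grid cell, so a multi-gigabyte export never has to fit in memory. This is not the Foursquare team's code: the column names `lat` and `lng`, the function names, and the 0.01-degree cell size are all assumptions made for the example.

```python
import csv
import io
import math
from collections import Counter


def bin_checkins(rows, cell_deg=0.01):
    """Count check-ins per (lat, lng) grid cell of size cell_deg degrees.

    `rows` is any iterable of dicts with hypothetical 'lat' and 'lng' keys.
    """
    counts = Counter()
    for row in rows:
        try:
            lat = float(row["lat"])
            lng = float(row["lng"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed coordinates
        cell = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
        counts[cell] += 1
    return counts


def bin_checkins_from_file(fileobj, cell_deg=0.01):
    """Stream a CSV file object row by row instead of loading it whole."""
    return bin_checkins(csv.DictReader(fileobj), cell_deg)


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real export.
    sample = io.StringIO(
        "lat,lng\n"
        "40.7281,-73.9911\n"
        "40.7282,-73.9912\n"
        "37.7750,-122.4183\n"
        "bad,row\n"
    )
    for cell, n in bin_checkins_from_file(sample).most_common():
        print(cell, n)
```

The per-cell counts could then be handed to a drawing tool such as Processing; reading row by row keeps memory use proportional to the number of occupied cells rather than the number of check-ins.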

engineering.foursquare.com

On Scout ~ The Blog 8 months ago.

With over 50 million plays, OMGPOP - the free multiplayer game site - is logging a lot of data. Tracking stats like app downloads and launches paint a picture of how their games are performing.

This logging data is collected via Flume, a system for collecting streaming data, and delivered to a Hadoop Distributed File System (HDFS). So, how do you keep your Flume nodes configured in a consistent manner?

Enter Apache ZooKeeper, "a centralized …

scoutapp.com

By Prof. Dr. Stefan Edlich of NoSQL Databases 10 months ago.

HPCC: another, older Hadoop competitor

HPCC: from LexisNexis, info, article

http://www.lexisnexis.com

http://www.lexisnexis.com/government/solutions/literature/hpcc-das.pdf

http://wikibon.org/blog/lexisnexis-hpcc-takes-on-hadoop-as-battle-for-big-data-supremacy-heats-up/

Posted by Prof. Dr. Stefan Edlich at 08:45

nosql-databases.blogspot.com

By admin of WhyNosql 10 months ago.

Key HBase community members advise people not to host their HBase cluster on EC2. And they have good reasons for advising so. But in this post I am going to explain why we decided to host our HBase cluster on EC2 and why we continue to host it on EC2.

When we began experimenting with HBase in July of 2009, HBase was fairly new and we were experimenting with Hadoop and HBase to learn how these technologies could help us solve our problems. By then we didn't have …

whynosql.com

By admin of WhyNosql 11 months ago.

Hadoop Summit is always interesting for Hadoopers. You get to learn the latest and greatest in the Hadoop world and meet the people behind projects in the Hadoop ecosystem. In this post, I have tried to share my takeaways.

Currently there are many distributions of Hadoop floating around. Besides the main Apache Hadoop distribution, there are distributions from Cloudera, Yahoo, and IBM, and even Amazon uses its own distribution for its Elastic MapReduce service. All these distributions were born …

whynosql.com

By Ambrish Choudhary of Large Data Matters 11 months ago.

A super-computer architecture that crunches big data for banks, police, and spooks will soon be open sourced as a super-fast alternative to the Googlesque Hadoop.

LexisNexis Risk Solutions is opening up its High Performance Computing Cluster (HPCC), a system written in C++ that it claims is four times faster than Hadoop when running data-intensive queries on ordinary Linux servers.

Source

india.paxcel.net

On The Basho Blog 12 months ago.

…/Erlang in the academic community, where most distributed systems research builds on Hadoop/Java. He is also of the belief that there is considerable research still to be done in the area of eventually consistent distributed systems, and that Basho has a role to play in producing novel research.

Joe currently resides in Boulder while he finishes up his PhD, and takes turns working from home and the university campus. This fall he plans to move back to Seattle, where he previously …

blog.basho.com