-
At Cloudkick we track a ton of metrics about our customer's servers and it's quite a challenge to store such massive amounts of data. Early on, we made the decision to avoid using tools like RRDTool, so we could provide a more holistic look at infrastructure.
-
I'm writing this up because there's always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them.
-
When building a Cassandra cluster, the “key” question (sorry, that’s weak) is whether to use the RandomPartitioner (RP), or the OrderPreservingPartitioner (OPP). These control how your data is distributed over your nodes. Once you have chosen your partitioner, you cannot change without wiping your data, so think carefully!
For Cassandra newbies, like me and my team of HBasers wanting to try a quick port of our project (more on why in another post) nailing the exact issues is quite daunting. So here is a quick summary.
-
Based on Ronald Mathies’ intro articles to Cassandra and a few other resources I’ve been gathering, I thought I should put together a detailed guide to getting started with Cassandra.
-
This article discuss about running mapreduce jobs using the apache tools called pig and hive.Before we can process the data we need to upload the files to be processed to HDFS/S3. We recommend uploading to hdfs and keeping the important files in s3 for backup is a better practice. s3 is easily accessible from commandline using tools like s3cmd. HDFS is a failover cluster filesystem which provides enough protection to your data over instance failures.
-
Cube is an open-source system for visualizing time series data, built on MongoDB, Node and D3. If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards.
-
Here at Flickr, we’re pretty nerdy. We like to measure stuff. We love measuring stuff. The more stuff we can measure, the better our understanding of how different parts of the website work with each other gets. There are two types of measurement we especially like to do – counting and timing. These exciting activities help us to know what is happening when things break – if a page is taking a long time to load, where is that time being spent and what task have we started to do more of.
About me
Geek of all trades, just another Perl hacker, tech enthusiast, Mac user, Vim lover, open source sustainer, Usenet lurker. AKA larsenOn twitter
My book

About