Main point: Billions of rows X millions of columns
Key Features of HBase:
Modeled after Google’s BigTable
Uses Hadoop’s HDFS as storage
Map/reduce with Hadoop
Query predicate push-down via server-side scan and get filters
Optimizations for real-time queries
A high-performance Thrift gateway (see the sketch below)
HTTP supports XML, Protobuf, and binary
JRuby-based (JIRB) shell
Rolling restart for configuration changes and minor upgrades
Random access performance is like MySQL
A cluster consists of several different types of nodes
Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you already use the Hadoop/HDFS stack.
Examples: Search engines. Analysing log data. Any place where scanning huge, two-dimensional, join-less tables is a requirement.
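To make that Thrift gateway point concrete, here is a minimal sketch using the Python happybase client. It assumes a Thrift server on the default port 9090, and the table name, column family, and row keys are invented for illustration:

# Minimal sketch, assuming an HBase Thrift gateway on localhost:9090
# (pip install happybase); 'pages' and 'cf' are made-up names.
import happybase

connection = happybase.Connection("localhost", port=9090)

# Create a wide table: one column family can hold millions of columns.
if b"pages" not in connection.tables():
    connection.create_table("pages", {"cf": dict()})

table = connection.table("pages")

# Writes are keyed by row; columns are created on the fly.
table.put(b"com.example/index", {b"cf:title": b"Example", b"cf:hits": b"42"})

# Random access by key...
row = table.row(b"com.example/index")
print(row[b"cf:title"])

# ...or a server-side scan; the filter string is evaluated in the
# region servers, which is the "predicate push-down" mentioned above.
for key, data in table.scan(row_prefix=b"com.example/",
                            filter=b"KeyOnlyFilter()"):
    print(key)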
Regards,
Alok
Big Data is, broadly, a term for very large amounts of data. Nowadays data piles are growing exponentially, and this data comes from many sources: call logs, web logs, digital transactions, social media posts, sensor data and logs, pictures, videos, and everything else that is digital.
While Big Data doesn't refer to any specific size or quantity, the term is usually used when we talk about data on the scale of petabytes and exabytes. Big Data is an evolving and popular term, and the main challenge with this abundance of data is how to manage it and how to extract productive information from it.
There are three prime factors of Big Data:
1. Volume: analytics on massive amounts of data
2. Velocity: faster, robust transactions with uninterrupted availability
3. Variety: a wide variety of data from different sources
Where traditional techniques are inadequate for processing high volumes of data, Big Data tooling makes a business more agile, flexible, and swift at converting raw data into useful information. When dealing with larger datasets, it helps us manage structured, semi-structured, and unstructured data alike. Traditional applications and databases take too much time to load voluminous data and cost too much; newer approaches use algorithms designed to reduce both time and cost. In such systems, the main focus is on mining data for information rather than emphasizing data schema and data quality.
The following are a few of the technologies born to handle this buzzword, “Big Data”:
Apache Cassandra,
MongoDB,
HBase,
Elasticsearch, etc.
Cheers!
Pramod
Main point: Store huge datasets in “almost” SQL
Key Features of Cassandra:
Querying by key, or key range (secondary indices are also available)
Data can have expiration (set on INSERT)
Writes can be much faster than reads (when reads are disk-bound)
Map/reduce possible with Apache Hadoop
All nodes are similar, as opposed to Hadoop/HBase
Very good and reliable cross-datacenter replication
Distributed counter data type
You can write triggers in Java
Best used: When you need to store data so huge that it doesn’t fit on one server, but you still want a friendly, familiar interface to it; see the sketch below.
Examples: Web analytics, to count hits by hour, by browser, by IP, etc. Transaction logging. Data collection from huge sensor arrays.
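As a rough illustration of querying by key range, per-INSERT expiration, and the counter type, here is a sketch using CQL through the DataStax Python driver. It assumes a single local node, and the keyspace and table names are made up for the example:

# Minimal sketch, assuming a local Cassandra node
# (pip install cassandra-driver); 'demo', 'hits', 'page_counts' are invented.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# Rows are addressed by key; the clustering column allows key-range queries.
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.hits (
        page text, hour text, ip text,
        PRIMARY KEY (page, hour)
    )
""")

# 'USING TTL' is the per-INSERT expiration from the feature list (here: 1 day).
session.execute(
    "INSERT INTO demo.hits (page, hour, ip) VALUES (%s, %s, %s) USING TTL 86400",
    ("/home", "2024-06-01T10", "10.0.0.1"),
)

# Query by key, restricting the clustering column to a range.
for row in session.execute(
    "SELECT hour, ip FROM demo.hits WHERE page = %s AND hour >= %s",
    ("/home", "2024-06-01T00"),
):
    print(row.hour, row.ip)

# The distributed counter data type.
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.page_counts (
        page text PRIMARY KEY, views counter
    )
""")
session.execute(
    "UPDATE demo.page_counts SET views = views + 1 WHERE page = %s", ("/home",)
)

Note that counters live in their own table alongside the key; that is a Cassandra constraint, not a choice made for the example.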
Regards,
Alok
Main point: Retains some friendly properties of SQL. (Query, index)
Key Features of MongoDB:
Master/slave replication (auto failover with replica sets)
Sharding built-in
Queries are JavaScript expressions
Run arbitrary JavaScript functions server-side
Better update-in-place than CouchDB
Uses memory mapped files for data storage
Performance over features
Journaling (with --journal) is best turned on
On 32-bit systems, limited to ~2.5 GB
Text search integrated
GridFS to store big data + metadata (not actually an FS)
Has geospatial indexing
Data center aware
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
Examples: For most things that you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back (see the sketch below).
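For a feel of “dynamic queries plus declared indexes”, here is a sketch with PyMongo against a hypothetical web-analytics collection; every name in it is invented:

# Minimal sketch, assuming a local mongod (pip install pymongo);
# the 'analytics' database and 'events' collection are made-up names.
from datetime import datetime, timezone

from pymongo import ASCENDING, GEOSPHERE, MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.analytics.events

# Schemaless insert: no predefined columns to hold you back.
events.insert_one({
    "ip": "10.0.0.1",
    "browser": "Firefox",
    "ts": datetime.now(timezone.utc),
    "loc": {"type": "Point", "coordinates": [-73.97, 40.77]},
})

# Define indexes instead of map/reduce functions...
events.create_index([("browser", ASCENDING), ("ts", ASCENDING)])
# ...including the geospatial indexing from the feature list.
events.create_index([("loc", GEOSPHERE)])

# Dynamic queries are just documents (the mongo shell would use
# JavaScript expressions for the same thing).
for doc in events.find({"browser": "Firefox"}).sort("ts", ASCENDING):
    print(doc["ip"], doc["ts"])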
Regards,
Alok