
I have gone through a lot of articles, but I don't seem to get a clear answer on what exactly big data is. On one page I saw "any data which is bigger than your usage can handle is big data, i.e. 100 MB is considered big data for your mailbox but not for your hard disk". Another article said "big data is usually more than 1 TB, with different volume / variety / velocity, and couldn't be stored in a single system", and that the data should be stored in a NoSQL DB, with Hadoop used to transform it.

Further, I have been working on a solution and was wondering whether I could classify it as big data. Some details of the solution:

  • Millions of raw data records, usually 500+ GB of data.
  • SQL database as the back end, with SSIS / SQL queries to cleanse and process the data and convert it into a meaningful form.
  • Visualization using Spotfire.

Any help would be much appreciated. Thank you!

Chendur
  • Let's hear a couple of opinions and then vote for closure, shall we? – Chendur Feb 22 '16 at 18:15
    From [help/dont-ask]: "_Your questions should be reasonably scoped. If you can imagine an entire book that answers your question, you’re asking too much._" – user2314737 Feb 22 '16 at 20:44

3 Answers


Big data is huge and complex data that is challenging to capture, store, process, retrieve, and analyze.

Four main characteristics:

  1. Volume: the "big" in big data refers to the sheer volume. It can amount to hundreds of terabytes or even petabytes of information.

  2. Velocity: the rate at which the data is growing.

  3. Variety: big data can come in any form, such as structured data, unstructured text, images, log files, etc.

  4. Veracity: the quality and accuracy of the data.
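One way to make the four V's concrete is to attach a measurable number to each. The sketch below is illustrative only: the `Workload` fields and the `profile` metrics (days until the volume doubles, number of data formats, share of valid records) are hypothetical choices, not standard definitions, and they describe a workload rather than decide whether it is "big".

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical workload profile; field names are illustrative only.
    volume_gb: float          # Volume: data currently stored
    growth_gb_per_day: float  # Velocity: new data arriving per day
    formats: set[str]         # Variety: kinds of data, e.g. {"table", "log"}
    valid_records: int        # Veracity: records that pass validation
    total_records: int        # Veracity: all records checked


def profile(w: Workload) -> dict[str, float]:
    """Turn each V into a number that can be tracked over time."""
    # Velocity: days until stored volume doubles at the current growth rate.
    if w.growth_gb_per_day > 0:
        days_to_double = w.volume_gb / w.growth_gb_per_day
    else:
        days_to_double = float("inf")
    # Veracity: share of records that pass validation (1.0 means all clean).
    valid_share = w.valid_records / w.total_records if w.total_records else 0.0
    return {
        "volume_gb": float(w.volume_gb),
        "days_to_double": days_to_double,
        "format_count": float(len(w.formats)),
        "valid_share": valid_share,
    }


# Hypothetical numbers, loosely based on the 500 GB from the question.
example = Workload(volume_gb=500, growth_gb_per_day=5, formats={"table"},
                   valid_records=990_000, total_records=1_000_000)
metrics = profile(example)
```

With these made-up inputs the volume would double in 100 days and the data has a single format, which illustrates how a workload can be large and fast-growing yet still low on variety.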

Ravindra babu
  • Thanks for your response. So big data isn't relative, as in the case I mentioned above, where 100 MB is big data for a mailbox but not for a hard disk? – Chendur Feb 22 '16 at 18:37
  • 100 mb is not big data. I have updated my answer with more details now. – Ravindra babu Feb 22 '16 at 18:40
  • Thanks. And would my solution be classified as big data? I suppose it satisfies all four V's, but it's stored in MS SQL and processed using SSIS. Does it matter where we store the data and how we process it? – Chendur Feb 22 '16 at 18:47
  • It's not big data, as variety is not satisfied with MS SQL Server. It can store structured data only. Big data can be petabytes of data, including structured, unstructured, and semi-structured data such as audio, video, and images. MS SQL does not fit here. – Ravindra babu Feb 22 '16 at 18:49
  • You can have a 4,000-data-node cluster and do real-time processing on huge data sets. – Ravindra babu Feb 22 '16 at 18:50
  • Have a look at this question to understand NoSQL (to store big data) vs. RDBMS: http://stackoverflow.com/questions/4160732/nosql-vs-relational-database – Ravindra babu Feb 22 '16 at 18:56
  • Thank you. I will have a look. – Chendur Feb 22 '16 at 19:03

Big data is:

When a big boss believes this is a big opportunity because data is the new oil and gold, and gets a big pile of money to throw out the window and flush down the bowels. And then your data warehouses and silos turn into a data lake, and the data lake full of synergy into a data swamp full of bit rot, where the big vision hits the reality that not everything that shines is gold. And then the gates of doom open and there it comes, the big bubble that is about to burst. The bridge over the trough of disillusionment is small, and thou shall not pass, but tumble into the big abyss where all useless data go, no matter how eagerly it was collected and mapped and reduced without plan or objective. Bingo!

Has QUIT--Anony-Mousse

The Big Data Definitions & Taxonomies Subgroup of the NIST Big Data Public Working Group released a volume on definitions: NIST Big Data Interoperability Framework: Volume 1, Definitions.

Quotes:

Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets. Characteristics of Big Data that force new architectures are:

  • Volume (i.e., the size of the dataset);
  • Variety (i.e., data from multiple repositories, domains, or types);
  • Velocity (i.e., rate of flow); and
  • Variability (i.e., the change in other characteristics).

These characteristics—volume, variety, velocity, and variability—are known colloquially as the ‘Vs’ of Big Data

and:

Big Data consists of extensive datasets—primarily in the characteristics of volume, variety, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.
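A "scalable architecture" in this sense usually means the work can be split across machines and the partial results combined. Below is a minimal single-process sketch of that split/merge pattern (the map and reduce model used by Hadoop MapReduce), applied to a word count. It assumes each chunk would live on a separate node in a real cluster; the function names `partition`, `map_count`, and `reduce_counts` are hypothetical and not part of any Hadoop API.

```python
from collections import Counter
from collections.abc import Iterable


def partition(records: list[str], n_parts: int) -> list[list[str]]:
    """Split records round-robin into n_parts chunks (stand-ins for nodes)."""
    parts: list[list[str]] = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def map_count(chunk: Iterable[str]) -> Counter:
    """Map step: count words within one chunk, independently of the others."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-chunk counts into a single result."""
    total: Counter = Counter()
    for c in partials:
        total.update(c)  # Counter.update adds counts rather than replacing them
    return total


records = ["big data", "data lake", "data swamp"]
result = reduce_counts(map_count(chunk) for chunk in partition(records, 2))
# result["data"] == 3
```

Because each `map_count` call only sees its own chunk, adding more chunks (and, in a real system, more nodes) spreads the work without changing the logic; only the final merge needs to see every partial result.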

user2314737
  • To the V's of Big Data you need to add one more: `Value`. Value is the most important V in Big Data; without it, none of the others are useful. – Kenry Sanchez Sep 30 '19 at 00:30