According to IBM’s website, we create 2.5 quintillion (2.5 × 10^18) bytes of data every day, so much that 90% of the data in the world today has been created in the past few years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS signals, to name a few. This is big data. Data has swept into every industry and business function and is now an important factor of production, alongside labor and capital.
Most sources describe big data using Gartner’s definition, the 3Vs: Variety, Velocity and Volume.
- Variety – Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
- Velocity – Big data is often time-sensitive, so it must be used as it streams into the enterprise in order to maximise its value to the business.
- Volume – Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
Big data is more than a challenge; it is an opportunity to find insight in new and emerging types of data, to make your business more agile, and to answer questions that were previously beyond reach. Several platforms currently exist to harvest and analyse this data, developed by Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell. IBM’s InfoSphere and Microsoft’s Windows Azure HDInsight are among the popular ones in the market. Both use Hadoop-based analysis and support data warehousing, which brings me to the topic of my next blog: Why Hadoop? What is data warehousing?