Big data is the latest buzzword doing the rounds, but what does it mean, and how does it relate to our experiences of the Internet? Steve Harlow looks at the ways we use big data and the future it holds.
‘Big data’ seems to be the latest technology buzzword doing the rounds, knocking the ever-popular ‘cloud’ off the top spot.
Like many such terms, most people have heard of it and feel they should be using it, but how many actually understand what it means?
Thankfully, the term ‘big data’ is pretty self-explanatory.
What is big data?
Big data is simply data that is too big, too complex or changes too quickly for conventional database management tools or traditional data processing applications to handle. To decide when a data set becomes ‘big data’, you have to look at the three defining areas known as the ‘three Vs’: volume, velocity and variety.
- Volume. If you are collecting large amounts of data over a significant period of time, you will inevitably end up with a huge database. When it becomes too big, big data tools will be required.
- Velocity. If the data you are collecting is being created rapidly, then standard database tools will not be able to cope with the speed at which the data is coming in.
- Variety. Data comes from many sources and in many forms, both structured and unstructured. With records arriving as e-mails, photos, videos, monitoring-device feeds, PDFs, audio and more, storing, mining and analysing them all becomes a problem in itself (a sketch of this problem follows the list).
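To make the variety problem concrete, here is a minimal Python sketch using two invented record formats: it coerces a JSON event from a web application and a plain text line from a monitoring device into one common shape before storage or analysis. Real pipelines face dozens of such formats at once.

```python
import json

def normalise(raw):
    """Coerce records from different sources into one common shape.

    The two input formats handled here are hypothetical; the underlying
    problem, every feed arriving with its own structure, is exactly what
    the 'variety' of big data refers to.
    """
    if raw.lstrip().startswith("{"):  # looks like a JSON event from a web app
        event = json.loads(raw)
        return {"source": "web", "when": event["timestamp"], "body": event["payload"]}
    # otherwise treat it as a plain text line from a monitoring device
    when, _, body = raw.partition(" ")
    return {"source": "sensor", "when": when, "body": body}

print(normalise('{"timestamp": "2013-06-01T12:00:00Z", "payload": "page_like"}'))
print(normalise("2013-06-01T12:00:01Z temperature=21.4"))
```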
How do we process it?
To handle the ever-increasing amounts of data being collected, the vendor community has responded by providing highly distributed architectures and new levels of memory and processing power.
There are now many different data processing platforms available, with Apache Hadoop, first used at scale by Yahoo and Facebook, leading the way.
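To give a flavour of how that processing works, below is a minimal word-count sketch in the MapReduce style that Hadoop popularised, written as two small Python scripts for Hadoop Streaming, which pipes data through any program that reads from stdin and writes to stdout. The counting task is illustrative and the file names are our own.

```python
#!/usr/bin/env python3
# mapper.py: emit a "word<TAB>1" pair for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py: sum the counts per word. Hadoop sorts mapper output by
# key before the reducer sees it, so equal words arrive consecutively.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)

if current_word is not None:
    print(f"{current_word}\t{count}")
```

Hadoop takes care of splitting the input across the cluster, running many copies of the mapper in parallel, then shuffling and sorting the intermediate pairs before feeding them to the reducers; the same pair of scripts works unchanged whether the input is a megabyte or a petabyte.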
How can we use it?
The reason we collect data of any sort is to analyse it and extract useful information. As a data set grows in one or more of the three Vs mentioned above, that analysis becomes more complicated.
With data storage becoming ever more affordable, the ability to process massive data sets has become increasingly important if all the information collated is to be put to any use.
One example we should all be familiar with is Facebook, and how it decides which adverts to put in front of different people.
Everything someone does on Facebook is stored, from their age, sex and location through to which pages they like and which links they click. As you can imagine, given the number of users and the frequency with which this information is produced, Facebook can build up massive databases that reveal trends.
It then uses these trends to decide what each user should see on their page, and which page and advert suggestions to offer. Without big data tools and infrastructure, this information would just sit there, growing at a huge rate and costing more money whilst providing no real value.
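As a toy illustration of that kind of trend-spotting (a deliberately simplified sketch, not Facebook's actual system), the Python snippet below aggregates a handful of invented interaction events by age band and page; at real scale the same aggregation would run across a cluster rather than over an in-memory list.

```python
from collections import Counter

# Hypothetical interaction events. In reality these would stream in
# continuously and at enormous volume from a distributed store.
events = [
    {"user_id": 1, "age": 24, "action": "like",  "page": "running"},
    {"user_id": 2, "age": 31, "action": "click", "page": "cookery"},
    {"user_id": 3, "age": 27, "action": "like",  "page": "running"},
    {"user_id": 4, "age": 33, "action": "like",  "page": "cookery"},
]

# Count interactions per (age band, page): the kind of trend an
# ad-targeting system uses to decide what each user should see.
trends = Counter((event["age"] // 10 * 10, event["page"]) for event in events)

for (age_band, page), n in trends.most_common():
    print(f"{age_band}s / {page}: {n} interactions")
```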
How does the future of big data look?
With advances in the storage and analysis of data, and the capability to handle ever-increasing amounts of it, demand for big data services will continue to grow.
If we take the Facebook example and think about how many other companies would want to do a similar analysis on their customers or users, it is clear that the need for big data capabilities is only going to increase.
This will lead to a greater need for storage and to the continual development of faster, more powerful analytical tools. We expect providers of data centre space and storage solutions to need more engineers to maintain their infrastructure as they grow.
Network providers also have a responsibility to keep developing high-bandwidth, high-speed connections so that these big data stores can be efficiently populated and analysed.
With BT already announcing that it hopes to bring in 1,600 new engineers to cope with the demand for high-speed connections, it is clear that communications and data are industries that will continue to thrive.