Citi casts its eye over Big Data issues


A report into big data, ePrivacy and data protection from Citi’s Global Perspectives & Solutions group opens with some startling facts about the exponential rise in the sheer volume of data produced by today’s online world.

The report cites IBM, which has stated that 90 percent of the data in the world today has been generated within the last two years, and that every two days the world generates as much data as was generated in the entire history of mankind up until 2003.

It is the world’s social platforms that are largely behind this information explosion: every minute, over four million Facebook posts are made and over 300 hours of new video are uploaded to YouTube, while over six billion hours of YouTube footage are watched every month.

This adds up to a huge amount of potential value. According to McKinsey, an average retailer making full use of its data could improve margins by more than 60 percent; an MIT study of 180 large companies suggests that data-driven decision-making improves output by 5-6 percent; and the UK’s Centre for Economics and Business Research (Cebr) has estimated that big data could deliver an economic benefit to the UK of £241bn between 2015 and 2020.

As the report states, technology and big data have enabled whole new industries to emerge, including programmatic advertising, which in the US has grown from revenues of $3.5bn in 2011 to an estimated $19.3bn in 2018, or 85 percent of total digital display advertising.
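As a rough check on what those figures imply, the short Python calculation below derives the compound annual growth rate of US programmatic revenue and the size of the total US display market that the 85 percent share implies. It uses only the numbers quoted above; the variable and function names are illustrative, and both results are arithmetic inferences rather than figures stated in the report.

```python
# Figures as quoted in the article (US programmatic advertising revenue, $bn).
revenue_2011 = 3.5
revenue_2018_est = 19.3
share_of_display = 0.85  # programmatic share of total digital display, 2018 estimate

years = 2018 - 2011


def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate between two values over `periods` years."""
    return (end / start) ** (1 / periods) - 1


growth = cagr(revenue_2011, revenue_2018_est, years)

# If $19.3bn is 85 percent of the display market, the whole market is $19.3bn / 0.85.
implied_display_total = revenue_2018_est / share_of_display

print(f"Implied CAGR 2011-2018: {growth:.1%}")
print(f"Implied total US digital display, 2018: ${implied_display_total:.1f}bn")
```

On these numbers, revenue grew by roughly 27.6 percent a year, implying a total US display market of about $22.7bn in 2018.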

And the world’s consumers are already seeing the benefit. As the online gambling sector knows very well, everyday activities can now be carried out entirely from home, and this is only the start. Says the report: “In the future, artificial intelligence and machine learning – both heavily reliant on data for their insights – will increasingly lead to tailoring of services, pricing and products.”

So what’s the catch? It concerns how comfortable consumers are, or are not, with the amount of personal data that can now be gathered about them and then used to track their activities online.

After all, programmatic advertising works by knowing where you have previously browsed and what you have previously bought. But as a diagram in the report makes clear, the data available about any individual extends far beyond that. From mobile devices there is geolocation data and data from wearables; online, alongside browsing history, there is email data, tweet data, social postings and app use; then there is transactional data such as purchases and bookings, technical IP and device data, and data from your home courtesy of the internet of things. Offline, there is store card history; facial recognition data; personal records such as births, marriages and land registry data; and finally bank account information, bills, phone contact information and questionnaire data.

Online, this data is broadly tracked through consent-based cookies, whether session-only, storage-based or cache-based, and through network fingerprinting and other tracking mechanisms.
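Unlike a cookie, which a site stores on the user’s device, a fingerprint is recomputed on each visit from attributes that the browser or network reveals. The minimal Python sketch below illustrates the idea, assuming a hypothetical set of attributes: it hashes a canonical form of those attributes into an identifier. Real fingerprinting scripts gather many more signals and are built to tolerate small changes between visits.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a stable identifier derived from device/browser attributes.

    Serialising with sorted keys makes the hash independent of the order in
    which the attributes were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes a site might observe on one visit.
visit_1 = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "Europe/London",
    "language": "en-GB",
    "ip_prefix": "203.0.113",
}
# A later visit: same attributes, collected in a different order, no cookie stored.
visit_2 = dict(reversed(list(visit_1.items())))

# Both visits map to the same identifier even though nothing was stored on the device.
print(fingerprint(visit_1) == fingerprint(visit_2))
```

Because the identifier is derived rather than stored, clearing cookies does not reset it. This is why fingerprinting is usually discussed alongside, rather than as a variant of, cookie-based tracking.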

As the report says, consumers broadly accept this kind of data collection as part of an understood value exchange. “This said, most consumers may not fully appreciate how much of the data they generate is being collected and how this ends up being used,” the report adds. “Many would, we suspect, probably be surprised that their ability to get credit may be impacted by who they are friends with on social media, for example.”

The technological challenge
As interesting as the why of big data collection is the how. What lies behind the exponential rise of big data is the commoditisation of computing and data storage. The price of both has fallen steeply in recent years, “by an order of magnitude” according to Citi, while the standardisation of smartphones has driven a rise in sensor and internet of things applications.

But these advances are putting existing technology under real pressure. “Traditional data management architectures have been based on and evolved from relational database technology invented in the 1970s and that went mainstream in the 1990s,” the report states.

However, today’s data explosion is of a different magnitude and, crucially, much of the data comes in unstructured or semi-structured forms. Says the report: “Also, the order of magnitude of data has increased in not only volume, but also velocity (rate of change) and variety (inherent in the data being semi or unstructured). The relational database is not well-suited to this new data landscape, as relational database instances generally only ‘scale up’ and have fixed schemas for classifying/categorizing data.”
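The fixed-schema point can be illustrated with Python’s built-in `sqlite3` module standing in for a relational database and plain JSON strings standing in for a document store. This is a simplified sketch, not a model of any particular NoSQL product, and the table and field names are hypothetical. It shows a record carrying an unexpected field being rejected by the relational table, while the document approach stores whatever fields arrive and discovers them at read time.

```python
import json
import sqlite3

# Relational table: the schema is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
conn.execute("INSERT INTO events (user_id, action) VALUES (?, ?)", ("u1", "login"))

# A new data source adds a field the schema does not know about.
try:
    conn.execute(
        "INSERT INTO events (user_id, action, device) VALUES (?, ?, ?)",
        ("u2", "bet", "mobile"),
    )
    schema_accepted_new_field = True
except sqlite3.OperationalError as err:
    # Accepting the field would need a schema change (ALTER TABLE) first.
    schema_accepted_new_field = False
    print("Relational insert rejected:", err)

# Semi-structured records: each document carries its own set of fields.
documents = [
    {"user_id": "u1", "action": "login"},
    {"user_id": "u2", "action": "bet", "device": "mobile"},
    {"user_id": "u3", "action": "post", "geo": {"lat": 51.5, "lon": -0.1}},
]
stored = [json.dumps(doc) for doc in documents]  # no schema change needed

# Fields are discovered when the data is read ("schema on read").
fields_seen = sorted({key for line in stored for key in json.loads(line)})
print("Fields across documents:", fields_seen)
```

The trade-off is that the relational table guarantees every row has the same shape, whereas the document store pushes the job of interpreting varying fields onto whoever reads the data.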

Unsurprisingly, the companies best placed to innovate have been the huge internet and social media giants. Yet the landscape, which includes Hadoop, NoSQL databases, MapReduce and Spark processing technology, remains fragmented and still developing, and the immaturity of these technologies remains an “inhibitor to broadly exploring the data opportunity.”
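MapReduce, one of the technologies named above, splits a job into a map step that runs in parallel over slices of the data and a reduce step that combines the grouped results. The single-process Python sketch below shows only that programming model, using the classic word-count example on a few hypothetical posts. It is not the Hadoop or Spark API. In a real cluster, the framework distributes the map tasks across machines and performs the grouping (the "shuffle") over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Emit (key, 1) pairs: here, one pair per word in a post."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


posts = ["big data is big", "data about data", "big wins"]

# Each map call is independent, so in a cluster they could run on different nodes.
mapped = chain.from_iterable(map_phase(p) for p in posts)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call and each reduce call touches only its own slice of the data, the same job can be spread across many cheap machines ("scaling out") rather than requiring one bigger server ("scaling up").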

“We don’t expect there is a short-cut to the solution here, although we note building populations of ‘data scientists’, software engineers, systems administrators, and other technical personnel that are proficient in big data technologies will be a requisite competency for most data-centric companies,” the report says.

In more established businesses, these data recruits will face the challenge of extracting valuable insight from data tied up in multiple legacy systems. For many, the task will be installing data integration tools that can unlock data being generated now or generated in the past. In words that will resonate strongly with the online gambling industry, the Citi report points to very tangible examples of this, such as bringing together all the information about interactions with a given customer.

“This seemingly simple problem of having all information about interactions with a single customer has not only sparked significant data integration efforts, but has also led to an investment cycle in many industries, whereby companies are replacing their customer-facing applications to deliver this single view of customer.”
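At its simplest, building a single view of the customer means joining records that different systems hold under slightly different identifiers. The minimal Python sketch below assumes a hypothetical CRM, payments system and support desk whose records can be matched on a normalised email address. Production integration tools typically use fuzzier, multi-field matching, so this shows only the shape of the problem.

```python
from collections import defaultdict

# Hypothetical extracts from three legacy systems, each with its own field names.
crm = [{"email": "Ann@Example.com ", "name": "Ann Lee", "segment": "VIP"}]
payments = [{"customer_email": "ann@example.com", "last_deposit": 50.0}]
support = [
    {"contact": "ANN@example.com", "ticket": "T-101"},
    {"contact": "bob@example.com", "ticket": "T-102"},
]


def normalise(email: str) -> str:
    """Match key: trimmed, lower-cased email (an assumption; real matching is fuzzier)."""
    return email.strip().lower()


def single_customer_view(crm, payments, support):
    """Merge records from all three systems into one profile per customer."""
    view = defaultdict(lambda: {"tickets": []})
    for row in crm:
        view[normalise(row["email"])].update(name=row["name"], segment=row["segment"])
    for row in payments:
        view[normalise(row["customer_email"])]["last_deposit"] = row["last_deposit"]
    for row in support:
        view[normalise(row["contact"])]["tickets"].append(row["ticket"])
    return dict(view)


for key, profile in single_customer_view(crm, payments, support).items():
    print(key, profile)
```

Even in this toy case, three differently formatted versions of the same email address have to be reconciled before the records can be joined. Data integration projects spend much of their effort on exactly this kind of reconciliation.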