Jan 29, 2015

Big Data, Big Promises with Predictive Analytics

By Petr Lazecký

Big Data has attracted a lot of attention recently. Even though Big Data processing concepts were introduced almost a decade ago, it remains an area of very active research and development. Updates and announcements of new products and solutions occur on an almost weekly basis. Depending on your side of the fence, you may be skeptical, doubtful or even suspicious of a new wave of hype technologies that recycle old concepts under new marketing umbrellas. On the flip side, you may recognize a new wave of commodity components, software-based "Lego pieces" that enable interoperable system designs with all the well-known benefits.

Today, whenever there is discussion about Big Data, the conversation usually circles around data mining, data pattern recognition, trend analysis, predictive analysis and all the traditional statistics-based data processing "stuff". However, if you take a modern approach and start to think about Big Data as a highly distributed, scalable and redundant ETL engine that is controlled by a unified programming language, you may start to see the innovative value that Big Data really brings to the table.


Getting Started with Big Data

As in many industries, Communication Service Providers (CSPs) are now intensely focused on the opportunities offered by Big Data and advanced analytics to grow revenues, intelligently engage with customers, and enhance operational efficiency.

This effort is especially critical for the large, established carriers, which for years have been fighting the headwinds of slowing subscriber/revenue growth and grappling with back office technology systems that weren't designed for the Internet age. It's been difficult to invest in and bring new products to market quickly and very hard to evolve them once they're live.

What these carriers do have is data. Mountains of it. Most of which passes into and then out of their systems like rain. If providers could instead collect the data, analyze it, act on it and monetize it, the potential returns would be immense.

Until very recently, such large piles of data weren't of much use. Traditional data processing applications couldn't unpack them. Even today, CSPs face architecture and technical challenges that prevent them from using near-real-time customer data to evolve operations or provide attractive, relevant customer offers.

In many cases, these are the same issues that slow product development and lead to operational leakages elsewhere in the business.

In a traditional, relational database, information is stored in multiple tables that are connected by "joins." The information is highly structured: every record should appear in just one table, without duplication across tables.

This structure is logically precise and works exceptionally well for many uses. The problem is that it can't be scaled past a certain amount of data before systems become too degraded for real-world use.
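The normalized model above can be sketched with a small in-memory SQLite database. The table and column names here are hypothetical, chosen only to illustrate how each fact lives in exactly one table and how a join reassembles the pieces at query time:

```python
import sqlite3

# A minimal normalized schema: customer names are stored once, and
# orders reference customers by id rather than duplicating the name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# The join recovers each customer's name per order; no data is duplicated.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 124.0), ('Bob', 40.0)]
```

Every query that crosses tables pays for a join like this one, which is exactly the cost that becomes prohibitive at very large scale.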

The alternative "in-memory" databases used in Big Data efforts are built very differently. Instead of information being parceled out to many different, connected tables, all data is kept together in a single "document" (which is more like an .xml file). This dramatically increases system performance, especially because memory is much faster than the disk-based storage it replaces.
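The document model described above can be sketched as a single denormalized record. The field names are hypothetical; the point is that everything about one customer travels together, so no join is needed:

```python
# A denormalized "document": the customer's details and all of their
# orders are kept in one record, duplicating what a relational schema
# would factor out into separate tables.
order_doc = {
    "customer": {"id": 1, "name": "Alice"},
    "orders": [
        {"id": 10, "amount": 99.0},
        {"id": 11, "amount": 25.0},
    ],
}

# A single in-memory lookup answers the query; no join is required.
total = sum(o["amount"] for o in order_doc["orders"])
print(order_doc["customer"]["name"], total)  # Alice 124.0
```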

The limitation of this kind of system is that it only works with pre-built queries. The business organization will easily be able to sort orders by geographic area – if that's been built in – but pulling instead by salesperson would require a different set of relationships. It's a trade-off: in order to have high performance, one must intentionally limit reporting capabilities.


Big Data Technical Hurdles for CSPs

The first issue is building an infrastructure that can receive "fire hose" data streams. The silo-ing issues that affect many CSP systems come into play at this point. Overlay and linking applications are typically required to work with and connect existing systems.

This part of the process does not differ greatly from traditional bulk data processing. Once the data are gathered, they can then be put into the special formats and star schemas that enable the creation of analytics reporting.

Big Data Technology

Perhaps surprisingly to those who think proprietary software is always superior, the best-known and most-used packages in this area are all open source. Analytical solutions comprising open source components provide performance and functionality on par with – or better than – other, more expensive products.

One reason open source has pulled ahead is that Big Data technology continues to evolve. The platforms are stable, but from a features perspective they are constantly moving forward.

MapReduce began as a proprietary technology developed by Google, but eventually became open source. The best-known implementation is probably Apache Hadoop, which is also the platform Excelacom develops on.

MapReduce/Hadoop is a programming model for restructuring data from one format into another. Its transformation logic "maps" inputs into key-value pairs and then "reduces," or summarizes, that information into a structure that can be processed in parallel.
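The map/reduce pattern can be sketched in a single process using the classic word-count example. This is not Hadoop code, just an illustration of the same three phases that a real cluster distributes across many machines:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # "Map": emit (key, value) pairs, here (word, 1) for each word.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key so each reducer sees one key at a time.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "Reduce": summarize each key's values, here by summing the counts.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big promises", "big analytics"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'promises': 1, 'analytics': 1}
```

Because each key's values are reduced independently, the reduce phase parallelizes naturally, which is what makes the model suitable for cluster-scale data.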

We then use Online Analytical Processing (OLAP) to build new searches in memory. This disaggregation and indexing enables data slicing and new forms of analytical queries, allowing much deeper dives into the available data and more advanced scenario analysis.
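The slicing idea can be sketched with a toy data cube. The dimensions (region, product, month) and the revenue figures are invented for illustration; OLAP engines do the same operations over much larger fact sets held in memory:

```python
from collections import defaultdict

# Hypothetical fact records: three dimensions and one measure.
facts = [
    {"region": "EU", "product": "voice", "month": "Jan", "revenue": 120},
    {"region": "EU", "product": "data",  "month": "Jan", "revenue": 200},
    {"region": "US", "product": "data",  "month": "Jan", "revenue": 150},
    {"region": "EU", "product": "data",  "month": "Feb", "revenue": 220},
]

def rollup(facts, dimension):
    # Aggregate the measure along one dimension of the cube.
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[dimension]] += fact["revenue"]
    return dict(totals)

def slice_cube(facts, **fixed):
    # "Slice": fix one or more dimensions, keeping only matching facts.
    return [f for f in facts if all(f[d] == v for d, v in fixed.items())]

print(rollup(facts, "region"))  # {'EU': 540, 'US': 150}
jan_by_product = rollup(slice_cube(facts, month="Jan"), "product")
print(jan_by_product)  # {'voice': 120, 'data': 350}
```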

This is how one would, for example, plug into the massive amount of Twitter data being continually produced to look for strings and patterns of keywords to understand how conversations begin, flow and ebb.

Elasticsearch serves the OLAP data schemes, also known as data cubes. These are loaded into memory, indexes and schemas are defined, and an engine is built to provide close-to-real-time analytical data processing.

In-memory databases rely on highly distributed caches that are spread across multiple systems, using a concept known as data "sharding." Because the volumes of data are so huge, they must be partitioned separately across clusters – or "wolf packs" – of machines, each contributing a share of its memory. These machines work together to preserve an overall view of the data in memory.
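Hash-based sharding, the simplest form of the partitioning described above, can be sketched in a few lines. The key format and shard count are arbitrary choices for illustration; production systems typically add replication and consistent hashing on top of the same idea:

```python
import hashlib

NUM_SHARDS = 4  # one logical shard per machine in the "wolf pack"

def shard_for(key: str) -> int:
    # A stable hash keeps routing consistent across processes and restarts.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each dict stands in for the in-memory cache of one cluster node.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    # Any node can route the request; only the owning shard is touched.
    return shards[shard_for(key)].get(key)

put("subscriber:42", {"plan": "prepaid"})
print(get("subscriber:42"))  # {'plan': 'prepaid'}
```

Each record lives on exactly one shard, yet the routing function lets the cluster behave as a single logical store, which is the "overall view" the article refers to.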


The Promise of Big Data

Traditional databases are built to find information; Big Data exists to find patterns. What looks like an undifferentiated mass of data can be analyzed and viewed through statistical analysis to understand trends and macro-scale processes.

This is not just a machine-based process, however. Traditional reporting is static; analytical reports are dynamic. Instead of a pie chart that simply lists orders by state, it becomes a clickable interface that allows for drilldowns into each slice of the pie. Decision makers can quickly differentiate between important and unimportant items in the data.

CSPs that continue to use static customer segmentation to identify the products and services they offer are missing out. By applying Big Data principles and technology to customer-centric data such as location and time, CSPs can create dynamic pricing for specific timeframes, for example when the customer is outside their usual time ranges and locations.

The Century Predictive Performance module coupled with the Century Product Catalog module provides CSPs with an adaptive product catalog that looks for patterns in real-time, based on individual subscriber parameters such as usage, location, probability to churn, time of the day and other critical factors.

For more detailed information about the ways Excelacom can help Communication Providers address – and conquer – the challenge of Big Data and unleash the opportunities involved, email us at



Petr Lazecký is Principal Consultant and System Architect for EMEA at Excelacom. He is involved in strategy and transformation projects in this area, as well as working on System Architecture of Excelacom’s telecom products.

