3 key learnings from data best practice

Despite big investments in data scientists, expensive IT infrastructure and tools, AI seems to be stuck in first gear. What’s often missing is the right management approach – and the right recipe to orchestrate all the new ingredients …

The Telekom Data Intelligence Hub addresses several key issues inherent in creating value from data. “The problems with data can be sliced into three levels: The technical level with the question of how to obtain the right data, the process level with questions of process quality and security and then the legal level – which includes data sovereignty. It would make sense to understand data as a product and apply the usual mechanisms to it: from product development to marketing to sales,” state Crosby & Langdon (2019). A brief background on “Data as a Product” by Crosby & Langdon can be found in Marketing News of the American Marketing Association (link). The learnings can be summarized in three rules of thumb: focus on the right data, get it ready for AI, and make it affordable.

Key Learning 1: Pick the right data – Don’t be tricked by correlation, start with causation

Neither Big Data nor AI alone will deliver data science results; the critical element is causality. No patient will swallow medicine without a causal diagnosis. “Contrary to popular belief, combing through large amounts of data with AI tools is not at all sufficient … None of the tools of AI and Machine Learning provides causality, only correlation.” What is required is the right data. To determine what the right data is, a causal analysis with domain experts should be conducted first to identify the most important causal factors, which in turn define the data needs. Surprisingly, despite large amounts of raw data, the right data can be quite sparse. Many AI applications, such as predictive maintenance or autonomous driving, can require more data than is available within a single department or company. “Again, this is where the Data Intelligence Hub can be helpful as it offers even smaller companies several options: pooling, sharing, and aggregating data … A horizontal pool is created, for example, by a machine manufacturer evaluating the operating data of his products across the customers.”

Key Learning 2: Refine raw data into AI-ready data

Big industry players like BMW, Daimler, and Volkswagen already know the one big secret about data: sensor data is raw data, and it is not ready for Artificial Intelligence (AI) applications. Instead, raw data requires refining to turn it into AI-ready data, or a “data product”. As any experienced data scientist can attest, this refining can be quite extensive, time-consuming, and therefore expensive. And in terms of data volume, the party has just started. New technologies such as Internet of Things (IoT) sensors and the 5G cellular communications standard will trigger further data growth. If you are already struggling today, it won’t get any easier. Data refinement must be automated and run at scale to make data products affordable; in short, it requires data factories, a term described further in our article “Data factories for data products” (link). This is where the Data Intelligence Hub fits in. It provides a rich assortment of data preparation and analytics tools, all in the cloud so they can scale instantly.
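Refinement steps vary by domain, but a minimal Python sketch of a few typical ones might look like the following. It assumes hypothetical sensor readings that each carry a sensor ID, a timestamp, and a possibly missing value. The sketch deduplicates them, drops missing or implausible values, resamples them into fixed time windows, and standardizes each sensor's series. The `Reading` type, the plausible value range, and the 60-second window are illustrative assumptions. They do not describe how the Data Intelligence Hub or any of the automakers named above actually refine their data.

```python
import math
import statistics
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen -> hashable, so exact duplicates collapse in a set
class Reading:
    """Hypothetical raw sensor record."""
    sensor_id: str
    ts: int                  # Unix time in seconds
    value: Optional[float]   # None means the value was not transmitted


def refine(raw, lo=-50.0, hi=150.0, window_s=60):
    """Turn raw readings into (sensor_id, window_start, z_score) rows."""
    # 1. Deduplicate exact repeats (e.g. retransmitted messages).
    unique = set(raw)
    # 2. Drop missing, non-finite, or physically implausible values.
    valid = [r for r in unique
             if r.value is not None and math.isfinite(r.value)
             and lo <= r.value <= hi]
    # 3. Resample: mean value per sensor per fixed time window.
    buckets = defaultdict(list)
    for r in valid:
        buckets[(r.sensor_id, r.ts // window_s * window_s)].append(r.value)
    rows = sorted((sid, start, statistics.fmean(vals))
                  for (sid, start), vals in buckets.items())
    # 4. Standardize per sensor (z-score) so features share a common scale.
    by_sensor = defaultdict(list)
    for sid, _, v in rows:
        by_sensor[sid].append(v)
    params = {sid: (statistics.fmean(vs), statistics.pstdev(vs))
              for sid, vs in by_sensor.items()}
    out = []
    for sid, start, v in rows:
        mu, sigma = params[sid]
        out.append((sid, start, round((v - mu) / sigma, 3) if sigma > 0 else 0.0))
    return out


raw = [
    Reading("s1", 0, 20.0),
    Reading("s1", 0, 20.0),    # duplicate
    Reading("s1", 30, 22.0),
    Reading("s1", 65, None),   # missing value
    Reading("s1", 70, 999.0),  # implausible spike
    Reading("s1", 90, 30.0),
    Reading("s2", 10, 5.0),
]
print(refine(raw))  # [('s1', 0, -1.0), ('s1', 60, 1.0), ('s2', 0, 0.0)]
```

Even this toy version shows why refinement is costly: every step encodes a domain decision, such as what counts as implausible or which window size suits the application. In a data factory, steps like these would be automated and run continuously at scale, not hand-tuned for each dataset.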

Key Learning 3: Benefit from open data standards to make it economical

So far, few companies have been willing to participate in this kind of data pooling and sharing. What has been missing are exchange options with data governance mechanisms that strike a balance between the need to protect one’s data and the need to share it with others. Again, this is where the Data Intelligence Hub can help. It has created a first connector solution. Instead of building a proprietary solution or reinventing the wheel, the Data Intelligence Hub has implemented an open one: “Our customers benefit from data governance based on blueprints from Fraunhofer Institutes.” The DIH Connector is based on the Reference Architecture Model (RAM, link) of the International Data Spaces Association (IDSA 2019, link). IDSA is an association of industry participants, created to promote data governance architecture solutions based on research conducted by the German Fraunhofer Institute with funding from the German government (Fraunhofer, 2015). Members include automakers like Volkswagen, suppliers like Bosch, and traditional information technology specialists like IBM.

The urgency of data exchange in mobility

One key application domain that requires extensive data sharing and pooling is the field of new, future, or smart mobility. Particularly in urban environments, the traditional car-based approach seems to have reached a breaking point in terms of traffic jams, accidents, and air pollution. Solutions include intermodal transportation, dynamic traffic management, and even autonomous shuttles – all of which require data analytics-based optimization, which in turn requires data from many different sources and owners (we wrote about this earlier in “Space Race,” link).

This article is based on a longer piece in the trade magazine “Technik und Wirtschaft für die deutsche Industrie: Die Produktion”: link