Data pools, governance and factories are making headlines. German industry magazine “Produktion” is investigating how to succeed with data in an interview with the Data Intelligence Hub and one of its data analytics experts, Prof. Dr. Chris Schlueter Langdon.
The Telekom Data Intelligence Hub is addressing several key issues inherent in creating value from data. “The problems with data can be sliced into three levels: the technical level, with the question of how to obtain the right data; the process level, with questions of process quality and security; and the legal level, which includes data sovereignty.”
“It would make sense to understand data as a product and apply the usual mechanisms to it: from product development to marketing to sales.” Big industry players like BMW, Daimler and Volkswagen already know the one big secret with data: data from sensors is raw data; it is not ready for artificial intelligence (AI) applications. Instead, raw data requires refining to turn it into AI-ready data or a “data product.” As any experienced data scientist can attest, this refining can be quite extensive, time-consuming and therefore expensive. And in terms of data volume, the party has just started: new technology such as Internet of Things (IoT) sensors and the 5G cellular communications standard will trigger further data growth. If you are already struggling today, it won’t get better. Data refinement requires automation at scale to make data products affordable or, in short, it requires data factories. This is where the Data Intelligence Hub fits in: it provides a rich assortment of data preparation and analytics tools, all in the cloud to scale instantly.
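The refining step described above can be made concrete. The sketch below is illustrative only: it uses hypothetical sensor readings, and the plausibility range and median imputation are assumptions, not the Hub’s actual pipeline. It shows three typical refinement operations on raw sensor data: dropping implausible values, deduplicating timestamps and filling gaps.

```python
from statistics import median

def refine(readings, lo=-40.0, hi=125.0):
    """Turn raw (timestamp, value) sensor readings into an AI-ready,
    gap-free series. The [lo, hi] plausibility range and the median
    imputation are illustrative assumptions."""
    clean = {}
    for ts, val in readings:
        # Drop missing and physically implausible values.
        if val is not None and lo <= val <= hi:
            clean[ts] = val  # later duplicates overwrite earlier ones
    fill = median(clean.values())
    # Emit one value per timestamp over the observed range,
    # imputing gaps with the series median.
    t0, t1 = min(clean), max(clean)
    return [(t, clean.get(t, fill)) for t in range(t0, t1 + 1)]

raw = [(0, 21.5), (1, 999.0), (1, 21.7), (3, None), (4, 22.1)]
print(refine(raw))
# → [(0, 21.5), (1, 21.7), (2, 21.7), (3, 21.7), (4, 22.1)]
```

Even this toy example hints at why refinement is expensive: each rule (range checks, deduplication, imputation) encodes a domain decision that must be made, validated and automated before the data is usable.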
“Contrary to popular belief, combing through large amounts of data with AI tools is not at all sufficient … None of the tools of AI and Machine Learning provides causality, only correlation.” What is required is the right data. To understand what the right data is, a causal analysis with domain experts ought to be conducted first to filter out the most important causal factors, which, in turn, define data needs. Despite large amounts of raw data, surprisingly, the right data can be quite sparse. Many AI applications, such as predictive maintenance or autonomous driving, can require more data than is available within a single department or company. “Again, this is where the Data Intelligence Hub can be helpful as it offers even smaller companies several options: pooling, sharing and aggregating data … A horizontal pool is created, for example, by a machine manufacturer evaluating the operating data of its products across customers.”
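The correlation-versus-causality point can be demonstrated in a few lines. In the example below (entirely synthetic data, for illustration only), a hidden confounder drives two sensor channels; the channels correlate strongly even though neither causes the other, which is why a causal analysis with domain experts has to come before data collection.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient, from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(42)
# A hidden confounder (say, machine load) drives both channels;
# neither channel causes the other.
load = [random.gauss(50, 10) for _ in range(1000)]
temperature = [l * 0.8 + random.gauss(0, 2) for l in load]
vibration = [l * 1.2 + random.gauss(0, 3) for l in load]

# Strong correlation, yet cooling the machine would not reduce vibration:
# only acting on the confounder (load) would.
print(round(pearson(temperature, vibration), 2))
```

A purely correlational model here would happily “predict” vibration from temperature, and fail the moment an intervention changes the load.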
So far, few companies have been willing to engage in these types of data pools and sharing. What has been missing are exchange options with data governance mechanisms that strike a balance between the need to protect one’s data and the need to share it with others. Again, this is where the Data Intelligence Hub can help. It has created a first connector solution. Instead of creating a proprietary solution or reinventing the wheel, the Data Intelligence Hub has implemented an open one: “Our customers benefit from data governance based on blueprints from Fraunhofer Institutes.” The DIH Connector is based on the reference architecture model (RAM) of the International Data Spaces Association (IDSA 2019). IDSA is an association of industry participants, created to promote data governance architecture solutions based on research conducted by the German Fraunhofer Institutes with funding from the German government (Fraunhofer 2015). Members include automakers like Volkswagen, suppliers like Bosch, and traditional information technology specialists like IBM.
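To make the governance idea tangible, here is a minimal, hypothetical sketch of usage control: a data provider attaches a policy to a dataset, and data leaves the provider only if the consumer’s request satisfies that policy. All names and fields are illustrative; this is not the actual DIH Connector API or the IDSA contract schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsagePolicy:
    """Toy usage-control policy, loosely inspired by usage contracts
    in data-space architectures. Fields are illustrative assumptions."""
    allowed_consumers: frozenset
    purposes: frozenset
    max_rows: int

def release(dataset, policy, consumer, purpose):
    """Release data only if the request satisfies the provider's policy."""
    if consumer not in policy.allowed_consumers:
        raise PermissionError(f"{consumer} is not an approved consumer")
    if purpose not in policy.purposes:
        raise PermissionError(f"purpose '{purpose}' not covered by contract")
    return dataset[: policy.max_rows]  # enforce the agreed volume cap

policy = UsagePolicy(
    allowed_consumers=frozenset({"supplier-a"}),
    purposes=frozenset({"predictive-maintenance"}),
    max_rows=2,
)
rows = [{"rpm": 1450}, {"rpm": 1462}, {"rpm": 1470}]
print(release(rows, policy, "supplier-a", "predictive-maintenance"))
# → [{'rpm': 1450}, {'rpm': 1462}]
```

The design point is that the policy travels with the data: the provider, not the consumer, decides who may use the data, for what purpose and in what volume, which is the balance between protecting and sharing that the article describes.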
One key application domain that requires extensive data sharing and pooling is the field of new, future or “smart” mobility. Particularly in urban environments, the traditional, automobile-based approach seems to have reached a breaking point in terms of traffic jams, accidents and air pollution. Solutions include intermodal transportation, dynamic traffic management and even autonomous shuttles – all of which require data analytics-based optimization, which in turn requires data from many different sources and owners (we wrote about this earlier in “Space Race”).