In an expert interview in AutomotiveIT, Prof. Dr. Christoph Schlueter Langdon of the Telekom Data Intelligence Hub explains the three steps for success with AI and highlights the importance of causality over correlation. “‘Without a hypothesis of the relation between cause-and-effect, fishing expeditions have little use. […] Statistics only provide correlations, not causality. An example: Health and economic performance are positively correlated, but where should you invest the next Euro: in health or economic growth?’ explains Schlueter Langdon.”
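The health-versus-economy question can be illustrated with a small simulation. The following is a minimal sketch, not part of the interview: the two synthetic "worlds", the effect strength of 0.8, and the helper names `pearson` and `simulate` are all assumptions made for illustration. In one world health drives the economy, in the other the economy drives health, yet both produce roughly the same correlation, so the correlation alone cannot say where the next Euro should go.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(driver, n=5000, seed=0):
    """Hypothetical world in which `driver` causes the other variable.

    Returns (health, economy). The coefficients 0.8 and 0.6 are
    illustrative assumptions chosen so both variables have unit variance.
    """
    rng = random.Random(seed)
    cause = [rng.gauss(0, 1) for _ in range(n)]
    effect = [0.8 * c + 0.6 * rng.gauss(0, 1) for c in cause]
    if driver == "health":
        return cause, effect   # health causes economy
    return effect, cause       # economy causes health


# World A: health drives the economy. World B: the economy drives health.
health_a, economy_a = simulate("health", seed=0)
health_b, economy_b = simulate("economy", seed=1)

r_a = pearson(health_a, economy_a)
r_b = pearson(health_b, economy_b)

print(f"World A (health -> economy): r = {r_a:.2f}")
print(f"World B (economy -> health): r = {r_b:.2f}")
```

Both worlds show a similarly strong positive correlation even though the right investment differs between them. Telling them apart requires a hypothesis about the direction of cause and effect, which is the point of the quote above.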
Step 1: The right start – Focus by questioning. “At the start it is important to condense a problem into a question, which you want to answer with the data analysis.” To do this, data scientists need insight into the workflows before the analysis begins, so that they understand which step of a process is to be optimized with AI. Only then can the desired result be defined and the appropriate models developed. In most companies, this step fails because of silo thinking among departments and because data scientists lack the information they need.
Step 2: Causal model and hypotheses. “Then it’s about further narrowing the focus by forming hypotheses grounded in theory, a so-called causal model. ‘If this causal model cannot fit on a napkin, then you should not continue at all,’ suggests Schlueter Langdon.” Only by defining the events relevant to the process step to be optimized can the right data be extracted from the mass of available data and the problems of data management be overcome (for more insights, see “Data is broken,” link).
Step 3: The right data to prevent GIGO. “Only then can the right data be identified, refined and finally analyzed.” Another core principle with AI: all information required to answer the question must be included in the data; otherwise there is the risk of GIGO (Garbage In, Garbage Out; for more information, see “Creating data pools for AI,” link). “No raw iron without iron ore in the rock: Same with data – one has to ensure that it contains the information required to solve a problem,” the data science expert explains.
“‘Especially with Neural Networks, the quality of results depends almost entirely on the quality of the training data,’ clarifies the Data Science expert. For example, in so-called Convolutional Neural Networks (CNNs), the labelling quality directly determines the accuracy of image recognition results. ‘The description of the training data has to be very granular for each object,’ the expert notes.”
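The effect of labelling quality can be sketched on a toy problem. This is an illustrative assumption, not the expert's method and not a CNN: it swaps in a one-dimensional nearest-neighbour classifier on synthetic data, and the helper names, class means, and the 30% label-flip rate are all hypothetical. The same test set is classified twice, once from cleanly labelled training data and once from training data with some labels flipped.

```python
import random


def make_data(n, rng):
    """Two well-separated 1-D classes: class 0 near -2, class 1 near +2."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(-2.0 if label == 0 else 2.0, 1.0)
        data.append((x, label))
    return data


def flip_labels(data, fraction, rng):
    """Simulate poor labelling: flip each label with probability `fraction`."""
    return [(x, 1 - y) if rng.random() < fraction else (x, y) for x, y in data]


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, test):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


rng = random.Random(42)
train = make_data(500, rng)
test = make_data(500, rng)
noisy_train = flip_labels(train, 0.3, rng)  # 30% mislabelled (assumption)

clean_acc = accuracy(train, test)
noisy_acc = accuracy(noisy_train, test)

print(f"Accuracy with clean labels: {clean_acc:.2f}")
print(f"Accuracy with noisy labels: {noisy_acc:.2f}")
```

The features are identical in both runs; only the labels differ. In this sketch the drop in accuracy therefore comes entirely from label quality, which mirrors the GIGO principle on a small scale.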
This article is based on a longer piece in AutomotiveIT (2019-05): link