Mr Sattler, large amounts of data plus intelligent algorithms result in useful AI applications. What is wrong with this calculation?
Kai-Uwe Sattler: Unfortunately, large data volumes alone are not enough. Large amounts of training data are required, especially for learning with deep neural networks, but more data also means more effort for data acquisition, preparation and training. It is therefore important to have the "right" data available as training data. Training data for image recognition, for example, should of course contain the objects to be identified, but also negative examples in all possible or occurring variations. Bias and discrimination should be avoided when selecting data. The literature describes a whole series of examples of bias and discrimination that show what effects this can have.
How do you turn data into useful data? What is important in data engineering?
Kai-Uwe Sattler: First of all, suitable data representing the problem to be solved must be collected. For an application in the area of predictive maintenance, for example, error conditions should also be recorded, and not just normal operating data. Once data has been collected, it must be processed. This includes cleaning, such as detecting and removing incorrect values, linking to other data and, if necessary, annotating the data. Both the data and the acquisition and processing procedures should be documented and described by metadata to ensure traceability. In AI projects, this preparation can account for up to 80 percent of the total effort. Data engineering provides the methods and infrastructures for these processes and includes data management, data integration and data preparation - for example, through database systems, big data systems or data cleaning tools.
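To make the preparation steps mentioned above more concrete, here is a minimal Python sketch of cleaning, linking, annotating and documenting a small set of sensor readings for a predictive-maintenance scenario. It is only an illustration and not taken from the interview or the whitepaper: the record fields (`machine_id`, `temp_c`), the plausibility range, the `prepare` function and the sample data are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical plausibility range for a temperature sensor, in degrees Celsius.
VALID_RANGE = (-40.0, 150.0)


@dataclass
class PreparedDataset:
    rows: list
    metadata: dict = field(default_factory=dict)


def prepare(readings, machines, fault_ids):
    """Clean, link and annotate raw readings; record each step as metadata."""
    low, high = VALID_RANGE

    # 1. Cleaning: drop missing or physically implausible values.
    clean = [r for r in readings
             if r["temp_c"] is not None and low <= r["temp_c"] <= high]

    # 2. Linking: attach machine master data via the shared machine_id.
    linked = [{**r, "model": machines[r["machine_id"]]["model"]}
              for r in clean if r["machine_id"] in machines]

    # 3. Annotation: label error conditions, not just normal operation.
    rows = [{**r, "label": "fault" if r["id"] in fault_ids else "normal"}
            for r in linked]

    # 4. Documentation: keep metadata so the processing stays traceable.
    metadata = {
        "source_rows": len(readings),
        "removed_invalid": len(readings) - len(clean),
        "removed_unlinked": len(clean) - len(linked),
        "steps": ["range_filter", "join_machines", "label_faults"],
    }
    return PreparedDataset(rows, metadata)


# Hypothetical sample data: two valid readings, one missing, one implausible.
readings = [
    {"id": 1, "machine_id": "m1", "temp_c": 71.5},
    {"id": 2, "machine_id": "m1", "temp_c": None},
    {"id": 3, "machine_id": "m2", "temp_c": 999.0},
    {"id": 4, "machine_id": "m2", "temp_c": 118.0},
]
machines = {"m1": {"model": "A"}, "m2": {"model": "B"}}

result = prepare(readings, machines, fault_ids={4})
print(result.rows)
print(result.metadata)
```

In a real project these steps would typically run in a database or big data system rather than in plain Python lists; the point of the sketch is only that every removal and transformation is counted and named in the metadata, so the resulting training set remains traceable.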
What skills do developers need to create trusted AI applications?
Kai-Uwe Sattler: In addition to knowledge of methods from the field of Machine Learning or Artificial Intelligence, this includes in particular knowledge of data modeling, transformation and integration, but also knowledge of statistics in order to be able to evaluate data properties and the quality of the results. Furthermore, knowledge of ethics and law is helpful in order to handle the data responsibly. And, of course, comprehensive application knowledge is also indispensable. This already shows that it is no longer just about classic software development. Rather, these requirements call for an interdisciplinary approach: application experts increasingly need so-called data literacy, and data science specialists must also understand the application domains. There will certainly be a great need for further training courses in this area.
The whitepaper "From Data to AI - Intelligent Data Management as a basis for Data Science and the use of Learning Systems" by Plattform Lernende Systeme is available for free download here (in German).