Data quality as a success factor for AI - the 5 most important criteria and measures derived from them
High-quality data is the basis for successful AI projects. A basic prerequisite is that the data is also available in sufficient quantity to enable correct modeling. How large the data volume needs to be is determined in each project during the preliminary phase known as "data understanding". If the amount of data is insufficient, e.g. because the period covered by the data is too short or the data was aggregated too early, one can try to tap other data sources. If that is not possible, it makes sense to postpone the AI project until sufficient data is available.
In many cases, companies and organizations already have sufficient data. To achieve good results in AI projects, however, the data must also be of appropriate quality.
There are five main criteria concerning data quality:
Correctness in relation to reality
Logical consistency of the data among each other
Reliability in relation to the source
Completeness in number and content
Uniqueness
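Several of these criteria can be checked automatically once the data is available in tabular form. The following is a minimal sketch in plain Python, not a complete data quality framework. The sample records, the field names (`order_id`, `ordered`, `shipped`, `quantity`) and the rules themselves are assumptions made for illustration. Reliability in relation to the source usually cannot be derived from the data alone and is therefore not covered.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"order_id": 1, "ordered": "2023-01-05", "shipped": "2023-01-07", "quantity": 3},
    {"order_id": 2, "ordered": "2023-01-06", "shipped": "2023-01-04", "quantity": 1},
    {"order_id": 2, "ordered": "2023-01-06", "shipped": "2023-01-04", "quantity": 1},
    {"order_id": 3, "ordered": "2023-01-08", "shipped": None, "quantity": -2},
]


def completeness(rows, field):
    """Completeness: share of rows in which `field` is present and not None."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def duplicate_keys(rows, key):
    """Uniqueness: key values that occur more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def inconsistent_rows(rows):
    """Consistency: rows whose ship date lies before the order date."""
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    return [
        r for r in rows
        if r["shipped"] is not None and r["shipped"] < r["ordered"]
    ]


def implausible_rows(rows):
    """Correctness (rough proxy): rows with a non-positive quantity."""
    return [r for r in rows if r["quantity"] <= 0]


print("completeness of 'shipped':", completeness(records, "shipped"))
print("duplicate order ids:", duplicate_keys(records, "order_id"))
print("inconsistent rows:", len(inconsistent_rows(records)))
print("implausible rows:", len(implausible_rows(records)))
```

In a real project, the rules would come out of the data understanding phase and from the people who know the business process behind the data.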
Improving data quality after the fact is usually very complex and expensive, and often impossible, e.g. if the data is no longer available in the appropriate form. Simple corrections, such as replacing individual missing values with default values or removing records that are definitely incorrect, are carried out routinely in the preliminary phase of model creation.
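The two simple corrections mentioned here can be sketched in a few lines of Python. The default values in `DEFAULTS` and the rule in `is_definitely_incorrect` are hypothetical; which values count as sensible defaults and which records are "definitely incorrect" has to be decided per project.

```python
# Hypothetical default values for fields that may be missing.
DEFAULTS = {"country": "unknown", "discount": 0.0}


def is_definitely_incorrect(row):
    """Assumed rule: a missing or negative price cannot be valid."""
    return row.get("price") is None or row["price"] < 0


def clean(rows):
    """Drop definitely incorrect rows and fill missing values with defaults."""
    cleaned = []
    for row in rows:
        if is_definitely_incorrect(row):
            continue  # eliminate the record entirely
        fixed = dict(row)  # copy, so the raw data stays untouched
        for field, default in DEFAULTS.items():
            if fixed.get(field) is None:
                fixed[field] = default
        cleaned.append(fixed)
    return cleaned


raw = [
    {"price": 19.9, "country": "AT", "discount": None},
    {"price": -5.0, "country": "DE", "discount": 0.1},
    {"price": 7.5, "country": None, "discount": 0.2},
]
cleaned = clean(raw)
print(cleaned)
```

Working on a copy keeps the raw records unchanged, so it remains possible to trace which values were filled in and which rows were removed.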
High data quality can be achieved through various measures. If data is entered manually (e.g. via an ERP or CRM system), it is important to provide appropriate technical functionality during data entry, such as plausibility checks and defined value ranges, as well as a simple and intuitive user interface that supports correct data entry.
For automatically recorded data, an appropriate, robust technical infrastructure must be provided to ensure continuous transmission of the data (e.g. streaming in the case of measured values).
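As an illustration of plausibility checks and value ranges at the point of data entry, the following sketch validates a single form entry before it is saved. The field names, the allowed country codes and the limits are assumptions made for this example and do not correspond to any particular ERP or CRM system.

```python
from datetime import date

# Hypothetical set of permitted values for a drop-down field.
ALLOWED_COUNTRIES = {"AT", "DE", "CH"}


def validate_entry(entry, today=None):
    """Return a list of error messages; an empty list means the entry is plausible."""
    today = today or date.today()
    errors = []

    # Required field: must contain more than whitespace.
    if not (entry.get("customer_name") or "").strip():
        errors.append("customer_name is required")

    # Value range: whole number within assumed limits.
    quantity = entry.get("quantity")
    if not isinstance(quantity, int) or not 1 <= quantity <= 1000:
        errors.append("quantity must be a whole number between 1 and 1000")

    # Restricted vocabulary instead of free text.
    if entry.get("country") not in ALLOWED_COUNTRIES:
        errors.append("country must be one of " + ", ".join(sorted(ALLOWED_COUNTRIES)))

    # Plausibility: a delivery date cannot lie in the past.
    delivery = entry.get("delivery_date")
    if not isinstance(delivery, date) or delivery < today:
        errors.append("delivery_date must not be in the past")

    return errors


good = {"customer_name": "Example GmbH", "quantity": 5,
        "country": "AT", "delivery_date": date(2024, 3, 1)}
bad = {"customer_name": "  ", "quantity": 0,
       "country": "XX", "delivery_date": date(2023, 12, 1)}

print(validate_entry(good, today=date(2024, 1, 1)))
print(validate_entry(bad, today=date(2024, 1, 1)))
```

Returning all messages at once, rather than stopping at the first problem, lets the user interface show every field that needs attention in a single pass.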
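Whether transmission really is continuous can be monitored by looking for gaps between consecutive readings. The sketch below assumes that each reading carries a timestamp and that the sensor is expected to report at a fixed interval; the function name `find_gaps` and the ten-second interval are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (previous, next) pairs whose distance exceeds the expected interval."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > expected_interval + tolerance:
            gaps.append((prev, curr))
    return gaps


# Hypothetical readings: one every 10 seconds, with two readings missing.
start = datetime(2024, 1, 1, 12, 0, 0)
readings = [start + timedelta(seconds=s) for s in (0, 10, 20, 50, 60)]

gaps = find_gaps(readings, expected_interval=timedelta(seconds=10))
for prev, curr in gaps:
    print("gap between", prev.time(), "and", curr.time())
```

A small `tolerance` is usually needed in practice, since real measurements rarely arrive at exactly regular intervals.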
On the way to becoming a data-driven company, it is advisable to implement data governance. This comprises a range of organizational measures that enable good data management across the organization: defining roles (data owner, data steward) and responsibilities, as well as guidelines, standards and processes for handling data.
Would you like to know the potential of your company's data and how we can help you leverage it? Learn more here: Enterprise Data Analytics | Datasense. Or arrange a non-binding meeting with us about our "Impulse Enterprise Data Analytics" workshops under Contact | Datasense.