Data quality
The data quality service provides automated assessment of measurement quality and data integrity.
In-situ marine data is prone to errors from several sources. Underwater sensors may produce anomalous readings when their batteries run low, and biofouling, the accumulation of microorganisms, plants, algae, or small animals on underwater sensors, can distort measurements. If low-quality data is used in decision-making, the results will be misleading or suboptimal; data quality control (DQC) is therefore needed to ensure data reliability.
Data quality control still relies predominantly on manual checks. With today's advanced sensor technologies, vast amounts of data are collected every day, and manual DQC can be time-consuming and cause significant delays in data publication. To address this challenge and speed up the DQC process, we are developing an automated DQC framework called the Adaptive Anomaly Detector (AdapAD), based on Artificial Intelligence (AI), specifically unsupervised anomaly detection.
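To give a rough idea of what unsupervised anomaly detection over in-situ measurements can look like, the sketch below flags outlying temperature and salinity readings with scikit-learn's IsolationForest. This is an illustrative example only, not the AdapAD implementation: the synthetic data, variable names, and contamination rate are assumptions chosen for the example.

```python
# Illustrative sketch only: a minimal unsupervised anomaly-detection pass over
# in-situ measurements. It is NOT the AdapAD or AutoQC implementation.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated sensor stream: mostly normal readings plus a few injected spikes
# (e.g. from a low-battery sensor or a biofouled probe).
temperature = rng.normal(loc=8.0, scale=0.3, size=500)   # degrees C
salinity = rng.normal(loc=35.0, scale=0.2, size=500)     # PSU
temperature[[50, 200, 350]] += np.array([4.0, -5.0, 6.0])  # injected anomalies

X = np.column_stack([temperature, salinity])

# Unsupervised detector: no labelled "good"/"bad" measurements are required.
detector = IsolationForest(contamination=0.01, random_state=0)
flags = detector.fit_predict(X)  # -1 = anomalous, 1 = normal

suspect = np.flatnonzero(flags == -1)
print(f"Flagged {suspect.size} suspect measurements at indices {suspect}")
```

In practice the flagged measurements would not be discarded automatically but surfaced to data validators, who confirm or reject the quality flags before the data is published.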
The SmartOcean project has developed a framework that aims to detect anomalous measurements in real-time data in a scalable manner and supports data validators and data consumers in assessing the quality of data for further use.
- AutoQC framework implementation: https://github.com/smartoceanplatform/AutoQC_framework
The framework is based upon the work published in:
- N-T. Nguyen, K. Lima, A.M. Skalvik, R. Heldal, E. Knauss, T.D. Oyetoyan, P. Pelliccione, C. Sætre: Synthesized data quality requirements and roadmap for improving reusability of in-situ marine data. In Proc. of the 31st International Requirements Engineering Conference, 2023.