From Theory to Practice: A Data Quality Framework for Classification Tasks

dc.contributor.author Corrales Múñoz, David Camilo
dc.contributor.author Ledezma Espino, Agapito Ismael
dc.contributor.author Corrales, Juan Carlos
dc.date.accessioned 2019-02-11T10:53:49Z
dc.date.available 2019-02-11T10:53:49Z
dc.date.issued 2018-07-01
dc.identifier.bibliographicCitation Corrales, D.C., Ledezma, A., Corrales, J.C. (2018). From Theory to Practice: A Data Quality Framework for Classification Tasks. Symmetry, 10 (7), 248.
dc.identifier.issn 2073-8994
dc.identifier.uri http://hdl.handle.net/10016/28040
dc.description.abstract Data preprocessing is an essential step in knowledge discovery projects. Experts estimate that preprocessing tasks take between 50% and 70% of the total time of the knowledge discovery process. Accordingly, several authors consider data cleaning one of the most cumbersome and critical tasks. Failure to ensure high data quality in the preprocessing stage significantly reduces the accuracy of any data analytics project. In this paper, we propose DQF4CT, a framework to address data quality issues in classification tasks. Our approach is composed of: (i) a conceptual framework that guides the user on how to deal with data problems in classification tasks; and (ii) an ontology that represents the knowledge in data cleaning and suggests the proper data cleaning approaches. We present two case studies on real datasets: physical activity monitoring (PAM) and occupancy detection of an office room (OD). To evaluate our proposal, the datasets cleaned by DQF4CT were used to train the same classification algorithms used by the authors of PAM and OD. Additionally, we evaluated DQF4CT on datasets from the Repository of Machine Learning Databases of the University of California, Irvine (UCI). Of the results achieved by models trained on the DQF4CT-cleaned datasets, 84% are better than those of the models reported by the dataset authors.
dc.description.sponsorship This work has also been supported by: Project: “Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca”. Convocatoria 03-2018 Publicación de artículos en revistas de alto impacto. Project: “Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT - ID 4633” financed by Convocatoria 04C–2018 “Banco de Proyectos Conjuntos UEES-Sostenibilidad” of Project “Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca”. Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
dc.format.extent 31
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher MDPI
dc.rights © 2018 by the authors; licensee MDPI, Basel, Switzerland.
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.other DQF4CT
dc.subject.other Data quality issue
dc.subject.other Classification task
dc.subject.other Conceptual framework
dc.subject.other Data cleaning ontology
dc.title From Theory to Practice: A Data Quality Framework for Classification Tasks
dc.type article
dc.subject.eciencia Computer Science
dc.identifier.doi https://doi.org/10.3390/sym10070248
dc.rights.accessRights openAccess
dc.relation.projectID Gobierno de España. TRA2015-63708-R
dc.relation.projectID Gobierno de España. TRA2016-78886-C3-1-R
dc.type.version publishedVersion
dc.identifier.publicationissue 7
dc.identifier.publicationtitle Symmetry-Basel
dc.identifier.publicationvolume 10
dc.identifier.uxxi AR/0000022415
dc.contributor.funder Ministerio de Economía, Industria y Competitividad (España)