The 1st Workshop on AI for Data Editing aims to bring together researchers, practitioners, and policymakers to explore innovative AI-driven solutions for the multifaceted challenges in data editing. As data grows exponentially, there is an urgent demand for advanced strategies in data preprocessing, cleaning, transformation, and quality control, as well as a better understanding of complex interdependencies in large-scale data workflows. This workshop will delve into the latest AI technologies to facilitate efficient, accurate, and human-centric data editing processes.
The workshop is held in conjunction with the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), one of the world's premier conferences in data science, data mining, and big data analytics. Organized by ACM SIGKDD, KDD is a key platform where researchers, practitioners, and industry experts present groundbreaking advancements and explore emerging trends in data mining, machine learning, and AI. This venue provides the workshop with a vital forum for interdisciplinary knowledge exchange, fostering collaboration among data scientists, domain experts, and policymakers.
Submission Guidelines
We invite the submission of regular research papers of up to 9 pages, plus unlimited pages for references. The 9-page limit covers all paper content, including any appendix; papers without an appendix may use the full 9 pages for the main text. Submissions must be in PDF format and formatted according to the standard ACM sigconf template. Submitted papers will be assessed on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility.
All papers must be submitted via the EasyChair submission portal: https://easychair.org/conferences?conf=ai4de
Topics of Interest
We encourage submissions on a broad range of topics related to AI for data editing, including but not limited to:
- Methods for automated data science
- Automated data cleaning, denoising, interpolation, refinement, and quality improvement
- Automated feature selection, generation, and feature-instance joint selection
- Automated data representation learning or reconstruction
- Automated outlier detection and removal
- New datasets in domain application areas
  - e.g., speech, vision, manufacturing, smart cities, transportation, mobile computing, sensing, medical, recommendation, personalization, and science domains
- Tools and methodologies for accelerating open-source dataset preparation and iteration
- Tools that quantify and reduce the time needed to source and prepare high-quality data
- Tools that ensure that the data is labeled consistently, such as label consensus
- Tools that make improving data quality more systematic
- Tools that automate the creation of high-quality supervised learning training data from low-quality resources, such as forced alignment in speech recognition
- Tools that produce consistent and low-noise data samples, or remove labeling noise and inconsistencies from existing data
- Tools for controlling what goes into the dataset and for making high-level edits efficiently to very large datasets, e.g., adding new words, languages, or accents to speech datasets with thousands of hours of audio
- Search methods for finding suitably licensed datasets based on public resources
- Tools for creating training datasets for small data problems, or for rare classes in the long tail of big data problems
- Tools for timely incorporation of feedback from production systems into datasets
- Tools for understanding dataset coverage of important classes, and for editing datasets to cover newly identified important cases
- Dataset importers that allow easy combination and composition of existing datasets
- Dataset exporters that make the data consumable for models and interface with model training and inference systems such as webdataset
- System architectures and interfaces that enable composition of dataset tools, such as MLCube, Docker, and Airflow
- Algorithms for working with limited labeled data and improving label efficiency:
  - Data selection techniques such as active learning and coreset selection for identifying the most valuable examples to label
  - Semi-supervised learning, few-shot learning, and weak supervision methods for maximizing the power of limited labeled data
  - Transfer learning and self-supervised learning approaches for developing powerful representations that can be used for many downstream tasks with limited labeled data
- Algorithms for working with shifted, drifted, or out-of-distribution (OOD) data:
  - New datasets for distribution-shift evaluation and analysis
  - New algorithms for fixing shifted, drifted, or OOD data
- Algorithms for working with biased data:
  - New datasets for bias evaluation and analysis
  - New algorithms for automated elimination of bias in data
  - New algorithms for model training with biased data
Important Dates
Workshop Call for Papers: April 19th, 2025
Workshop Paper Submission: May 8th, 2025
Notification of Workshop Paper Acceptance: June 8th, 2025
Workshop Date: August 6th, 2025
Organizing Committee
Yanjie Fu
Arizona State University
yanjie.fu@asu.edu
Kunpeng Liu
Portland State University
kunpeng@pdx.edu
Dongjie Wang
University of Kansas
wangdongjie@ku.edu
Xiangliang Zhang
University of Notre Dame
xzhang33@nd.edu
Khalid Osman
Stanford University
osmank@stanford.edu
Charu Aggarwal
IBM T.J. Watson Research Center
charu@us.ibm.com
Suzanne M. Shontz
University of Kansas
shontz@ku.edu
Huan Liu
Arizona State University
huanliu@asu.edu
Jian Pei
Duke University
j.pei@duke.edu
Volunteers
Rui Liu
University of Kansas
Ph.D. Student
Tao Zhe
University of Kansas
Ph.D. Student