AI4DE: The 1st International Workshop on AI for Data Editing

KDD 2025's Workshop
8:00 AM - 12:00 PM | Sunday, August 3, 2025 | Toronto, ON, Canada

The 1st Workshop on AI for Data Editing aims to bring together researchers, practitioners, and policymakers to explore innovative AI-driven solutions for the multifaceted challenges in data editing. As data grows exponentially, there is an urgent demand for advanced strategies in data preprocessing, cleaning, transformation, and quality control, and for a better understanding of complex interdependencies in large-scale data workflows. This workshop will delve into the latest AI technologies to facilitate efficient, accurate, and human-centric data editing processes.

The workshop is held in conjunction with the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2025), one of the world's premier conferences in data science, data mining, and big data analytics. Organized by ACM SIGKDD, KDD is a key platform where researchers, practitioners, and industry experts present groundbreaking advancements and explore emerging trends in data mining, machine learning, and AI. This partnership provides the workshop with a vital forum for interdisciplinary knowledge exchange, fostering collaboration among data scientists, domain experts, and policymakers.

Submission Guidelines

We invite submissions of regular research papers of up to 9 pages, including any appendix, plus unlimited references. The 9-page limit covers all paper content: an appendix, if present, counts toward the limit, and a paper without an appendix may use all 9 pages for content. Submissions must be in PDF format and follow the new standard ACM sigconf template. Submitted papers will be assessed on novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. All papers must be submitted via the EasyChair system.

We invite authors to submit their papers via the EasyChair submission portal: https://easychair.org/conferences?conf=ai4de

Agenda

The workshop will follow a half-day format, featuring paper presentations, poster sessions, and keynote and invited talks. We anticipate between 75 and 100 attendees. Authors accepted for paper or poster presentations are invited to attend. In addition, the workshop is open to researchers, industry professionals, and policymakers interested in AI applications for data editing. Due to venue capacity, attendance may be limited.

August 3, 2025 (Location: Room TBD)
Time	Event	Speaker
08:50-09:00	Opening Remarks	Organizers
TBD	Panel Discussion	Hui Xiong (HKUST)
TBD	Closing Remarks	Organizers

Topics of Interest

We encourage submissions on a broad range of topics related to AI for data editing, including but not limited to:

Methods for automated data science
- Automated data cleaning, denoising, interpolation, refinement, and quality improvement
- Automated feature selection, generation, and feature-instance joint selection
- Automated data representation learning or reconstruction
- Automated outlier detection and removal

New datasets in domain application areas
- In speech, vision, manufacturing, smart cities, transportation, mobile computing, sensing, medical, recommendation, personalization, and science domains

Tools and methodologies for accelerating open-source dataset preparation and iteration
- Tools that quantify and accelerate time to source and prepare high-quality data
- Tools that ensure that the data is labeled consistently, such as label consensus
- Tools that make improving data quality more systematic
- Tools that automate the creation of high-quality supervised learning training data from low-quality resources, such as forced alignment in speech recognition
- Tools that produce consistent and low-noise data samples, or remove labeling noise or inconsistencies from existing data
- Tools for controlling what goes into the dataset and for making high-level edits efficiently to very large datasets, e.g. adding new words, languages, or accents to speech datasets with thousands of hours
- Search methods for finding suitably licensed datasets based on public resources
- Tools for creating training datasets for small data problems, or for rare classes in the long tail of big data problems
- Tools for timely incorporation of feedback from production systems into datasets
- Tools for understanding dataset coverage of important classes, and editing them to cover newly identified important cases
- Dataset importers that allow easy combination and composition of existing datasets
- Dataset exporters that make the data consumable for models and interface with model training and inference systems such as webdataset
- System architectures and interfaces that enable composition of dataset tools such as MLCube, Docker, Airflow

Algorithms for working with limited labeled data and improving label efficiency
- Data selection techniques such as active learning and coreset selection for identifying the most valuable examples to label
- Semi-supervised learning, few-shot learning, and weak supervision methods for maximizing the power of limited labeled data
- Transfer learning and self-supervised learning approaches for developing powerful representations that can be used for many downstream tasks with limited labeled data

Algorithms for working with shifted, drifted, out-of-distribution data
- New datasets for bias evaluation and analysis
- New algorithms for fixing shifted, drifted, OOD data

Algorithms for working with biased data
- New datasets for bias evaluation and analysis
- New algorithms for automated elimination of bias in data
- New algorithms for model training with biased data

Important Dates

Workshop Call for Papers	April 19, 2025
Workshop Paper Submission	~~May 8, 2025~~ May 31, 2025
Notification of Workshop Papers Acceptance	June 8, 2025
Workshop Date	August 3, 2025

Panel Discussion Speaker

Hui Xiong

Hong Kong University of Science and Technology (Guangzhou), xionghui@hkust-gz.edu.cn

Keynote Speaker

Huan Liu

Arizona State University, huanliu@asu.edu

Speaking Topics

Artificial Intelligence (AI)

Speaker's Notes

TBD

Suzanne M. Shontz

University of Kansas, shontz@ku.edu

Speaking Topics

Artificial Intelligence (AI)

Speaker's Notes

TBD

Nitesh Chawla

University of Notre Dame, nchawla@nd.edu

Speaking Topics

Artificial Intelligence (AI)

Speaker's Notes

TBD

Organizing Committee

Yanjie Fu

Arizona State University
yanjie.fu@asu.edu

Kunpeng Liu

Portland State University
kunpeng@pdx.edu

Dongjie Wang

University of Kansas
wangdongjie@ku.edu

Xiangliang Zhang

University of Notre Dame
xzhang33@nd.edu

Khalid Osman

Stanford University
osmank@stanford.edu

Charu Aggarwal

IBM T.J. Watson Research Center
charu@us.ibm.com

Jian Pei

Duke University
j.pei@duke.edu

Volunteers

Rui Liu

University of Kansas
Ph.D. Student

Tao Zhe

University of Kansas
Ph.D. Student