The 1st Workshop on AI for Data Editing aims to bring together researchers, practitioners, and policymakers to explore innovative AI-driven solutions for the multifaceted challenges in data editing. As data grows exponentially, there is an urgent demand for advanced strategies in data preprocessing, cleaning, transformation, and quality control, as well as a deeper understanding of complex interdependencies in large-scale data workflows. This workshop will delve into the latest AI technologies to facilitate efficient, accurate, and human-centric data editing processes.

The workshop is held in conjunction with the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2025), one of the world's premier conferences in data science, data mining, and big data analytics. Organized by ACM SIGKDD, KDD is a key platform where researchers, practitioners, and industry experts present groundbreaking advancements and explore emerging trends in data mining, machine learning, and AI. This partnership provides the workshop with a vital forum for interdisciplinary knowledge exchange, fostering collaboration among data scientists, domain experts, and policymakers.


Submission Guidelines

We invite the submission of regular research papers of up to 9 pages, including any appendix, plus unlimited references. The 9-page limit covers all paper content: if you include an appendix, it must fit within those 9 pages, and it is equally fine to have no appendix and use the full 9 pages for the main text. Submissions must be in PDF format and follow the current standard ACM sigconf template. Submitted papers will be assessed on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. All papers must be submitted via the EasyChair system.

We invite authors to submit their papers via the EasyChair submission portal: https://easychair.org/conferences?conf=ai4de


Topics of Interest

We encourage submissions on a broad range of topics related to AI for data editing, including but not limited to:

  • Methods for automated data science
    • Automated data cleaning, denoising, interpolation, refinement, and quality improvement
    • Automated feature selection, generation, and feature-instance joint selection
    • Automated data representation learning or reconstruction
    • Automated outlier detection and removal
  • New datasets in domain application areas
    • in speech, vision, manufacturing, smart cities, transportation, mobile computing, sensing, medical, recommendation, personalization, and scientific domains
  • Tools and methodologies for accelerating open-source dataset preparation and iteration
    • Tools that quantify and accelerate time to source and prepare high-quality data
    • Tools that ensure that the data is labeled consistently, such as label consensus
    • Tools that make improving data quality more systematic
    • Tools that automate the creation of high-quality supervised learning training data from low-quality resources, such as forced alignment in speech recognition
    • Tools that produce consistent and low-noise data samples, or that remove labeling noise or inconsistencies from existing data
    • Tools for controlling what goes into the dataset and for making high-level edits efficiently to very large datasets, e.g. adding new words, languages, or accents to speech datasets with thousands of hours
    • Search methods for finding suitably licensed datasets based on public resources
    • Tools for creating training datasets for small data problems, or for rare classes in the long tail of big data problems
    • Tools for timely incorporation of feedback from production systems into datasets
    • Tools for understanding dataset coverage of important classes, and editing them to cover newly identified important cases
    • Dataset importers that allow easy combination and composition of existing datasets
    • Dataset exporters that make the data consumable for models and interface with model training and inference systems such as webdataset
    • System architectures and interfaces that enable composition of dataset tools, such as MLCube, Docker, and Airflow
  • Algorithms for working with limited labeled data and improving label efficiency:
    • Data selection techniques such as active learning and coreset selection for identifying the most valuable examples to label
    • Semi-supervised learning, few-shot learning, and weak supervision methods for maximizing the power of limited labeled data
    • Transfer learning and self-supervised learning approaches for developing powerful representations that can be used for many downstream tasks with limited labeled data
  • Algorithms for working with shifted, drifted, or out-of-distribution data
    • New datasets for distribution-shift evaluation and analysis
    • New algorithms for fixing shifted, drifted, or OOD data
  • Algorithms for working with biased data
    • New datasets for bias evaluation and analysis
    • New algorithms for automated elimination of bias in data
    • New algorithms for model training with biased data

Important Dates

Workshop Call for Papers: April 19th, 2025
Workshop Paper Submission: May 8th, 2025
Notification of Workshop Paper Acceptance: June 8th, 2025
Workshop Date: August 6th, 2025

Organizing Committee

Yanjie Fu

Arizona State University
yanjie.fu@asu.edu

Kunpeng Liu

Portland State University
kunpeng@pdx.edu

Dongjie Wang

University of Kansas
wangdongjie@ku.edu

Xiangliang Zhang

University of Notre Dame
xzhang33@nd.edu

Khalid Osman

Stanford University
osmank@stanford.edu

Charu Aggarwal

IBM T.J. Watson Research Center
charu@us.ibm.com

Suzanne M. Shontz

University of Kansas
shontz@ku.edu

Huan Liu

Arizona State University
huanliu@asu.edu

Jian Pei

Duke University
j.pei@duke.edu


Volunteers

Rui Liu

University of Kansas
Ph.D. Student

Tao Zhe

University of Kansas
Ph.D. Student