Datasets

Open datasets from HCAI research projects

Carinthia-S Dataset

The Carinthia-S dataset extends the original, publicly available Carinthia dataset with an expert-validated binary segmentation mask for each defect image. It contains Scanning Electron Microscope (SEM) images of defects observed on a single production layer of unstructured semiconductor wafers. The 4,591 image-mask pairs are unevenly distributed across six defect classes. The dataset description is available in the 'carinthia-s_dataset.html' file, and the images are in the 'data.zip' file.

Zenodo 2025

Access Dataset

PVD Process Multi-Output Regression Dataset

The dataset was collected between 2021 and 2023 from 16 process chambers across six PVD machines at Infineon Technologies AG. It comprises 3,598 procedures, each representing a single sample. Each sample has 104 input features that describe equipment and process conditions and are derived from aggregated sensor traces (Advanced Process Control data). The output space consists of 17 target variables corresponding to thickness measurements at spatially distributed wafer points (Statistical Process Control data). The dataset supports both single-output and vector-valued learning tasks for modelling physical layer properties in semiconductor manufacturing.
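As a rough illustration of the single-output and vector-valued views, the sketch below splits each sample into 104 inputs and 17 targets and scores a trivial mean predictor both per target and over the whole target vector. The column order (inputs first, targets last), the function names, and the synthetic rows are assumptions made for illustration only; the actual file layout should be checked against the published files.

```python
import math
import random

N_INPUTS = 104   # input features, as stated in the description
N_TARGETS = 17   # thickness measurements per wafer, as stated in the description


def split_row(row):
    """Split one numeric row into (inputs, targets).

    Assumes inputs come first and targets last; this ordering is hypothetical.
    """
    if len(row) != N_INPUTS + N_TARGETS:
        raise ValueError(f"expected {N_INPUTS + N_TARGETS} values, got {len(row)}")
    return row[:N_INPUTS], row[N_INPUTS:]


def mean_baseline(y_train, n_predictions):
    """Predict the training mean of every target for each new sample."""
    n = len(y_train)
    means = [sum(row[j] for row in y_train) / n for j in range(len(y_train[0]))]
    return [means[:] for _ in range(n_predictions)]


def per_target_rmse(y_true, y_pred):
    """Root mean squared error for each target column (single-output view)."""
    n = len(y_true)
    return [
        math.sqrt(sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n)) / n)
        for j in range(len(y_true[0]))
    ]


# Synthetic stand-in rows; these are NOT the real measurements.
random.seed(0)
rows = [[random.gauss(0.0, 1.0) for _ in range(N_INPUTS + N_TARGETS)] for _ in range(50)]
X, Y = zip(*(split_row(r) for r in rows))

Y_train, Y_test = Y[:40], Y[40:]
Y_pred = mean_baseline(Y_train, len(Y_test))
rmse = per_target_rmse(Y_test, Y_pred)

# Vector-valued view: one RMSE pooled over all 17 targets.
overall_rmse = math.sqrt(sum(e ** 2 for e in rmse) / len(rmse))
print(len(X[0]), len(Y[0]), len(rmse))
```

A mean predictor is only a reference point: any model trained on the 104 inputs would be expected to beat it on most of the 17 targets.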

Zenodo 2025

Access Dataset

Learning Programmers Profile Dataset

A longitudinal dataset of weekly per-student compiler error counts spanning 22 weeks of an introductory programming course. The dataset records how often each of 20 distinct compiler error types occurred per student per week, with one CSV file per week. It was collected to support research on personalized learning systems and the analysis and prediction of programming learning behaviours. Supporting materials from UMAP 2023 (https://doi.org/10.1145/3563359.3597400).
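Because the data is split into one CSV file per week, a per-student profile has to be assembled across files. The sketch below shows one possible way to do that with the standard library. The file naming pattern ('week_01.csv'), the 'student_id' column, and the function names are hypothetical and would need to be adapted to the actual repository layout.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_weekly_counts(folder):
    """Return {student_id: {week: {error_type: count}}}.

    Assumes one file per week named like 'week_01.csv', with a 'student_id'
    column followed by one column per error type. These names are hypothetical.
    """
    profiles = defaultdict(dict)
    for path in sorted(Path(folder).glob("week_*.csv")):
        week = int(path.stem.split("_")[1])  # 'week_01' -> 1
        with path.open(newline="") as handle:
            for record in csv.DictReader(handle):
                student = record.pop("student_id")
                profiles[student][week] = {name: int(value) for name, value in record.items()}
    return dict(profiles)


def weekly_totals(profile):
    """Total error count per week for one student, ordered by week number."""
    return [sum(profile[week].values()) for week in sorted(profile)]
```

With such a structure, `weekly_totals(profiles[student_id])` would give one student's error trajectory over the weeks for which that student has data.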

GitHub 2023

Access Dataset

AI Tool Use in Programming Education Dataset

A dataset comprising weekly self-reported data from 165 students enrolled in an introductory CS1 programming course (2024/2025 academic year) at the Faculty of Electrical Engineering, University of Sarajevo. The data captures students' weekly AI tool usage, perceived difficulty of subject areas, motivational factors (expectancy-value constructs), personality traits, preferred learning styles, attitudes toward AI, and final course performance. Supporting materials from UMAP 2025 (https://doi.org/10.1145/3708319.3733692).

OSF 2025

Access Dataset

Compiler Error Frequency and Performance Dataset

A dataset capturing weekly and cumulative compiler error frequencies alongside student performance metrics from an introductory programming course. It enables analysis of how error patterns evolve over the course of a semester and their relationship to final learning outcomes.
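The relationship between weekly and cumulative frequencies can be sketched as a running sum. The example below assumes a plain list of weekly error counts for one student; the numbers are made up and the function name is hypothetical, since the dataset's column layout is not described here.

```python
from itertools import accumulate


def cumulative_counts(weekly):
    """Running total of weekly error counts, one value per week."""
    return list(accumulate(weekly))


weekly = [5, 3, 0, 4]  # made-up weekly counts for a single student
print(cumulative_counts(weekly))  # [5, 8, 8, 12]
```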

OSF 2025

Access Dataset