Master of Science in Computer Science
In November 2023, I successfully defended my Master of Science thesis, which addressed challenges in data provenance, a critical aspect of Big Data that tracks and documents the lineage of large datasets. I also published this research at a provenance conference in April 2023. My thesis proposed a practical solution that computes a sample of provenance for existing query results without requiring full provenance computation. The technique samples provenance according to its distribution with respect to the query results, which is estimated from the input data distribution while accounting for data correlations. The evaluation demonstrated that, compared to traditional methods, this approach efficiently computes a large sample of provenance with minimal error.