As artificial intelligence permeates ever more of our daily lives, data sharing increasingly collides with data privacy concerns. Synthetic data are gaining traction as a potential resolution of the seemingly irresolvable conflict between privacy and utility. The goal of synthetic data is to preserve meaningful statistical information about a dataset without exposing private information. Synthetic data hold particular promise in areas such as health care, where patient data are protected by privacy laws. But can we even construct synthetic data that are simultaneously private and accurate?
And what do privacy and accuracy actually mean in this context? Answering these questions leads to deep mathematical challenges. I will introduce various mathematical notions of privacy and utility and discuss the associated privacy-utility tradeoffs. I will then present some of our recent breakthroughs on the NP-hard problem of efficiently creating synthetic data with provable privacy and utility guarantees. Applications and open problems will also be discussed. This is joint work with March Boedihardjo and Roman Vershynin.