Charles River Privacy Day — November 22, 2024
Please join us for a day of talks on the theory and practice of statistical data privacy!

Confirmed Speakers:
- Gautam Kamath, University of Waterloo
- Priyanka Nanayakkara, Harvard
- Virginia Smith, CMU
- Jalaj Upadhyay, Rutgers University
Registration: Attendance is free, but please register via this Registration Form if you plan to come.
Location:
43 Hawes Street
Brookline, MA 02446
(map)
Organizer:
Charles River Privacy Day is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Tentative Schedule:
| Time | Session |
| --- | --- |
| 9:00am - 9:30am | Welcome, coffee |
| 9:30am - 10:30am | Gautam Kamath (University of Waterloo), TBA |
| 10:30am - 11:00am | Break |
| 11:00am - 12:00pm | Priyanka Nanayakkara (Harvard), TBA |
| 12:00pm - 2:00pm | Lunch (provided) |
| 2:00pm - 3:00pm | Jalaj Upadhyay (Rutgers University), TBA |
| 3:00pm - 3:30pm | Break |
| 3:30pm - 4:30pm | Virginia Smith (CMU), TBA |
Talk Information:
Stay tuned!
Charles River Privacy Day — May 11th, 2023
Please join us for a day of talks on the theory and practice of statistical data privacy! Inspired by the success of the Charles River Crypto Days, the Charles River Privacy Days will resume, after a decade-long hiatus, on Thursday, May 11, 2023. The event will be followed on Friday, May 12, by a Crypto Day workshop.

Confirmed Speakers:
- Jelani Nelson, UC Berkeley
- Sewoong Oh, University of Washington
- Jayshree Sarathy, Harvard University
- Jessica Sorrell, University of Pennsylvania
Organizer:
Charles River Privacy Day is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Registration: Attendance is free, but please register via this Registration Form if you plan to come.
Location:
Center for Computing & Data Sciences, Boston University, 17th floor
665 Commonwealth Ave, Boston, MA 02215
(map)
Schedule:
| Time | Session |
| --- | --- |
| 9:00am - 9:30am | Welcome, coffee |
| 9:30am - 10:30am | Sewoong Oh (UW): Rethinking Auditing Privacy from First Principles |
| 10:30am - 11:00am | Break |
| 11:00am - 12:00pm | Jessica Sorrell (Penn): Connections Between Replicability, Privacy, and Perfect Generalization |
| 12:00pm - 2:00pm | Lunch (provided) |
| 2:00pm - 3:00pm | Jelani Nelson (Berkeley): New Local Differentially Private Protocols for Frequency and Mean Estimation |
| 3:00pm - 3:30pm | Break |
| 3:30pm - 4:30pm | Jayshree Sarathy (Harvard): Data Perceptions and Practices: Exploring Social Factors around the Uptake of Differential Privacy |
Talk Information:
Speaker: Jelani Nelson, UC Berkeley
Title: New Local Differentially Private Protocols for Frequency and Mean Estimation
Abstract: Consider the following examples of distributed applications: a texting app wants to train ML models for autocomplete based on text history residing on-device across millions of devices, or the developers of some other app want to understand common app settings by their users. In both cases, and many others, a third party wants to understand something in the aggregate about a large distributed database, but under the constraint that each individual record requires some guarantee of privacy. Protocols satisfying so-called local differential privacy have become the gold standard for guaranteeing privacy in such situations, and in this talk I will discuss new such protocols for two of the most common problems that require solutions in this framework: frequency estimation and mean estimation. Based on joint works with subsets of Hilal Asi, Vitaly Feldman, Huy Le Nguyen, and Kunal Talwar.
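As background for the kind of protocol the talk considers, the sketch below implements k-ary randomized response, the classical baseline for locally private frequency estimation. It is not one of the new protocols from the talk, and the function names and parameters are our own illustration.

```python
import numpy as np

def krr_response(x, k, eps, rng):
    """k-ary randomized response: report the true value x in {0,...,k-1}
    with probability e^eps / (e^eps + k - 1); otherwise report one of the
    other k-1 values uniformly at random. Each report is eps-locally-DP."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p:
        return x
    y = rng.integers(k - 1)          # uniform over the k-1 values != x
    return y if y < x else y + 1

def estimate_frequencies(reports, k, eps):
    """Debias the empirical histogram of noisy reports into unbiased
    estimates of the true value frequencies."""
    n = len(reports)
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    q = (1 - p) / (k - 1)            # chance of reporting any fixed wrong value
    counts = np.bincount(reports, minlength=k) / n
    return (counts - q) / (p - q)

rng = np.random.default_rng(0)
data = rng.integers(4, size=100_000)  # true values in {0,1,2,3}
reports = np.array([krr_response(x, 4, eps=1.0, rng=rng) for x in data])
print(estimate_frequencies(reports, 4, eps=1.0))  # each entry should be near 0.25
```

The debiasing step works because each true value v is reported with probability p and every other value with probability q, so the expected empirical share of v is p·f_v + q·(1 − f_v); inverting that affine map recovers an unbiased estimate of f_v.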
Speaker: Sewoong Oh, University of Washington
Title: Rethinking Auditing Privacy from First Principles
Abstract: Differentially private training of models is error-prone, due to miscalculations of the sensitivity or mistakes in the implementation. Techniques for detecting such false claims of differential privacy are critical to ensuring a trustworthy ecosystem where code and algorithms are commonly shared. A major bottleneck in the standard statistical approaches for auditing private training is their sample complexity: drawing a single sample for auditing can be as computationally intense as training a model from scratch. We are in dire need of sample-efficient approaches that provide a tight lower bound on the privacy leakage. However, if we hold on to the standard definition of differential privacy, we are fundamentally limited by the sample dependence of the Bernoulli confidence intervals involved. Breaking this barrier requires rethinking differential privacy from first principles. To this end, we introduce Lifted Differential Privacy, which, while equivalent to differential privacy, allows the privacy auditor to search over a larger space of counterexamples. We exploit this lifted search space in a novel design of audits that inject multiple canary examples. By generating those canaries randomly, the Lifted DP condition allows us to reuse each trained model and run multiple binary hypothesis tests for the presence or absence of each canary. Together with a novel confidence interval that exploits the (lack of) correlation between those test statistics, we showcase the significant gain in sample complexity both theoretically and empirically.
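To make the sample-complexity bottleneck concrete, here is a minimal sketch of the standard (non-lifted) statistical audit: train many models with and without a fixed canary, run a distinguishing attack on each, take Clopper-Pearson (Bernoulli) confidence intervals on the attack's true- and false-positive rates, and convert them into a lower bound on epsilon via the standard hypothesis-testing inequality TPR ≤ e^ε·FPR + δ. The interface below is our own illustration; note that every counted outcome costs one full training run, which is exactly the expense that reusing each trained model across randomly generated canaries is designed to avoid.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson(successes, trials, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a Bernoulli rate."""
    lo = beta.ppf(alpha / 2, successes, trials - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, trials - successes) if successes < trials else 1.0
    return lo, hi

def audited_eps_lower_bound(tp, n_in, fp, n_out, delta=0.0, alpha=0.05):
    """Lower-bound epsilon from a distinguishing attack's outcomes:
    tp = attack says "canary present" on n_in runs trained WITH the canary,
    fp = attack says so on n_out runs trained WITHOUT it.
    Uses eps >= ln((TPR - delta) / FPR), taking the pessimistic interval ends."""
    tpr_lo, _ = clopper_pearson(tp, n_in, alpha)
    _, fpr_hi = clopper_pearson(fp, n_out, alpha)
    if fpr_hi <= 0.0 or tpr_lo <= delta:
        return 0.0  # no leakage demonstrated at this confidence level
    return float(np.log((tpr_lo - delta) / fpr_hi))

# Example: the attack fires on 950 of 1000 canary runs and 100 of 1000 control runs.
print(audited_eps_lower_bound(950, 1000, 100, 1000))  # roughly 2.0
```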
Speaker: Jayshree Sarathy, Harvard University
Title: Data Perceptions and Practices: Exploring Social Factors around the Uptake of Differential Privacy
Abstract: Deployments of differential privacy (DP) in the last decade have drawn attention to the challenges of bridging theory and practice. In this talk, we will explore some of the social factors, in particular perceptions and practices around data, that shape these challenges. First, we consider the modernization of disclosure avoidance in the 2020 U.S. Census, examining how the move to DP revealed epistemic disconnects around what we identify as a "statistical imaginary" of census data. Second, we discuss a user study investigating the utility of DP tools for social science researchers, exploring tensions between DP and the practices of data science. These two studies raise questions around centering social factors when deploying DP in the public interest. This talk is based on work with danah boyd, Audrey Haque, Tania Schlatter, Sophia Song, and Salil Vadhan.
Speaker: Jessica Sorrell, University of Pennsylvania
Title: Connections Between Replicability, Privacy, and Perfect Generalization
Abstract: Replicability is vital to ensuring scientific conclusions are reliable, but failures of replicability have been a major issue in nearly all scientific areas of study in recent decades. A key issue underlying the replicability crisis is the explosion of methods for data generation, screening, testing, and analysis, where, crucially, only the combinations producing the most significant results are reported. Such practices (also known as p-hacking, data dredging, and researcher degrees of freedom) can lead to erroneous findings that appear to be significant but that don't hold up when other researchers attempt to replicate them. In this talk, we will explore connections between replicability and other stability notions that have proven useful in ensuring statistical validity. We will discuss statistical equivalences between replicability, approximate differential privacy, and perfect generalization, as well as computational separations. This talk is based on work with Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, and Satchit Sivakumar.
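For readers new to these stability notions, two of the standard definitions being related can be stated as follows (our notation; a sketch rather than the talk's exact formalism):

```latex
% rho-replicability: over two i.i.d. samples S_1, S_2 and SHARED internal
% randomness r, the algorithm A returns the same output with high probability:
\Pr_{S_1, S_2 \sim D^n,\; r}\bigl[ A(S_1; r) = A(S_2; r) \bigr] \;\ge\; 1 - \rho

% (\varepsilon, \delta)-differential privacy: for all neighboring datasets
% S, S' (differing in one record) and all measurable events E:
\Pr[A(S) \in E] \;\le\; e^{\varepsilon} \, \Pr[A(S') \in E] + \delta
```

Roughly, the statistical equivalences mentioned in the abstract say that an algorithm satisfying one of these notions can be transformed into one satisfying the others, up to changes in the parameters, while the computational separations show such transformations can be expensive.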