CRISP: Charles River Symposium on Privacy — April 11, 2025
Please join us for a day of talks on the theory and practice of statistical data privacy!
Confirmed Speakers:
- Ainesh Bakshi, MIT
- Tommaso d'Orsi, Bocconi University
- Alessandro Epasto, Google
- Kobbi Nissim, Georgetown University
Registration: Attendance is free, but advance registration is required; please register via this Registration Form if you plan to come.
Location:
SEC Building, Harvard University, Room LL2.229
150 Western Ave, Boston, MA 02134
(map)
Organizer:
CRISP is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Other Events:
This CRISP unfortunately conflicts with ABSURD (at Tufts in the afternoon).
Tentative Schedule:
9:00am - 9:30am | Welcome, coffee |
9:30am - 10:30am | Tommaso d'Orsi, Bocconi University: Projections and other tools for differentially private algorithm design |
10:30am - 11:00am | Break |
11:00am - 12:00pm | Alessandro Epasto, Google: Efficient Private Algorithms for Graphs (and Using Graphs) |
12:00pm - 2:00pm | Lunch (provided) |
2:00pm - 3:00pm | Ainesh Bakshi, MIT: Sample-Optimal Private Regression in Polynomial Time |
3:00pm - 3:30pm | Break |
3:30pm - 4:30pm | Kobbi Nissim, Georgetown University: TBD |
Talk Information:
- Speaker: Tommaso d'Orsi, Bocconi University
  Title: Projections and other tools for differentially private algorithm design
  Abstract: Through a series of examples, we will review some simple yet powerful recipes for designing efficient private algorithms in high-dimensional settings. In the context of estimation problems, we will discuss a particular instantiation of the "robustness to privacy" paradigm and use it to learn SBMs and mixtures of spherical Gaussians under differential privacy. We will then see how one can exploit the geometry of the space to design DP algorithms to release graph powers and k-way marginals. Finally, time permitting, we will see how similar ideas are also implicitly used in practical applications. (A toy sketch of the projection recipe appears after this talk list.)
- Speaker: Alessandro Epasto, Google
  Title: Efficient Private Algorithms for Graphs (and Using Graphs)
  Abstract: Graphs are a fundamental data structure in machine learning applications and lie at the core of countless industrial applications. In this talk, I will cover recent research on differentially private (DP) algorithms for graphs. DP is a strong notion of privacy guarantees, promising plausible deniability for user data. I will discuss our work on designing efficient graph algorithms with a variety of applications from theory to practice. First, I will cover our work on efficient algorithms for finding communities in Stochastic Block Model (SBM) graphs. In this work, we demonstrate that it is possible to achieve exact recovery across a wide range of model parameters with near-linear time and space complexity. Our result is based on designing an efficient graph-perturbation method that, for a restricted (but sufficiently powerful) class of graph algorithms, allows us to inject significantly less noise than what is generally required to privatize a graph. This is a significant improvement over previous SBM recovery algorithms, which required quadratic time (at least) for a constant privacy budget. Then, I will show how efficient graph algorithms play a surprising role in the seemingly non-graph problem of sanitizing a private dataset of arbitrary items to extract as many items as possible while respecting user privacy. These two applications highlight the need for further research in efficient graph algorithms for privacy applications. (A sketch of the randomized-response baseline appears after this talk list.)
- Speaker: Ainesh Bakshi, MIT
  Title: Sample-Optimal Private Regression in Polynomial Time
  Abstract: Consider the task of privately obtaining prediction error guarantees via least-squares regression with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaling with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23] to account for the geometry induced by the covariance of the input samples. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters. (A sketch of a simpler noisy-statistics baseline appears after this talk list.)
- Speaker: Kobbi Nissim, Georgetown University
  Title: TBD
  Abstract: TBD
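Illustrative Code Sketches:
The sketches below are editorial toy examples, not the speakers' algorithms. First, the simplest instance of the projection recipe from Tommaso d'Orsi's talk: projecting (clipping) each data point onto an L2 ball of radius B bounds the sensitivity of the empirical mean, after which Gaussian noise calibrated to that sensitivity gives (eps, delta)-differential privacy. The radius B and the use of the Gaussian mechanism are our assumptions for a minimal example.

```python
import numpy as np

def private_mean(X, B, eps, delta, rng=None):
    """X: (n, d) array. Returns an (eps, delta)-DP estimate of the mean of X."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    # Project each row onto the L2 ball of radius B (the "projection" step).
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X_proj = X * np.minimum(1.0, B / norms)
    # After projection, replacing one row moves the mean by at most 2B/n in L2.
    sensitivity = 2.0 * B / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return X_proj.mean(axis=0) + rng.normal(scale=sigma, size=d)
```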
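Next, the classical baseline that the SBM recovery result in Alessandro Epasto's talk improves on: edge-level differential privacy via randomized response, flipping every potential edge independently. This privatizes the whole graph but injects noise on the order of n^2 edges; the talk's contribution is a perturbation scheme that injects far less noise for a restricted class of algorithms.

```python
import numpy as np

def randomized_response_graph(A, eps, rng=None):
    """A: symmetric 0/1 adjacency matrix. Returns an eps-edge-DP noisy graph."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(eps))   # keep/flip odds ratio is exactly e^eps
    flips = rng.random((n, n)) < p_flip
    flips = np.triu(flips, 1)            # decide each unordered pair once
    flips = flips | flips.T
    return np.where(flips, 1 - A, A)
```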
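Finally, for Ainesh Bakshi's talk, a deliberately naive baseline rather than the talk's sum-of-squares algorithm: perturbing the sufficient statistics X^T X and X^T y with Gaussian noise. It shows what "private regression" asks for, but its sample complexity is exactly the kind the talk's result improves on; the clipping bound B and the heuristic ridge term are our assumptions.

```python
import numpy as np

def ssp_regression(X, y, B, eps, delta, rng=None):
    """Noisy-sufficient-statistics regression.

    Assumes rows are pre-clipped so that ||x_i|| <= B and |y_i| <= B.
    """
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    sens = 2.0 * B * B                   # replacement sensitivity of X^T X and X^T y
    # Split the (eps, delta) budget across the two releases.
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / (delta / 2))) / (eps / 2)
    N = rng.normal(scale=sigma, size=(d, d))
    XtX = X.T @ X + (N + N.T) / np.sqrt(2)   # symmetrized Gaussian noise
    Xty = X.T @ y + rng.normal(scale=sigma, size=d)
    # Heuristic ridge term keeps the noisy matrix well conditioned.
    return np.linalg.solve(XtX + sigma * np.eye(d), Xty)
```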
Charles River Privacy Day — November 22, 2024
Please join us for a day of talks on the theory and practice of statistical data privacy!
Confirmed Speakers:
- Gautam Kamath, University of Waterloo
- Priyanka Nanayakkara, Harvard
- Virginia Smith, CMU
- Jalaj Upadhyay, Rutgers University
Registration: Attendance is free, but please register via this Registration Form if you plan to come.
Location:
43 Hawes Street
Brookline, MA 02446
(map)
Organizer:
Charles River Privacy Day is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Tentative Schedule:
9:00am - 9:30am | Welcome, coffee |
9:30am - 10:30am | Gautam Kamath, University of Waterloo: The Broader Landscape of Robustness in Mean Estimation |
10:30am - 11:00am | Break |
11:00am - 12:00pm | Priyanka Nanayakkara, Harvard: Differential Privacy Interfaces for Data Users |
12:00pm - 2:00pm | Lunch (provided) |
2:00pm - 3:00pm | Jalaj Upadhyay, Rutgers University: A brief history of differentially private continual counting and its applications |
3:00pm - 3:30pm | Break |
3:30pm - 4:30pm | Virginia Smith, CMU: Getting Lost in ML Safety Vibes |
Talk Information:
- Speaker: Gautam Kamath, University of Waterloo
  Title: The Broader Landscape of Robustness in Mean Estimation
  Abstract: Mean estimation is a simple statistical task, solved optimally by a simple algorithm: the empirical mean. The story changes dramatically when the estimator is required to be robust. What exactly it means for an estimator to be robust can depend on the situation. It can be robust to model misspecification or adversarial contamination. It can be robust to outliers caused by heavy-tailed data. Or it can be robust in the sense of differential privacy. We will discuss how, surprisingly, we can design computationally efficient algorithms subject to all these types of robustness using the same underlying technical ideas, resulting in powerful estimators that are qualitatively quite different from the empirical mean. These connections offer glimpses of a broader landscape of robustness in statistical estimation. (A toy contamination example appears after this talk list.)
- Speaker: Priyanka Nanayakkara, Harvard
  Title: Differential Privacy Interfaces for Data Users
  Abstract: Differential privacy (DP) offers a suite of theoretical tools for enabling privacy-preserving data science. However, using these tools in practice often requires a team of DP experts to navigate new constraints imposed by DP on analysis (e.g., spending epsilon), limiting DP's real-world impact. In this talk, I will describe how interactive interfaces can help data users without DP expertise effectively use DP, widening the potential for DP's impact. Specifically, I will present (1) an interactive interface for data curators setting epsilon while considering implications for both the accuracy of estimates and disclosure risk and (2) an interactive paradigm (instantiated in an interactive interface) to support analysts in spending epsilon efficiently during exploratory analysis. Together, these works demonstrate the promise of interfaces in bringing DP from theory into practice. Based on joint works with Johes Bater, Xi He, Jessica Hullman, Hyeok Kim, Narges Mahyar, Gerome Miklau, Jennie Rogers, Ali Sarvghad, and Yifan Wu. (A sketch of epsilon-budget bookkeeping appears after this talk list.)
- Speaker: Jalaj Upadhyay, Rutgers University
  Title: A brief history of differentially private continual counting and its applications
  Abstract: Differential privacy is now considered the de facto notion of privacy; however, there is still a wide gap between the theory and practice of differential privacy. This talk will discuss the fundamental reasons for this discrepancy using continual counting as an example. We will briefly overview the problem and discuss recent advances in differentially private continual observation that achieve fine-grained error bounds. Time permitting, we will also discuss some of the implications of these results in both real-world applications and abstract mathematics. This talk is based on joint work with Hendrik Fichtenberger, Monika Henzinger, and Sarvagya Upadhyay. (A sketch of the binary-tree mechanism appears after this talk list.)
- Speaker: Virginia Smith, CMU
  Title: Getting Lost in ML Safety Vibes
  Abstract: Machine learning applications are increasingly reliant on black-box pretrained models. To ensure safe use of these models, techniques such as unlearning, guardrails, and watermarking have been proposed to curb model behavior and audit usage. Unfortunately, while these post-hoc approaches give positive safety ‘vibes’ when evaluated in isolation, our work shows that existing techniques are quite brittle when deployed as part of larger systems. In a series of recent works, we show that: (a) small amounts of auxiliary data can be used to 'jog' the memory of unlearned models; (b) current unlearning benchmarks obscure deficiencies in both finetuning and guardrail-based approaches; and (c) simple, scalable attacks erode existing LLM watermarking systems and reveal fundamental trade-offs in watermark design. Taken together, these results highlight major deficiencies in the practical use of post-hoc ML safety methods. We end by discussing promising alternatives for ML safety, which instead aim to ensure safety by design during the development of ML systems.
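Illustrative Code Sketches:
As above, these are editorial toys, not the speakers' methods. For Gautam Kamath's talk, a short demonstration of why the empirical mean fails under contamination while even a crude robust estimator (the coordinatewise trimmed mean) survives; the contamination model and trim level are our choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 5))
outliers = np.full((50, 5), 100.0)          # ~5% adversarial contamination
data = np.vstack([clean, outliers])

print(data.mean(axis=0))                    # dragged to roughly 4.8 by the outliers
print(stats.trim_mean(data, 0.1, axis=0))   # stays near the true mean, 0
```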
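For Priyanka Nanayakkara's talk, a sketch of the kind of bookkeeping a DP interface might surface when analysts "spend epsilon": a tiny ledger enforcing a total budget under basic composition. The class name and API are invented for illustration and are not from the talk's tools.

```python
class PrivacyBudget:
    """Hypothetical ledger tracking epsilon spent under basic composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0
        self.log = []

    def charge(self, epsilon, query=""):
        if self.spent + epsilon > self.total:
            raise RuntimeError(f"budget exceeded: {self.spent + epsilon:.2f} > {self.total}")
        self.spent += epsilon
        self.log.append((query, epsilon))
        return self.total - self.spent   # remaining budget, shown to the analyst

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.25, query="noisy count of rows")
budget.charge(0.25, query="noisy mean of income")
```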
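For Jalaj Upadhyay's talk, the classical binary-tree mechanism that private continual counting starts from: every prefix sum is assembled from O(log T) noisy dyadic block sums, so the error per release grows polylogarithmically in T rather than linearly. This is the textbook baseline; the fine-grained error bounds discussed in the talk refine it.

```python
import numpy as np

def tree_prefix_sums(stream, eps, rng=None):
    """Returns eps-DP running counts of a 0/1 stream, one per time step."""
    rng = rng or np.random.default_rng()
    T = len(stream)
    levels = int(np.ceil(np.log2(max(T, 2)))) + 1
    eps_level = eps / levels             # each item sits in at most `levels` blocks
    noisy = {}                           # (level, index) -> noisy dyadic block sum

    def block(level, index):             # covers stream[index*2^level : (index+1)*2^level]
        if (level, index) not in noisy:
            lo, hi = index * 2**level, min((index + 1) * 2**level, T)
            noisy[(level, index)] = sum(stream[lo:hi]) + rng.laplace(scale=1.0 / eps_level)
        return noisy[(level, index)]

    out = []
    for t in range(1, T + 1):            # decompose [0, t) into dyadic blocks
        total, pos = 0.0, 0
        for level in reversed(range(levels)):
            if pos + 2**level <= t:
                total += block(level, pos // 2**level)
                pos += 2**level
        out.append(total)
    return out
```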
Charles River Privacy Day — May 11, 2023
Please join us for a day of talks on the theory and practice of statistical data privacy! Inspired by the success of the Charles River Crypto Days, the Charles River Privacy Days will resume—after a decade-long hiatus—on Thursday, May 11, 2023. The event will be followed on Friday, May 12, by a Crypto Day workshop.
Confirmed Speakers:
- Jelani Nelson, UC Berkeley
- Sewoong Oh, University of Washington
- Jayshree Sarathy, Harvard University
- Jessica Sorrell, University of Pennsylvania
Organizer:
Charles River Privacy Day is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Registration: Attendance is free, but please register via this Registration Form if you plan to come.
Location:
Center for Computing & Data Sciences, Boston University, 17th floor
665 Commonwealth Ave, Boston, MA 02215
(map)
Schedule:
9:00am - 9:30am | Welcome, coffee |
9:30am - 10:30am | Sewoong Oh (UW): Rethinking Auditing Privacy from First Principles |
10:30am - 11:00am | Break |
11:00am - 12:00pm | Jessica Sorrell (Penn): Connections Between Replicability, Privacy, and Perfect Generalization |
12:00pm - 2:00pm | Lunch (provided) |
2:00pm - 3:00pm | Jelani Nelson (Berkeley): New Local Differentially Private Protocols for Frequency and Mean Estimation |
3:00pm - 3:30pm | Break |
3:30pm - 4:30pm | Jayshree Sarathy (Harvard): Data Perceptions and Practices: Exploring Social Factors around the Uptake of Differential Privacy |
Talk Information:
- Speaker: Jelani Nelson, UC Berkeley
  Title: New local differentially private protocols for frequency and mean estimation
  Abstract: Consider the following examples of distributed applications: a texting app wants to train ML models for autocomplete based on text history residing on-device across millions of devices, or the developers of some other app want to understand common app settings by their users. In both cases, and many others, a third party wants to understand something in the aggregate about a large distributed database but under the constraint that each individual record requires some guarantee of privacy. Protocols satisfying so-called local differential privacy have become the gold standard for guaranteeing privacy in such situations, and in this talk I will discuss new such protocols for two of the most common problems that require solutions in this framework: frequency estimation, and mean estimation. Based on joint works with subsets of Hilal Asi, Vitaly Feldman, Huy Le Nguyen, and Kunal Talwar. (A sketch of k-ary randomized response appears after this talk list.)
- Speaker: Sewoong Oh, University of Washington
  Title: Rethinking auditing privacy from first principles
  Abstract: Differentially private training of models is error-prone due to miscalculations of sensitivity or mistakes in implementation. Techniques for detecting such false claims of differential privacy are critical in ensuring a trustworthy ecosystem where code and algorithms are commonly shared. A major bottleneck in the standard statistical approaches for auditing private training is their sample complexity: drawing a single sample for auditing can be as computationally intense as training a model from scratch. We are in dire need of sample-efficient approaches that provide a tight lower bound on the privacy leakage. However, if we hang on to the standard definition of differential privacy, then we are fundamentally limited by the sample dependence of the Bernoulli confidence intervals involved. Breaking this barrier requires rethinking differential privacy from first principles. To this end, we introduce Lifted Differential Privacy, which, while equivalent to differential privacy, allows the privacy auditor to search over a larger space of counterexamples. We exploit this lifted search space in our novel design of audits that inject multiple canary examples. By generating those canaries randomly, the Lifted DP condition allows us to reuse each trained model and run multiple binary hypothesis tests for the presence or absence of each canary. Together with a novel confidence interval that exploits the (lack of) correlation between those test statistics, we showcase the significant gain in sample complexity both theoretically and empirically. (A sketch of the standard auditing recipe appears after this talk list.)
- Speaker: Jayshree Sarathy, Harvard University
  Title: Data Perceptions and Practices: Exploring Social Factors around the Uptake of Differential Privacy
  Abstract: Deployments of differential privacy (DP) in the last decade have drawn attention to the challenges of bridging theory and practice. In this talk, we will explore some of the social factors—in particular, perceptions and practices around data—that shape these challenges. First, we consider the modernization of disclosure avoidance in the 2020 U.S. Census, examining how the move to DP revealed epistemic disconnects around what we identify as a “statistical imaginary” of census data. Second, we discuss a user study to investigate the utility of DP tools for social science researchers, exploring tensions between DP and the practices of data science. These two studies raise questions around centering social factors when deploying DP for public interest. This talk is based on work with danah boyd, Audrey Haque, Tania Schlatter, Sophia Song, and Salil Vadhan.
- Speaker: Jessica Sorrell, University of Pennsylvania
  Title: Connections Between Replicability, Privacy, and Perfect Generalization
  Abstract: Replicability is vital to ensuring scientific conclusions are reliable, but failures of replicability have been a major issue in nearly all scientific areas of study in recent decades. A key issue underlying the replicability crisis is the explosion of methods for data generation, screening, testing, and analysis, where, crucially, only the combinations producing the most significant results are reported. Such practices (also known as p-hacking, data dredging, and researcher degrees of freedom) can lead to erroneous findings that appear to be significant, but that don’t hold up when other researchers attempt to replicate them. In this talk, we will explore connections between replicability and other stability notions that have proven useful in ensuring statistical validity. We will discuss statistical equivalences between replicability, approximate differential privacy, and perfect generalization, as well as computational separations. This talk is based on work with Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, and Satchit Sivakumar. (A toy replicable-rounding sketch appears after this talk list.)
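Illustrative Code Sketches:
Again, these are editorial toys rather than the speakers' constructions. For Jelani Nelson's talk, the simplest local-DP frequency-estimation protocol, k-ary randomized response: each user perturbs their own value before sending it, and the server debiases the histogram of reports. The talk's protocols improve on this baseline's utility and communication.

```python
import numpy as np

def randomize(value, k, eps, rng):
    """Run locally by each user: keep the true value w.p. e^eps / (e^eps + k - 1)."""
    p_keep = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return value
    other = rng.integers(k - 1)          # uniform over the k - 1 other values
    return other if other < value else other + 1

def estimate_frequencies(reports, k, eps):
    """Run by the server: debias the histogram of noisy reports."""
    p_keep = np.exp(eps) / (np.exp(eps) + k - 1)
    q = 1.0 / (np.exp(eps) + k - 1)      # prob. of reporting any fixed wrong value
    counts = np.bincount(reports, minlength=k) / len(reports)
    return (counts - q) / (p_keep - q)

rng = np.random.default_rng(0)
true_values = rng.integers(5, size=10_000)
reports = [randomize(int(v), 5, 1.0, rng) for v in true_values]
print(estimate_frequencies(reports, 5, 1.0))   # close to the uniform histogram
```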
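For Sewoong Oh's talk, the standard (sample-hungry) auditing recipe the talk improves on: train many models with and without a canary, run a membership test on each, and turn the observed true and false positive rates into an empirical lower bound on eps. Here `train` and `score_canary` are placeholders for the mechanism under audit, and the point estimate omits the confidence intervals whose sample cost is exactly the bottleneck the talk addresses.

```python
import numpy as np

def audit_epsilon(train, score_canary, data, canary, trials=1000, rng=None):
    """Empirical lower bound on eps; `data` is a list of training records."""
    rng = rng or np.random.default_rng()
    scores_in, scores_out = [], []
    for _ in range(trials):              # one full training run per sample!
        scores_in.append(score_canary(train(data + [canary])))
        scores_out.append(score_canary(train(data)))
    threshold = np.median(scores_out)
    tpr = np.mean(np.array(scores_in) > threshold)
    fpr = np.mean(np.array(scores_out) > threshold)
    # For an eps-DP mechanism, TPR <= e^eps * FPR, so this ratio lower-bounds eps.
    return np.log(max(tpr, 1e-12) / max(fpr, 1e-12))
```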
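For Jessica Sorrell's talk, a toy version of a standard trick behind replicable algorithms: snap a statistic to a grid whose random offset comes from shared randomness. Two runs on fresh samples from the same distribution then return the identical answer with high probability. The grid width and sample sizes are illustrative only.

```python
import numpy as np

def replicable_mean(samples, shared_seed, width=0.1):
    """Rounds the empirical mean to a grid whose offset is set by shared randomness."""
    offset = np.random.default_rng(shared_seed).uniform(0, width)
    m = np.mean(samples)
    return np.floor((m - offset) / width) * width + offset

rng = np.random.default_rng()
seed = 1234                              # the shared random string
run1 = replicable_mean(rng.normal(0.5, 1.0, 10_000), seed)
run2 = replicable_mean(rng.normal(0.5, 1.0, 10_000), seed)
print(run1 == run2)                      # equal unless the mean lands near a grid boundary
```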