Charles River Privacy Day — November 22, 2024
Please join us for a day of talks on the theory and practice of statistical data privacy!
Confirmed Speakers:
- Gautam Kamath, University of Waterloo
- Priyanka Nanayakkara, Harvard
- Virginia Smith, CMU
- Jalaj Upadhyay, Rutgers University
Registration: Attendance is free, but please register via this Registration Form if you plan to come.
Location:
43 Hawes Street
Brookline, MA 02446
(map)
Organizer:
Charles River Privacy Day is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Tentative Schedule:
9:00am - 9:30am | Welcome, coffee |
9:30am - 10:30am | Gautam Kamath (University of Waterloo): The Broader Landscape of Robustness in Mean Estimation |
10:30am - 11:00am | Break |
11:00am - 12:00pm | Priyanka Nanayakkara (Harvard): Differential Privacy Interfaces for Data Users |
12:00pm - 2:00pm | Lunch (provided) |
2:00pm - 3:00pm | Jalaj Upadhyay (Rutgers University): A brief history of differentially private continual counting and its applications |
3:00pm - 3:30pm | Break |
3:30pm - 4:30pm | Virginia Smith (CMU): Getting Lost in ML Safety Vibes |
Talk Information:
- Speaker: Gautam Kamath, University of Waterloo
  Title: The Broader Landscape of Robustness in Mean Estimation
  Abstract: Mean estimation is a simple statistical task, solved optimally by a simple algorithm: the empirical mean. The story changes dramatically when the estimator is required to be robust. What exactly it means for an estimator to be robust can depend on the situation. It can be robust to model misspecification or adversarial contamination. It can be robust to outliers caused by heavy-tailed data. Or it can be robust in the sense of differential privacy. We will discuss how, surprisingly, we can design computationally efficient algorithms subject to all these types of robustness using the same underlying technical ideas, resulting in powerful estimators that are qualitatively quite different from the empirical mean. These connections offer glimpses of a broader landscape of robustness in statistical estimation.
- Speaker: Priyanka Nanayakkara, Harvard
  Title: Differential Privacy Interfaces for Data Users
  Abstract: Differential privacy (DP) offers a suite of theoretical tools for enabling privacy-preserving data science. However, using these tools in practice often requires a team of DP experts to navigate new constraints imposed by DP on analysis (e.g., spending epsilon), limiting DP's real-world impact. In this talk, I will describe how interactive interfaces can help data users without DP expertise effectively use DP, widening the potential for DP's impact. Specifically, I will present (1) an interactive interface for data curators setting epsilon while considering implications for both the accuracy of estimates and disclosure risk, and (2) an interactive paradigm, instantiated in an interactive interface, to support analysts in spending epsilon efficiently during exploratory analysis. Together, these works demonstrate the promise of interfaces in bringing DP from theory into practice. Based on joint works with Johes Bater, Xi He, Jessica Hullman, Hyeok Kim, Narges Mahyar, Gerome Miklau, Jennie Rogers, Ali Sarvghad, and Yifan Wu.
- Speaker: Jalaj Upadhyay, Rutgers University
  Title: A brief history of differentially private continual counting and its applications
  Abstract: Differential privacy is now considered the de facto notion of privacy; however, there is still a wide gap between the theory and practice of differential privacy. This talk will discuss the fundamental reasons for this discrepancy using continual counting as an example. We will briefly overview the problem and discuss recent advances in differentially private continual observation that achieve fine-grained error bounds. Time permitting, we will also discuss some of the implications of these results in both real-world applications and abstract mathematics. This talk is based on joint work with Hendrik Fichtenberger, Monika Henzinger, and Sarvagya Upadhyay.
  (An illustrative sketch of the standard binary-tree counting mechanism appears after this list.)
- Speaker: Virginia Smith, CMU
  Title: Getting Lost in ML Safety Vibes
  Abstract: Machine learning applications are increasingly reliant on black-box pretrained models. To ensure safe use of these models, techniques such as unlearning, guardrails, and watermarking have been proposed to curb model behavior and audit usage. Unfortunately, while these post-hoc approaches give positive safety 'vibes' when evaluated in isolation, our work shows that existing techniques are quite brittle when deployed as part of larger systems. In a series of recent works, we show that: (a) small amounts of auxiliary data can be used to 'jog' the memory of unlearned models; (b) current unlearning benchmarks obscure deficiencies in both finetuning and guardrail-based approaches; and (c) simple, scalable attacks erode existing LLM watermarking systems and reveal fundamental trade-offs in watermark design. Taken together, these results highlight major deficiencies in the practical use of post-hoc ML safety methods. We end by discussing promising alternatives to ML safety, which instead aim to ensure safety by design during the development of ML systems.
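To give readers outside the area a concrete picture of the continual counting problem in Jalaj Upadhyay's talk, below is a minimal Python sketch of the classic binary-tree ("binary mechanism") baseline due to Chan, Shi, and Song and to Dwork, Naor, Pitassi, and Rothblum. It illustrates the standard textbook technique only, not code or results from the talk; the function name, parameters, and the event-level privacy accounting in the comments are our own assumptions.

```python
import numpy as np

def private_prefix_counts(stream, epsilon, rng=None):
    """Illustrative binary-tree mechanism for DP continual counting.

    Releases a noisy running count of a 0/1 stream at every time step.
    Each element lands in at most `levels` dyadic blocks, so adding
    Laplace(levels / epsilon) noise to each block sum gives epsilon-DP
    for event-level neighbors (streams differing in one element).
    """
    rng = rng or np.random.default_rng()
    T = len(stream)
    levels = int(np.ceil(np.log2(max(T, 2)))) + 1
    scale = levels / epsilon          # per-block Laplace scale

    noisy_block = {}                  # (level, block index) -> cached noisy partial sum
    outputs = []
    for t in range(1, T + 1):
        # Decompose [1, t] into O(log T) aligned dyadic blocks and add up
        # their noisy sums; blocks are cached and reused across time steps.
        released, start, remaining = 0.0, 1, t
        while remaining > 0:
            block = 1 << (remaining.bit_length() - 1)   # largest power of two <= remaining
            level = block.bit_length() - 1
            idx = (start - 1) // block
            if (level, idx) not in noisy_block:
                true_sum = sum(stream[start - 1:start - 1 + block])
                noisy_block[(level, idx)] = true_sum + rng.laplace(0.0, scale)
            released += noisy_block[(level, idx)]
            start += block
            remaining -= block
        outputs.append(released)
    return outputs

# Example: noisy prefix counts of a random bit stream.
bits = np.random.default_rng(0).integers(2, size=1000)
print(private_prefix_counts(bits.tolist(), epsilon=1.0)[-5:])
```

Every prefix sum is assembled from at most logarithmically many noisy blocks, so the error per time step grows only polylogarithmically in the stream length rather than linearly; the fine-grained error bounds mentioned in the abstract concern sharper analyses of this kind of guarantee.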
Charles River Privacy Day — May 11th, 2023
Please join us for a day of talks on the theory and practice of statistical data privacy! Inspired by the success of the Charles River Crypto Days, the Charles River Privacy Days will resume—after a decade-long hiatus—on Thursday, May 11, 2023. The event will be followed on Friday, May 12 by a Crypto Day workshop.
Confirmed Speakers:
- Jelani Nelson, UC Berkeley
- Sewoong Oh, University of Washington
- Jayshree Sarathy, Harvard University
- Jessica Sorrell, University of Pennsylvania
Organizer:
Charles River Privacy Day is organized by the Boston-Area Data Privacy Group, which spans BU, Northeastern and Harvard.
Registration: Attendance is free, but please register via this Registration Form if you plan to come.
Location:
Center for Computational and Data Sciences, Boston University, 17th floor
665 Commonwealth Ave, Boston, MA 02215
(map)
Schedule:
9:00am - 9:30am | Welcome, coffee |
9:30am - 10:30am | Sewoong Oh (UW): Rethinking Auditing Privacy from First Principles |
10:30am - 11:00am | Break |
11:00am - 12:00pm | Jessica Sorrell (Penn): Connections Between Replicability, Privacy, and Perfect Generalization |
12:00pm - 2:00pm | Lunch (provided) |
2:00pm - 3:00pm | Jelani Nelson (Berkeley): New Local Differentially Private Protocols for Frequency and Mean Estimation |
3:00pm - 3:30pm | Break |
3:30pm - 4:30pm | Jayshree Sarathy (Harvard): Data Perceptions and Practices: Exploring Social Factors around the Uptake of Differential Privacy |
Talk Information:
- Speaker: Jelani Nelson, UC Berkeley
  Title: New local differentially private protocols for frequency and mean estimation
  Abstract: Consider the following examples of distributed applications: a texting app wants to train ML models for autocomplete based on text history residing on-device across millions of devices, or the developers of some other app want to understand common app settings by their users. In both cases, and many others, a third party wants to understand something in the aggregate about a large distributed database, but under the constraint that each individual record requires some guarantee of privacy. Protocols satisfying so-called local differential privacy have become the gold standard for guaranteeing privacy in such situations, and in this talk I will discuss new such protocols for two of the most common problems that require solutions in this framework: frequency estimation and mean estimation. Based on joint works with subsets of Hilal Asi, Vitaly Feldman, Huy Le Nguyen, and Kunal Talwar.
  (An illustrative randomized-response sketch of the local model appears after this list.)
- Speaker: Sewoong Oh, University of Washington
  Title: Rethinking auditing privacy from first principles
  Abstract: Differentially private training of models is error-prone, due to miscalculations of sensitivity or mistakes in implementation. Techniques for detecting such false claims of differential privacy are critical to ensuring a trustworthy ecosystem in which code and algorithms are commonly shared. A major bottleneck in the standard statistical approaches to auditing private training is their sample complexity: drawing a single sample for auditing can be as computationally intense as training a model from scratch. We are in dire need of sample-efficient approaches that provide a tight lower bound on the privacy leakage. However, if we hang on to the standard definition of differential privacy, we are fundamentally limited by the sample dependence of the Bernoulli confidence intervals involved. Breaking this barrier requires rethinking differential privacy from first principles. To this end, we introduce Lifted Differential Privacy, which, while equivalent to differential privacy, allows the privacy auditor to search over a larger space of counterexamples. We exploit this lifted search space in a novel design of audits that inject multiple canary examples. By generating those canaries randomly, the Lifted DP condition allows us to reuse each trained model and run multiple binary hypothesis tests for the presence or absence of each canary. Together with a novel confidence interval that exploits the (lack of) correlation between those test statistics, we showcase a significant gain in sample complexity both theoretically and empirically.
- Speaker: Jayshree Sarathy, Harvard University
  Title: Data Perceptions and Practices: Exploring Social Factors around the Uptake of Differential Privacy
  Abstract: Deployments of differential privacy (DP) in the last decade have drawn attention to the challenges of bridging theory and practice. In this talk, we will explore some of the social factors—in particular, perceptions and practices around data—that shape these challenges. First, we consider the modernization of disclosure avoidance in the 2020 U.S. Census, examining how the move to DP revealed epistemic disconnects around what we identify as a “statistical imaginary” of census data. Second, we discuss a user study to investigate the utility of DP tools for social science researchers, exploring tensions between DP and the practices of data science. These two studies raise questions around centering social factors when deploying DP for public interest. This talk is based on work with danah boyd, Audrey Haque, Tania Schlatter, Sophia Song, and Salil Vadhan.
- Speaker: Jessica Sorrell, University of Pennsylvania
  Title: Connections Between Replicability, Privacy, and Perfect Generalization
  Abstract: Replicability is vital to ensuring scientific conclusions are reliable, but failures of replicability have been a major issue in nearly all scientific areas of study in recent decades. A key issue underlying the replicability crisis is the explosion of methods for data generation, screening, testing, and analysis, where, crucially, only the combinations producing the most significant results are reported. Such practices (also known as p-hacking, data dredging, and researcher degrees of freedom) can lead to erroneous findings that appear to be significant, but that don’t hold up when other researchers attempt to replicate them. In this talk, we will explore connections between replicability and other stability notions that have proven useful in ensuring statistical validity. We will discuss statistical equivalences between replicability, approximate differential privacy, and perfect generalization, as well as computational separations. This talk is based on work with Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, and Satchit Sivakumar.
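For readers new to the local model discussed in Jelani Nelson's abstract, here is a minimal Python sketch of k-ary randomized response, the textbook locally differentially private frequency estimator. It is a baseline for illustration only, not one of the new protocols from the talk; the function names and parameters are our own.

```python
import numpy as np

def randomize(value, k, epsilon, rng):
    """k-ary randomized response: each user reports their true value in
    {0, ..., k-1} with probability e^eps / (e^eps + k - 1), and otherwise
    a uniformly random other value. This satisfies epsilon-local DP."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    other = int(rng.integers(k - 1))
    return other if other < value else other + 1

def estimate_frequencies(reports, k, epsilon):
    """Debias the histogram of randomized reports into unbiased estimates
    of the true value frequencies."""
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)   # prob. of reporting the truth
    q = 1.0 / (np.exp(epsilon) + k - 1)               # prob. of any specific other value
    counts = np.bincount(np.asarray(reports), minlength=k)
    return (counts / n - q) / (p - q)

# Example: 10,000 users, 4 possible values, epsilon = 1.
rng = np.random.default_rng(0)
true_values = rng.integers(4, size=10_000)
reports = [randomize(int(v), k=4, epsilon=1.0, rng=rng) for v in true_values]
print(estimate_frequencies(reports, k=4, epsilon=1.0))
```

Each user randomizes locally before anything leaves their device, so the aggregator only ever sees noisy reports; the estimation error grows as epsilon shrinks or the domain size k grows, which is why more refined local protocols matter in practice.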