Privacy amplification by repeated noise addition
Imagine we’re trying to privately estimate some statistic by adding independent noise to it. As in the previous post, we’ll assume the noise is added across $n$ parties. Unlike the previous post, however, we will explicitly not take advantage of infinite divisibility. Instead, we will assume each party adds i.i.d. noise satisfying a local $(\alpha, \epsilon_0)$ Renyi DP bound (Mironov 2017), and use a neat amplification argument to show that the privacy guarantee improves with $n$. We’ll begin by stating the main result.
Theorem 1: Let $P$ be a continuous probability distribution on $\mathbb{R}$, and let $P^{*n}$ denote its $n$-fold convolution. Then
\[D_\alpha(P^{*n} + 1 || P^{*n}) \le n \cdot D_\alpha(P + 1/n || P),\]

where $D_\alpha$ denotes the Renyi divergence of order $\alpha$.
Amplification by iteration and the shift-reduction lemma
Theorem 1 follows from the work of Feldman et al. 2018, which proves a generic privacy amplification result for contractive iterations and uses it to give a sharper privacy analysis of stochastic gradient descent.
We’ll consider a much simpler setting, but using a key lemma they proved called the shift-reduction lemma. Unfortunately, understanding the lemma requires introducing a few new concepts.
$\infty$-Wasserstein Distance
We’ll denote the $\infty$-Wasserstein distance between two distributions $U$ and $V$ as $W_\infty(U, V)$. Given that we are working with distributions on a relatively simple space (the reals), I’ll refrain from a formal measure-theoretic definition of the Wasserstein distance. Instead we will simply say that $W_\infty(U, V) \le s$ means there is a coupling of $U$ and $V$ (a joint distribution with these marginals) under which $\Pr[|U - V| \le s] = 1$, where we abuse notation a bit between a random variable and its distribution.
In our case we can think of the distance as measuring how far $U$ is shifted relative to $V$: if $U = V + x$ for a constant $x$, then $W_\infty(U, V) = |x|$.
Shifted Renyi divergence
The $z$-shifted Renyi divergence between distributions $U$ and $V$ is defined as
\[D_{\alpha}^{(z)}(U || V) = \inf_{U' : W_\infty(U, U') \le z} D_\alpha(U' || V)\]

In other words, we are minimizing the Renyi divergence over all distributions $U'$ that are $z$-close to $U$ in the Wasserstein distance.
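For intuition, note that the shifted divergence can absorb a small deterministic shift entirely: if $U = V + x$ with $|x| \le z$, then $U' = V$ is one of the candidates in the infimum, so

\[D_{\alpha}^{(z)}(V + x || V) = 0.\]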
The shift-reduction lemma
Finally we can state a simplified version of the shift-reduction lemma:
Lemma 2: Let $U$, $V$, and $\zeta$ be distributions on $\mathbb{R}$. Then for any $z \ge 0$ and $a \ge 0$,
\[D_{\alpha}^{(z)}(U + \zeta || V + \zeta) \le D_{\alpha}^{(z + a)}(U || V) + R_\alpha(\zeta, a),\]

where \(R_\alpha(\zeta, a) = \sup_{x: |x| \le a} D_\alpha(\zeta + x || \zeta)\) measures how well $\zeta$ hides a constant shift.
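To get a feel for the $R_\alpha$ term, take Gaussian hiding noise $\zeta = N(0, \sigma^2)$. Using the Gaussian Renyi divergence formula from Mironov 2017 (the same one used in the applications below),

\[R_\alpha(N(0, \sigma^2), a) = \sup_{x : |x| \le a} \frac{\alpha x^2}{2 \sigma^2} = \frac{\alpha a^2}{2 \sigma^2},\]

so the wider the hiding noise, the cheaper it is to absorb a shift of size $a$.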
Proof of Theorem 1
The proof is a fairly simple application of the shift-reduction lemma: write $P^{*n} = P + P^{*(n-1)}$, treat $P^{*(n-1)}$ as the hiding noise $\zeta$, and apply Lemma 2 (with $z = 0$) repeatedly.
\(\begin{align*} D_{\alpha}(P^{*n} + 1 || P^{*n}) &\le D_{\alpha}^{(a_1)}(P + 1 || P) + R_\alpha(P^{*(n-1)}, a_1)\\ &\le D_{\alpha}(P + 1 - a_1 || P) + D_\alpha(P^{*(n-1)} + a_1 || P^{*(n-1)})\\ &\le D_{\alpha}(P + 1 - a_1 || P) + D_{\alpha}(P + a_1 - a_2 || P) + D_\alpha(P^{*(n-2)} + a_2 || P^{*(n-2)})\\ &\;\;\vdots\\ &\le D_\alpha(P + a_{n-1} || P) + \sum_{i=1}^{n-1} D_\alpha(P + a_{i-1} - a_i || P) \qquad \text{where } a_0 = 1. \end{align*}\)
The second line uses two facts: $D_{\alpha}^{(a_1)}(P + 1 || P) \le D_\alpha(P + 1 - a_1 || P)$, since $P + 1 - a_1$ is within $W_\infty$ distance $a_1$ of $P + 1$; and the supremum defining $R_\alpha(P^{*(n-1)}, a_1)$ is attained at the boundary shift $a_1$, which holds for symmetric noise whose divergence grows with the size of the shift (in particular the Gaussian and Laplace noise below). The remaining lines repeat the same two steps, peeling off one copy of $P$ at a time.
Now set the $a_i$ such that $a_{i-1} - a_i = 1/n$, i.e. $\{a_0, a_1, \dots, a_{n-1}\} = \{1, \frac{n-1}{n}, \dots, \frac{1}{n}\}$. Then the expression simplifies to
\(D_\alpha(P^{*n} + 1 || P^{*n}) \le n \cdot D_\alpha(P + \frac{1}{n} || P)\).
Applications
Gaussian noise
Let $P \sim N(0, \sigma^2)$. We know from Mironov 2017 that $D_\alpha(P + \Delta || P) = \frac{\alpha \Delta^2}{2 \sigma^2}$. From Theorem 1 we can say that
\[D_\alpha(P^{*n} + 1 || P^{*n}) \le n \cdot D_\alpha(P + 1/n || P) = n \cdot \frac{\alpha (1/n)^2}{2 \sigma^2} = \frac{\alpha}{2 n \sigma^2}.\]

We know this is tight, since $P^{*n} \sim N(0, n \sigma^2)$ and plugging a shift of $1$ and variance $n \sigma^2$ into the formula above gives exactly $\frac{\alpha}{2 n \sigma^2}$.
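As a quick sanity check that the numbers line up, here is a throwaway Python snippet (the parameter choices $\alpha = 4$, $\sigma = 1$, $n = 10$ are arbitrary):

```python
# Sanity check: for Gaussian noise, the Theorem 1 bound coincides with the exact
# Renyi divergence of the summed noise, since P^{*n} = N(0, n * sigma^2) and
# D_alpha(N(mu, s^2) || N(0, s^2)) = alpha * mu^2 / (2 * s^2) (Mironov 2017).
alpha, sigma, n = 4.0, 1.0, 10

exact = alpha * 1.0**2 / (2 * n * sigma**2)          # shift of 1 against N(0, n * sigma^2)
bound = n * alpha * (1.0 / n) ** 2 / (2 * sigma**2)  # n * D_alpha(P + 1/n || P)

print(exact, bound)  # both equal alpha / (2 * n * sigma^2) = 0.2, up to float rounding
```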
Laplace noise
Again from Mironov 2017 we know that if $P \sim Lap(\lambda)$:
\[D_\alpha(P + \Delta || P) = \frac{1}{\alpha - 1}\log\left( \frac{\alpha}{2 \alpha -1} \exp\left(\frac{\Delta (\alpha-1)}{\lambda}\right) + \frac{\alpha-1}{2\alpha - 1}\exp\left(-\frac{\Delta \alpha}{\lambda}\right) \right).\]

So from Theorem 1 we can say:

\(\begin{align*} D_\alpha(P^{*n} + 1 || P^{*n}) &\le n \cdot D_\alpha(P + 1/n || P)\\ &= \frac{n}{\alpha - 1}\log\left( \frac{\alpha}{2 \alpha -1} \exp\left(\frac{\alpha-1}{n \lambda}\right) + \frac{\alpha-1}{2\alpha - 1}\exp\left(-\frac{\alpha}{n \lambda}\right) \right)\\ &\sim \frac{\alpha}{2 n \lambda^2} \quad \text{(as $n \to \infty$).} \end{align*}\)
Since \(Var(P^{*n}) = 2 n \lambda^2\), we can say the RDP is bounded by $\frac{\alpha}{Var(P^{*n})}$ as $n \to \infty$, which is a factor of 2 worse than what we get with Gaussian noise. My gut is that this is not tight, but right now I am too lazy to run the numerical accounting tools on Laplace convolutions. Still, it’s pretty cool this technique gets within a constant factor of the Gaussian mechanism.
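If you do want to check numerically, here is a rough sketch of one way to do it by hand (a throwaway script rather than a standard accounting tool; the grid resolution and the choices $\alpha = 4$, $\lambda = 1$, $n = 10$ are arbitrary): build the density of $P^{*n}$ by repeated convolution on a grid, integrate to get $D_\alpha(P^{*n} + 1 || P^{*n})$ directly, and compare against the Theorem 1 bound.

```python
import numpy as np

alpha, lam, n = 4.0, 1.0, 10

# Symmetric grid centered at 0 with an odd number of points, and a spacing that
# divides 1 exactly so that "shift by 1" is a whole number of grid steps.
half_width = 80.0
points_per_unit = 50                                    # dx = 1 / points_per_unit = 0.02
n_points = int(2 * half_width * points_per_unit) + 1    # odd count, symmetric around 0
grid = np.linspace(-half_width, half_width, n_points)
dx = grid[1] - grid[0]
shift = points_per_unit                                 # "add 1" = shift by this many steps

# Laplace(lambda) density, then the density of the n-fold convolution P^{*n}.
# With a symmetric grid centered at 0, np.convolve(..., "same") * dx approximates
# the density of the sum of two independent variables on the same grid.
lap = np.exp(-np.abs(grid) / lam) / (2 * lam)
p_n = lap.copy()
for _ in range(n - 1):
    p_n = np.convolve(p_n, lap, mode="same") * dx

# Density of P^{*n} + 1 is the same density evaluated at x - 1.
p_n_shifted = np.zeros_like(p_n)
p_n_shifted[shift:] = p_n[:-shift]

# D_alpha(U || V) = 1/(alpha - 1) * log( integral of u(x)^alpha * v(x)^(1 - alpha) dx )
integrand = p_n_shifted**alpha * p_n ** (1.0 - alpha)
numerical = np.log(np.sum(integrand) * dx) / (alpha - 1)

# Theorem 1 bound: plug Delta = 1/n into Mironov's Laplace formula above.
d = 1.0 / n
single = np.log(
    alpha / (2 * alpha - 1) * np.exp(d * (alpha - 1) / lam)
    + (alpha - 1) / (2 * alpha - 1) * np.exp(-d * alpha / lam)
) / (alpha - 1)

print("numerical D_alpha(P^{*n} + 1 || P^{*n}):", numerical)
print("Theorem 1 bound, n * D_alpha(P + 1/n || P):", n * single)
print("alpha / Var(P^{*n}):", alpha / (2 * n * lam**2))
```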
Discrete distributions
Unfortunately the technique does not extend naturally to discrete distributions on $\mathbb{Z}$, since we can no longer split the shift of 1 into $n$ fractional shifts. One thing we can do is rerun the argument with integer shifts, which gives $D_\alpha(P^{*n} + \Delta || P^{*n}) \le \Delta \cdot D_\alpha(P + 1 || P)$ for an integer sensitivity $\Delta \le n$, but unless we are doing $\Delta$-summation anyway, it’s not really as natural. For instance, if we are doing binary summation we could intentionally crank up $\Delta$ and rescale, but at that point we’re pretty much in the “continuous approximation” regime anyway.