Privacy Preserving Signal Processing

1. Introduction

Processing private data on the cloud is an inevitable fact nowadays, as the mobile market grows rapidly while moving computations to the cloud. Yet, preserving user privacy for cloud computing remains a big concern for both users and researchers. In the ideal case, the user should put an encrypted version of her data on the cloud, while the cloud is still able to accomplish the required processing task successfully on the encrypted data.

Extensive work has been done in the area of privacy preserving signal processing, including two main research lines: homomorphic encryption and secure multiparty computation (MPC). In this post, we will discuss both techniques with examples and compare them.

Among many applications for privacy preserving signal processing, we will focus on a TV detection system using mobile phones. In this system, we leverage mobile phones, which are now always in hands, to record audio while users being indoors. We use the recorded audio samples to detect if the user is watching over-the-air (OTA) TV channels or not. Our system accomplish this task by correlating the uploaded data from users with live captured streams from OTA channels received through ATSC antenna and TV tuner.

We assume in the whole post, that the adversary model to be studied is the semi-honest adversary model in which the adversary follows all the steps in the protocol, but will try to know any leaked information about the other party’s values.

2. Homomorphic Encryption

In this section, we will present the details of the homomorphic encryption while focusing on additively homomorphic and multiplicative homomorphic cryptosystems.

2.1. Problem Definition

Given a feature vector {x} recorded and uploaded by a user named Alice to the processing server Bob, the server is responsible for estimating a certain function {f_1(x,y_i)}. To preserve the privacy of Alice, she sends the encrypted version {E_{PK}(x)} using her primary key and Bob estimates {f_2(E_k(x),y_i)} which satisfies the following property {f_1(x,y_i) = D_{SK}(f_2(E_k(x),y_i))}.

2.2. Additively Homomorphic Encryption

Theorem 1 . Given plaintext {x_1} and {x_2}, a public-key cryptosystem is additively homomorphic if {D_{SK}(E_{PK}(x_1).E_{PK}(x_2)) = x_1 + x_2}

A direct result from previous theorem, {D_{SK}(E_{PK}(x)^w) = w x}, as {w x = \sum_{i=1}^w x}, which equals {D_{SK}(\prod_{i=1}^w E_{PK}(x))}.

To illustrate the additively homomorphic property, we give the following example: let Alice has {x_1} and {x_2} and sends {E_{PK}(x_1)}, {E_{PK}(x_2)} to Bob, who is responsible for computing {a}:

\displaystyle  a = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 < V\\ 0 & \text{otherwise} \end{cases}

If Alice used public-key cryptosystem that has the additively homomorphic property, then Bob can leverage Theorem~1 and compute the following {E_{PK}(w_1 x_1 + w_2 x_2) = E_{PK}(x_1)^{w_1} E_{PK}(x_2)^{w_2}}

An example of additively homomorphic cryptosystem is Paillier cryptosystem.

Paillier cryptosystem: Choose two large prime numbers {p} and {q}, set {n = pq} and select randomly {g \in \mathbb Z_{n^2}^*}. Let {\lambda(n) = lcm(p-1, q-1)} (known as Carmichael‘s function). Let function {L} denotes the function {\{L: \mathbb Z_{n^2}^* \rightarrow \mathbb Z_{n}^*, L(a) = \frac{a-1}{n} \}}. Then, {(n,g)} is considered as public key and {(p,q)} or equivalently {\lambda} is the private key.

  • Encryption: given plaintext {m < n}, select a random number {r < n}, and compute ciphertext {c = g^m r^n \, mod \, n^2}.
  • Decryption: given ciphertext {c < n^2}, compute plaintext {m = \frac{L(c^\lambda \, mod \, n^2)}{L(g^\lambda \, mod \, n^2)}}.

In order to follow the decryption step, we need to mention Carmichael‘s theorem and a theorem from the Composite Residuosity framework proposed by Paillier. We refer the reader to the original paper by Paillier for the proof of Composite Residuosity Theorem.

Theorem 2 Carmichael‘s theorem: Given any {w \in \mathbb Z_{n^2}^*} {\begin{cases} w^\lambda = 1 & mod \, n\\ w^{n \lambda} = 1 & mod \, n^2 \end{cases}}

Theorem 3 For any {w \in \mathbb Z_{n^2}^*}, there exists a unique pair {(a,b)} in the set {\mathbb Z_{n} \times \mathbb Z_{n}^*} such that {w = (1+n)^a b^n \, mod \, n^2}, then {w^\lambda = (1+n)^{a\lambda}b^{n\lambda} = (1+n)^{a\lambda} \, mod \, n^2}

Proof: Now, we can prove the decryption step in the Paillier cryptosystem:

  1. {c^\lambda \, mod \, n^2 = (g^m r^n)^\lambda \, mod \, = g^{m\lambda} r^{n\lambda} \, mod \, n^2}, (using Carmichael‘s theorem) { \rightarrow g^{m\lambda}\, mod \, n^2}.
  2. Leverage Theorem~3 and the relationship that {(1+n)^x = 1+xn \, mod \, n^2}, then {g^{\lambda m} \, mod \, n^2 = (1+a\lambda m n) \, mod \, n^2}.

From last two steps, we can say that {\frac{L(c^\lambda \, mod \, n^2)}{L(g^\lambda \, mod \, n^2)} = \frac{L(g^{m \lambda }\, mod \, n^2)}{L(g^\lambda \, mod \, n^2)} = \frac{L(1+a\lambda m n)}{L(1+a\lambda n)} = \frac{a \lambda m}{a\lambda} = m} \Box

After introducing the Paillier cryptosystem, we can now prove its additively homomorphic property:

Proof: Given plaintexts {m_1} and {m_2}:
{E_{PK}(m_1)E_{PK}(m_2) = (g^{m_1}r_1^n)(g^{m_2}r_2^n) \, mod \, n ^2 = g^{m_1+m_2} (r_1.r_2)^n \, mod \, n ^2 = E_{PK}(m_1+m_2)} \Box

Important remark: It is clear that random {r} in {E_{PK}(m_1+m_2)} doesn’t have to be equal to {r_1+r_2}. Even though the proof is correct, as in the decryption step we don’t need r. We can see in step {1} of the decryption proof, that we substitute the term {r^{n\lambda} \, mod \, n^2 } by {1} using the Carmichael‘s theorem. The value {r} is only used to deceive the eavesdropper, so even if Alice generates the same plaintext, the corresponding ciphertext is different.

As, we can see additively homomorphic encryption works only for functions include adding encrypted data or multiplying encrypted data by plaintext. However, if we need to compute the multiplication of two messages while we have access only to their encrypted values, then we need other schemes like multiplicatively homomorphic encryption or secure multiparty computation.

2.3. Multiplicatively Homomorphic Encryption

Theorem 4 Given plaintext {x_1} and {x_2}, a public-key cryptosystem is multiplicatively homomorphic if {D_{SK}(E_{PK}(x_1).E_{PK}(x_2)) = x_1 x_2}

The RSA cryptosystem and ElGammal cryptosystem have multiplicatively homomorphic property.

RSA cryptosystem: Choose two large prime numbers {p} and {q}, set {n = pq} and set {\phi(n) = (1-p)(1-q)}. Choose {e} such that {gcd(e,\phi(n)) = 1}. Find {d} such that {e.d \equiv 1 \, mod \, \phi(n)}

  • Encryption: given plaintext {m < n}, compute ciphertext {c = m^e \, mod \, n}.
  • Decryption: given ciphertext {c < n}, compute plaintext {m = c^d \, mod \, n}.

Now, we can prove that RSA cryptosystem has multiplicatively homomorphic property:

Proof: Given plaintexts {m_1} and {m_2}:
{E_{PK}(m_1)E_{PK}(m_2) = m_1^{e}m_2^{e} \, mod \, n = (m_1.m_2)^e \, mod \, n = E_{PK}(m_1.m_2)} \Box

3. Secure Multiparty Computation

In this section, we will cover another research line targeting the privacy preserving signal processing problem. Secure Multiparty Computation requires the involvement of more than one party in the computation compared to the homomorphic encryption where only one entity is responsible for the computation.

3.1. Arithmetic Two Party Computation

In this scheme, Alice and Bob wants to do arithmetic operation over encrypted data from one party and plaintext from the other party. For example, given {E_{PK}(x)} from Alice and Bob wants to compare {x} to {y}. This operation is performed frequently in signal processing where the detection server filters (e.g. linear weighting as described before) the readings coming from certain sensor and compares the filtered signal with some threshold. Based on the comparison, the detection server declares whether the event has been happened or not.

In this scheme, Bob wants to detect if {sign(x-y) > 0}, then {x > y}, and otherwise {x < y}. The scheme works as follows: let { 0 \leq x,y< 2^n}, then Bob can compute {E_{PK}(z)}:

\displaystyle  E_{PK}(z) = E_{PK}(2^n + x-y)

{z} is {(n+1)}-bit positive number. If {x-y < 0}, then {z_n} (most significant bit of {z}) equals to zero while it equals to {1} otherwise. Bob can compute {E_{PK}(z_n)}, if he knows the encrypted value of {z \, mod \, 2 ^n}:

\displaystyle  z_n = 2^{-n}(z - (z \, mod \, 2^n))

If {(x-y) < 0}, then {(z \, mod \, 2^n) = z}, then {z_n = 0}. On the other hand, {(x-y) > 0}, then {z \, mod \, 2^n = (x-y)}, then {z_n = 1}. Bob needs to know {E_{PK}(z \, mod \, 2 ^n)} from Alice along with {E_{PK}(z)}, to estimate {E_{PK}(z_n)} which equals { (E_{PK}(z)E_{PK}(z \, mod \, 2 ^n)^{-1})^{2^{-n}}} leveraging the additively homomorphic property.

3.2. Yao’s Garbled Circuits

In this protocol, the goal is to compute the output of a function while the inputs are kept private. Yao’s garbled circuits protocol converts any function into a function that can be evaluated securely using logic circuit. Each input and output of each gate in the circuit is being encrypted to prevent the party executing the function from knowing any information about the inputs and outputs of each gate. Let’s explain the protocol using simple example.

Let Alice has 1 bit variable x and Bob has 1 bit variable y, and they want to find value {f = x \; OR \; y}.

  1. Alice generates the truth table of the circuit, in this example it is only one {OR} gate. We have two inputs and one output.
    x y f
    0 0 0
    0 1 1
    1 0 1
    1 1 1
  2. Alice then generates 2 keys for each input or output, for {x: (K_{x_0},K_{x_1})}, for {y: (K_{y_0},K_{y_0})} and for {f: (K_{f_0},K_{f_1})}. In this protocol, symmetric keys are being used which make it much easier to encrypt and decrypt in terms of complexity compared to public-key encryption.
  3. Alice encrypts the output of the gate using the two keys corresponding to the inputs values. So, {e_{ij} = E_{k_{x_i}}(E_{k_{y_j}}(f_{ij}))}
    x y f
    0 0 {E_{k_{x_0}}(E_{k_{y_0}}(0))}
    0 1 {E_{k_{x_0}}(E_{k_{y_1}}(1))}
    1 0 {E_{k_{x_1}}(E_{k_{y_0}}(1))}
    1 1 {E_{k_{x_1}}(E_{k_{y_1}}(1))}
  4. Alice randomizes the entries of the encrypted truth table, so Bob will not be able to find relation between entry index and the input-output values.
  5. Alice then sends Bob her {K_{x_i}} corresponding to the value {i} of x and the garbled circuit.
  6. Bob needs also the key corresponding to the value of y, however, he cannot reveal the value of y to Alice. On the other hand, Alice cannot just send him the two keys corresponding to y, as in this case Bob can estimate the value of Alice’s x.
  7. Bob and Alice execute a protocol called Oblivious Transfer (OT), so Bob gets the required key without revealing the value of y.
  8. Bob then uses both keys {K_{x_i}} and {K_{y_j}}, to decrypt, the only entry he can, that corresponds to {f_{ij}}.

Oblivious Transfer Protocol:

As mentioned before, Alice has two strings, let say, {s_0} and {s_1}, and Bob picks {i \in \{0,1\}} and wants to get from Alice {s_i} without telling Alice the value of {i} nor knowing the other value {s_{i'}}. The OT protocol works as follows:

  1. Bob generates pair of public-private keys {K^{PUB}}, {K^{PRI}} and another key {K^\#}. Bob doesn’t know the corresponding private key of {K^\#} and it shouldn’t be distinguishable from {K^{PUB}}.
  2. Bob sends to Alice two public keys {K_0^{PUB}} and {K_1^{PUB}}, picking {K_i^{PUB} = K^{PUB}}, while {K_{i'}^{PUB} = K^{\#}}.
  3. Alice computes ciphertext {c_0 = E_{K_0^{PUB}}(s_0)} and {c_1 = E_{K_1^{PUB}}(s_1)} and sends them to Bob
  4. Bob computes {s_i = D_{K^{PRI}}(c_i)}

4. Conclusion

In this post we covered two main cryptography solutions for preserving privacy for cloud based signal processing. Homomorphic encryption which gives the advantage for the server to compute the required function on users’ encrypted data without the need to include the users in the computation procedure. However, it requires public-key cryptosystem which has certain homomorphic properties. Public-key cryptosystems themselves require in general heavy computations in encryption and decryption, as we saw, for example exponentiation of large numbers. On the other hand, secure multiparty computation, facilitates the computation of any function using encrypted inputs while using symmetric keys which require less computations and memory than public-key encryption. Nevertheless, transferring complex functions will lead to large logic circuits which will require huge data to be communicated, specially for the oblivious transfer protocol.

For the TV Detection application, we believe the Paillier cryptosystem with its additively homomorphic property is suitable for our application. Of course, the enycrption and decryption for Paillier cryptosystem are computationally demanding tasks, however, multiple solutions can be employed to resolve this issue. First, precomputing part of the protocol that is not depending on the user input, can decrease the running time of the protocol. For example, large primes generation, random number generation and exponentiation of the generated random number can be executed in offline phase. Second, packing the data needed to be encrypted before sending it to the server could also improve the running time significantly. There will be a trade off, that has to be studied, between the compression ratio and the accuracy of the detection system. Finally, computing the encryption and decryption of users’ data using their local desktops or laptops could be a suitable solution instead of using their limited resources mobile phones.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s