Machine learning is one of the hottest disciplines in computer science today. So hot, in fact, that cloud providers are doing a good and rapidly growing business in machine-learning-as-a-service (MLaaS).
But these services come with a caveat: all the training data must be revealed to the service operator. Even if the service operator does not intentionally access the data, someone with nefarious motives may. Or there may be legal reasons to preserve privacy, such as with health data.
In a recent paper, Chiron: Privacy-preserving Machine Learning as a Service[1], Tyler Hunt of the University of Texas and co-authors present a system that preserves privacy while enabling the use of cloud MLaaS.
Privacy cuts both ways
While users may not wish to reveal their training data, the service providers have privacy concerns of their own. They typically do not allow customers to see the algorithms under their MLaaS technology.
To that end,
. . . Chiron conceals the training data from the service operator. [And] in keeping with how many existing ML-as-a-service platforms work, Chiron reveals neither the training algorithm nor the model structure to the user, providing only black-box access to the trained model.
Chiron uses Intel's Software Guard Extensions (SGX) secure enclaves, an architecture designed to increase the security of application code. But SGX alone isn't enough. Chiron also builds on Ryoan[2], a distributed sandbox that runs on SGX. Ryoan uses enclaves to protect its sandbox instances from malicious infrastructure, such as you might find in the cloud, while confining the untrusted code that runs inside them so it cannot leak user data.
Threat model
Chiron's goal is to protect the user's training data, as well as trained model queries and outputs, while in the cloud. To that end:
We assume that the entire platform is