Federated Machine Learning for Data Privacy Preservation

Introduction

With the explosion of data generation (thanks to a potent combination of internet accessibility, the proliferation of mobile devices, and accessible serverless, on-demand compute power), machine learning is having its zeitgeist moment, and use cases powered by it continue to touch multiple aspects of our lives. Yet, even as the march is relentless, the field's growing maturity has brought ancillary but critical concerns into focus; one such concern is data privacy preservation. This is where federated machine learning enters the picture.

How is federated machine learning different from traditional machine learning?

Federated machine learning differs from traditional machine learning in one fundamental way: data privacy preservation is part of the design.

Traditional machine learning works in the following manner: data generated by multiple users (say, viewing habits on YouTube) are collected on a central server. These data are then used to train a machine learning model (say, one that recommends videos to users). The trained model is then rolled out to users from that central server, where, on an ongoing basis, it consumes the data users generate to produce its output (recommended videos, in this example).
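
A minimal sketch of this centralized flow may help. Everything below is illustrative: the data is randomly generated, and a simple logistic-regression model trained with NumPy stands in for a real recommender.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-user datasets: feature vectors standing in for
# viewing histories, with a binary "watched to the end" label.
user_data = [
    (rng.normal(size=(500, 8)), rng.integers(0, 2, size=500).astype(float))
    for _ in range(3)
]

# Step 1: all raw user data is uploaded to and pooled on a central server.
X = np.vstack([features for features, _ in user_data])
y = np.concatenate([labels for _, labels in user_data])

# Step 2: a single model (logistic regression via gradient descent here)
# is trained on the pooled data.
w = np.zeros(X.shape[1])
for _ in range(200):
    p = 1 / (1 + np.exp(-(X @ w)))      # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)   # gradient step

# Step 3: the trained model is rolled out to every user from the server.
print("global model weights:", np.round(w, 3))
```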

In this setup, the concern is that since individual users' data are stored on a central server, the potential exists for a rogue agent to misuse that data. This invites the question: can the same process be designed and implemented so that this threat to data privacy is addressed?

Federated machine learning solves the issue by restructuring the process: the standard model training packet (model goal, structure, parameters, hyperparameters, etc.) is sent from a central server to every user's device. This packet then runs locally on each device, using the device's edge compute capability. Thus, multiple user-specific local models are trained privately, each using only that user's data. These local models are then relayed back to the central server, where a global model is reconstructed from the individual user models (for example, by averaging their weights). Finally, the global model is rolled out to user devices via the same central server.
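
A minimal simulation of one such training loop follows, assuming a federated-averaging style of aggregation. The local_train helper, the client datasets, and the round count are all hypothetical choices made for illustration, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
N_FEATURES, N_CLIENTS, N_ROUNDS = 8, 3, 5

def local_train(w_global, X, y, lr=0.1, epochs=50):
    """Train a private copy of the global model on one device's own data."""
    w = w_global.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Hypothetical private datasets that never leave their devices.
clients = [
    (rng.normal(size=(400, N_FEATURES)),
     rng.integers(0, 2, size=400).astype(float))
    for _ in range(N_CLIENTS)
]

w_global = np.zeros(N_FEATURES)
for _ in range(N_ROUNDS):
    # 1. The server broadcasts the current global model to all devices.
    # 2. Each device trains locally on its own private data.
    local_models = [local_train(w_global, X, y) for X, y in clients]
    # 3. Devices send back model weights only; the server reconstructs the
    #    global model by averaging them (equal weights here, since the
    #    simulated datasets are the same size).
    w_global = np.mean(local_models, axis=0)

print("aggregated global model:", np.round(w_global, 3))
```

Real deployments, such as Google's Federated Averaging scheme, add refinements on top of this basic loop, like weighting each client's contribution by its dataset size and sampling only a fraction of devices per round.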

How does federated machine learning preserve data privacy?

The way federated machine learning ensures data privacy preservation hinges on one key detail: instead of relaying user-specific data to a central server, it relays a user-specific model, trained locally on the user's device using edge computational power. This decentralizes the model training process and bypasses the need to collect all user data on one central server. Consequently, what a rogue agent could potentially be exposed to is a user-specific model rather than user-specific data, and because a model is an abstract artifact (a set of learned parameters rather than raw records), its potential for misuse is much lower.
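
To make the distinction concrete, here is a toy comparison of what actually leaves the device under each approach. The dataset sizes and the model are hypothetical, chosen only to show the contrast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical on-device data: 10,000 private records of user behavior.
local_X = rng.normal(size=(10_000, 20))
local_y = (local_X @ rng.normal(size=20) > 0).astype(float)

# Centralized ML: the upload to the server IS the raw data.
centralized_payload_bytes = local_X.nbytes + local_y.nbytes

# Federated ML: train locally, then upload only the learned weights.
w = np.zeros(20)
for _ in range(100):
    p = 1 / (1 + np.exp(-(local_X @ w)))
    w -= 0.1 * local_X.T @ (p - local_y) / len(local_y)

federated_payload_bytes = w.nbytes  # 20 numbers; no raw records at all

print(f"centralized upload: {centralized_payload_bytes:,} bytes of raw user data")
print(f"federated upload:   {federated_payload_bytes:,} bytes of model parameters")
```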

Conclusion

An excellent real-life example of federated machine learning is Google's Gboard keyboard app. This keyboard app, available for Android and Apple devices, uses users' typing patterns to offer predictive suggestions. However, instead of pooling the data centrally and training the model there, it uses a federated machine learning approach. As a result, everyone benefits from the common typing patterns of a language without having their specific patterns and data exposed to misuse.

With data privacy gaining increased attention, not least from regulators, it is expected that federated machine learning implementations will become common, especially for consumer-facing use cases.