Big Brother AI: Machine learning is coming for your privacy
Machine learning has pushed the boundaries in several fields, including personalized medicine, self-driving cars and customized advertisements. Research has shown, however, that these systems memorize aspects of the data they were trained on in order to learn patterns, which raises concerns for privacy.
In statistics and machine learning, the goal is to learn from past data to make new predictions or inferences about future data. In order to achieve this goal, the statistician or machine learning expert selects a model to capture the suspected patterns in the data. A model applies a simplifying structure to the data, which makes it possible to learn patterns and make predictions.
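As a rough illustration of what selecting a model means, the sketch below assumes a deliberately simple structure: the output is proportional to the input, y = w * x. The data points and the names `past_x`, `past_y`, `fit_weight` and `predict` are made up for this example and do not come from any real study.

```python
# Hypothetical past data: four observed (input, output) pairs.
past_x = [1.0, 2.0, 3.0, 4.0]
past_y = [2.1, 3.9, 6.2, 7.8]

def fit_weight(xs, ys):
    """Least-squares estimate of w for the assumed model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Learning: summarize the past data in a single number.
w = fit_weight(past_x, past_y)

def predict(x):
    """Apply the learned pattern to an input the model has not seen."""
    return w * x

print(f"learned w = {w:.2f}")
print(f"prediction for x = 5: {predict(5.0):.2f}")
```

The model ignores the small wobble in each past observation and keeps only the overall trend, which is what lets it say something about an input it has never seen.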
Complex machine learning models have some inherent pros and cons. On the positive side, they can learn much more complex patterns and work with richer datasets for tasks such as image recognition and predicting how a specific person will respond to a treatment.
However, they also risk overfitting to the data. This means that they make accurate predictions about the data they were trained on but also start to learn aspects of that data that are not directly related to the task at hand. The resulting models don't generalize: they perform poorly on new data that is of the same type as the training data but not exactly the same.
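Overfitting can be seen in a toy experiment. The sketch below is an illustration under stated assumptions, not a real machine learning system: the true pattern is assumed to be y = 2x, the training labels carry an artificial alternating error of +1 and -1, and a nearest-neighbor lookup that simply memorizes the training data stands in for an overly flexible model. All names and numbers are hypothetical.

```python
# Training data: true pattern y = 2x plus alternating +1 / -1 noise (assumed for illustration).
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# New data of the same type: same pattern, but inputs the models have not seen.
# Targets are noise-free here so the error measures how well the true pattern was captured.
test_x = [i + 0.25 for i in range(9)]
test_y = [2 * x for x in test_x]

def fit_line(xs, ys):
    """Ordinary least squares for the simple model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train_x, train_y)

def line_predict(x):
    """Simple model: two parameters, cannot reproduce the noise."""
    return slope * x + intercept

def memorizer_predict(x):
    """Memorizing model: return the label of the closest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line_train_mse = mse(line_predict, train_x, train_y)
line_test_mse = mse(line_predict, test_x, test_y)
memo_train_mse = mse(memorizer_predict, train_x, train_y)
memo_test_mse = mse(memorizer_predict, test_x, test_y)

print(f"simple line: train error {line_train_mse:.3f}, test error {line_test_mse:.3f}")
print(f"memorizer:   train error {memo_train_mse:.3f}, test error {memo_test_mse:.3f}")
```

In this setup the memorizing model reproduces its training data with zero error, yet it does worse than the simple line on the new inputs because it has stored the noise along with the pattern. Storing details of individual training examples is also what makes such models a privacy concern.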
While there are techniques to address the predictive error associated with overfitting, a model's ability to learn so much from its training data also raises privacy concerns.
How machine learning algorithms make inferences
Each model has a certain number of parameters. A parameter is an element of a model that can be changed. Each parameter has a value, or setting, that the model derives from the