OKRs for Machine Learning: Ethics, Morals, and Safety

While OKRs are effective tools for accomplishing goals, leadership should be careful to avoid setting goals that promote unethical behavior or cause negative externalities. This is particularly true of machine learning, a young field in which progress is still driven largely by trial and error.

Machine Learning Ethics

Modern technology has a global reach with massive network effects which can amplify negative outcomes. As such, companies releasing products or services are held responsible by their governments to protect consumers. While many industries have been around for decades and already have regulations in place, machine learning is still a new field that’s under development. This makes it extremely important to be prudent when releasing ML products on the market.

Companies might get so caught up in their goals that they stop thinking about negative outcomes. Considering all the hype surrounding machine learning, many leaders might want to implement ML in their business as soon as possible. However, leadership should understand that ML is different from traditional software. Many ML models learn internal parameters that cannot be easily inspected, which makes their outcomes hard to predict. Some of these outcomes might cause harm or discrimination.

Some harmful examples include autonomous driving systems that put lives at risk and cancer detection systems that produce false negatives. Less harmful examples include freezing a bank account because of a wrongly detected fraud, or denying a credit card application based on a miscalculated risk factor.

The Black Box Problem

One problem with machine learning is that the models act as a “black box”. Models are fed with data, hidden parameters get updated (“learn”) and an output is provided. While engineers can curate input data and evaluate output data, they cannot easily evaluate what’s going on inside the model. As such, they cannot predict every outcome the model will provide. Some of these outcomes might cause harm, and it is the company’s duty to protect consumers from these outcomes and update their models accordingly.

For example, social media platforms use deep learning models to increase user retention. They don’t always know why certain features cause people to stay longer, because that information is hidden inside the black box. While that is technically irrelevant for their bottom line (increasing advertising revenue), the company is still responsible for negative outcomes. Studies have linked spending too much time on social media to increased anxiety, depression and identity issues. If a company disregards this outcome and continues to maximize earnings at the expense of users' mental health, it will eventually be held accountable.

Bias and Misrepresentation

Another problem with machine learning is that output is based on sample input data, which can reflect human biases or misrepresent the population. In other words, racism or discrimination can be hidden in data sets because the data was created by humans with such tendencies in the first place. One famous example is AI programs that are meant to help identify good candidates, but in fact discriminate against some genders, races or backgrounds.

Since machine learning is such a new field, companies are still struggling to create techniques that counter biased data. Unanticipated cases keep cropping up and need to be investigated. Be aware that your organization holds the responsibility for a model’s output, and you will need to invest resources in correcting and compensating for harmful outcomes. Some companies, like Google and Facebook, even have dedicated “AI Ethics” teams that actively work on fairness, inclusion, safety and transparency on their ML-powered platforms.

How to Avoid Negative Outcomes in ML

Biased outcomes can be prevented and corrected with better data processing and continuous monitoring after deployment. Data sets can be tested and corrected for bias before they are used as training data. After deployment, ML tools should have a short feedback loop where users can quickly report unethical or discriminatory results (which is why Google Translate has a feedback button: to improve results and flag negative outcomes).
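
As a rough illustration of testing a data set for bias before training, the sketch below compares how often each group receives a positive label and reports the ratio between the lowest and highest rate. The field names, the toy records and the 0.8 cutoff (a common rule of thumb, sometimes called the four-fifths rule) are assumptions made for this example, not a complete fairness audit.

```python
from collections import defaultdict


def selection_rates(records, group_key, label_key):
    """Return the share of positive labels for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[label_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group has any positive labels, so there is no gap
    return min(rates.values()) / highest


# Hypothetical toy data: historical hiring decisions for two groups.
applicants = [
    {"group": "A", "hired": 1},
    {"group": "A", "hired": 1},
    {"group": "A", "hired": 0},
    {"group": "A", "hired": 1},
    {"group": "B", "hired": 1},
    {"group": "B", "hired": 0},
    {"group": "B", "hired": 0},
    {"group": "B", "hired": 0},
]

rates = selection_rates(applicants, "group", "hired")
ratio = disparate_impact_ratio(rates)

# Assumed cutoff: flag the data set for review if the ratio is below 0.8.
needs_review = ratio < 0.8
print(rates, round(ratio, 2), needs_review)
```

A low ratio does not prove the data is discriminatory, but it is a cheap signal that the data set deserves a closer look before a model learns from it.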

Another powerful solution is combining machine learning analyses with human judgment. When PayPal started using ML to detect fraudulent transactions, they noticed quite a few errors that made their system unreliable. At the same time, humans didn’t have the capacity to process billions of transactions manually. But when they had human agents review the transactions flagged by the model, they noticed that accuracy improved dramatically.
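
The general pattern of pairing a model with human reviewers can be sketched as follows. This is not PayPal's actual system: the scoring function, the field names and the 0.5 threshold are hypothetical stand-ins chosen only to show the routing logic.

```python
def triage(transactions, score_fn, flag_threshold=0.5):
    """Split transactions into auto-approved ones and a human review queue."""
    approved, review_queue = [], []
    for tx in transactions:
        if score_fn(tx) >= flag_threshold:
            # The model only flags; a human agent makes the final call.
            review_queue.append(tx)
        else:
            approved.append(tx)
    return approved, review_queue


def toy_score(tx):
    """Stand-in for a trained fraud model: larger amounts look riskier."""
    return min(tx["amount"] / 10_000, 1.0)


# Hypothetical transactions.
transactions = [
    {"id": 1, "amount": 40},
    {"id": 2, "amount": 9_500},
    {"id": 3, "amount": 5_000},
]

approved, review_queue = triage(transactions, toy_score)
print([tx["id"] for tx in approved])
print([tx["id"] for tx in review_queue])
```

The threshold controls the trade-off: lowering it sends more transactions to humans and catches more model errors, at the cost of a larger review workload.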

Machine learning models should undergo a rigorous selection process before being deployed in the real world. If the model performance does not comply with standards, it should be discarded. That’s one of the reasons why most ML models never see production. While this may seem like an expensive trade-off, the negative consequences of harmful output can be far costlier.
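
One simple way to make such a selection process explicit is a release gate that compares evaluation metrics against agreed minimums. The metric names and threshold values below are assumptions for illustration; real standards depend on the product and the harm a wrong prediction can cause.

```python
def passes_release_gate(metrics, minimums):
    """Check every required metric against its minimum.

    Returns (ok, failures), where failures maps each failing metric name
    to a (measured_value, required_minimum) pair. A missing metric counts
    as a failure.
    """
    failures = {}
    for name, floor in minimums.items():
        value = metrics.get(name)
        if value is None or value < floor:
            failures[name] = (value, floor)
    return len(failures) == 0, failures


# Hypothetical standards and evaluation results for a candidate model.
minimums = {"recall": 0.95, "precision": 0.90}
candidate_metrics = {"recall": 0.97, "precision": 0.88}

ok, failures = passes_release_gate(candidate_metrics, minimums)
print(ok, failures)
```

In this example the candidate misses the precision minimum, so it would be sent back or discarded rather than deployed.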
