summary: Researchers have developed a new approach that improves uncertainty estimation in machine learning models. Their method, IF-COMP, uses the minimum description length principle to provide more reliable confidence measures for AI decisions in critical contexts such as healthcare.
This scalable technique can be applied to large models and helps non-experts judge the reliability of AI predictions — findings that can lead to better decision-making in real-world applications.
Key Facts:
- Improved accuracy: IF-COMP produces more accurate uncertainty estimates for AI predictions.
- Scalability: It can be applied to large, complex models used in critical settings such as healthcare.
- Ease of use: It helps non-experts assess the reliability of AI decisions.
Source: Massachusetts Institute of Technology
Because machine learning models can make incorrect predictions, researchers often give their models the ability to tell users how confident they are in a particular decision. This is especially important in high-stakes situations, such as when a model is used to identify diseases in medical images or filter job applications.
But quantifying a model’s uncertainty is only useful if it is accurate: if a model says there is a 49% chance that pleural effusion will be seen in a medical image, then the model should be correct 49% of the time.
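As a rough illustration of what calibration means in practice (this sketch is my own, not from the paper), predictions can be grouped by stated confidence and compared against how often each group is actually correct; the average gap is a standard calibration measure often called expected calibration error.

```python
# Minimal calibration sketch (illustrative only, not the researchers' code).
# Bin predictions by stated confidence and compare against observed accuracy.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |stated confidence - observed accuracy| across bins, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# A synthetic, perfectly calibrated model: when it says 49%, it is right ~49% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.4, 0.9, size=10_000)
correct = rng.uniform(size=10_000) < conf
print(expected_calibration_error(conf, correct))  # close to 0
```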
Researchers at MIT have introduced a new approach that can improve uncertainty estimation in machine learning models. Not only does the technique produce more accurate uncertainty estimates than other techniques, it does so more efficiently.
What’s more, the technique is scalable, making it applicable to the large-scale deep learning models being increasingly deployed in healthcare and other safety-critical situations.
This technique has the potential to provide end-users without machine learning expertise with better information that they can use to decide whether to trust a model’s predictions or whether they should deploy the model for a particular task.
“We see that these models perform extremely well in scenarios where they are very good, and it’s easy to assume they will perform just as well in other scenarios.
“That makes it especially important to advance this kind of research, which seeks to better calibrate the uncertainty of these models so that it aligns with human notions of uncertainty,” says lead author Nathan Ng, a graduate student at the University of Toronto and a visiting student at MIT.
Ng co-authored the paper with Roger Grosse, an associate professor in the University of Toronto’s Department of Computer Science, and senior author Marzyeh Ghassemi, an associate professor in MIT’s Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems. The research will be presented at the International Conference on Machine Learning.
Quantifying Uncertainty
Uncertainty quantification methods often require complex statistical calculations that do not scale well to machine learning models with millions of parameters, and they require users to make assumptions about the model and the data used to train it.
The MIT researchers took a different approach. They use the minimum description length principle (MDL), which does not require the assumptions that can undermine the accuracy of other methods. MDL is used to better quantify and calibrate the uncertainty for the test points a model is asked to label.
The technique the researchers developed, called IF-COMP, makes MDL fast enough to be used with large-scale deep learning models deployed in many real-world environments.
MDL considers all possible labels that the model could give to a test point, and if there are many alternative labels that fit this point well, the confidence in the selected label decreases accordingly.
“One way to understand how much trust a model has is to give it counterfactual information and see how likely it is to believe it,” Ng says.
For example, consider a model that reports a medical image shows pleural effusion: if a researcher tells the model the image instead shows edema, and the model readily updates its belief, then the model should become less confident in its original decision.
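A minimal sketch of this idea, using an ordinary regularized logistic regression as a stand-in for the model; the function name and setup here are my own illustration, not the paper’s implementation. For each candidate label of a test point, the model is refit as if that label were true, and the resulting likelihoods are normalized; if several labels can be made to fit well, confidence in any single label drops.

```python
# pNML-style sketch (illustrative; IF-COMP avoids this brute-force refitting).
import numpy as np
from sklearn.linear_model import LogisticRegression

def pnml_distribution(X_train, y_train, x_test, n_classes):
    """For each hypothesized label, refit the model with (x_test, label) appended,
    record how well that label can be made to fit, then normalize."""
    best_fit = []
    for y_hyp in range(n_classes):
        X_aug = np.vstack([X_train, x_test[None, :]])
        y_aug = np.append(y_train, y_hyp)
        model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        col = int(np.where(model.classes_ == y_hyp)[0][0])
        best_fit.append(model.predict_proba(x_test[None, :])[0, col])
    best_fit = np.array(best_fit)
    return best_fit / best_fit.sum()  # many plausible labels -> a flatter distribution

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
print(pnml_distribution(X, y, np.array([2.0, 0.0]), 2))  # clear point -> peaked
print(pnml_distribution(X, y, np.array([0.0, 0.0]), 2))  # ambiguous point -> spread out
```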
In MDL, when the model is confident in labeling a data point, it should use a very short code to describe that point. When the decision is uncertain because a point could be labeled with many other labels, it should use a longer code to capture these possibilities.
The length of code used to label a data point is called stochastic data complexity. If researchers ask a model how willing it is to update its belief about a data point in light of contrary evidence, the stochastic data complexity should decrease if the model is confident.
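As a back-of-the-envelope illustration of the short-code/long-code intuition (my own example, not from the paper), the Shannon code length for a label the model assigns probability p is -log2(p) bits, so confident predictions correspond to short descriptions:

```python
# Code-length sketch: confident labels get short codes, ambiguous ones get long codes.
import math

def code_length_bits(prob):
    """Shannon code length, in bits, for an outcome assigned probability `prob`."""
    return -math.log2(prob)

print(code_length_bits(0.95))  # ~0.07 bits: confident label, short description
print(code_length_bits(0.49))  # ~1.03 bits: ambiguous label, longer description
```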
However, testing each data point using MDL requires a significant amount of computation.
Speeding up the process
With IF-COMP, the researchers developed an approximation technique that can accurately estimate stochastic data complexity using a special function known as an influence function. They also employed a statistical technique called temperature scaling, which improves the calibration of the model’s outputs. Together, influence functions and temperature scaling enable high-quality approximations of stochastic data complexity.
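Temperature scaling itself is a simple, standard procedure: a single scalar T is fit on held-out data so that softmax(logits / T) better matches observed accuracy. The sketch below (my own illustration, not IF-COMP’s code) fits T by grid search over the validation negative log-likelihood.

```python
# Temperature-scaling sketch (a standard calibration technique; illustrative only).
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of the labels under temperature-scaled probabilities."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.25, 5.0, 200)):
    """Pick the single temperature T that minimizes validation NLL."""
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Synthetic, overconfident logits: informative, but scaled up so probabilities are too sharp.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=2000)
logits = rng.normal(size=(2000, 3))
logits[np.arange(2000), labels] += 1.0
logits *= 3.0
print(fit_temperature(logits, labels))  # noticeably above 1: softens overconfident outputs
```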
Ultimately, IF-COMP can efficiently generate a well-calibrated uncertainty quantification that reflects the true reliability of the model. The technique can also determine if the model has mislabeled certain data points and reveal which data points are outliers.
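In practice, once every data point has such a complexity or uncertainty score, mislabel and outlier detection reduce to ranking: the highest-scoring points are surfaced for review. A minimal sketch of that ranking step, with hypothetical scores and a helper name of my own choosing:

```python
# Ranking sketch: flag the most "expensive to describe" points as suspect.
import numpy as np

def flag_suspects(scores, top_fraction=0.05):
    """Return indices of the highest-scoring points, most suspect first."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(len(scores) * top_fraction)))
    return np.argsort(scores)[::-1][:k]

scores = np.array([0.2, 0.1, 3.5, 0.3, 2.8, 0.15])  # e.g., per-point code lengths in bits
print(flag_suspects(scores, top_fraction=0.34))      # -> [2 4]: the two outlying points
```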
The researchers tested their system on these three tasks and found it to be faster and more accurate than other methods.
“Having confidence that models are properly calibrated is crucial, and there is a growing need to detect when specific predictions look wrong. Auditing tools are increasingly necessary for machine learning problems, as we use large amounts of unvalidated data to build models that are then applied to human-facing problems,” Ghassemi says.
Because IF-COMP is model-agnostic, it can provide accurate uncertainty quantification for many types of machine learning models, allowing them to be deployed in a wider range of real-world environments and ultimately helping more practitioners make better decisions.
“People need to understand that these systems are highly fallible and can fabricate facts on the spot. A model may seem very confident, but there are many other things it can be convinced to believe when given evidence to the contrary,” Ng says.
In the future, the researchers are interested in applying this approach to large-scale language models and exploring other potential use cases for the minimum description length principle.
About this AI research news
Author: Melanie Grados
Source: Massachusetts Institute of Technology
Contact: Melanie Grados – MIT
Image: The image is credited to Neuroscience News
Original Research: Closed access.
“Measuring Stochastic Data Complexity with Boltzmann Influence Functions” by Roger Grosse et al. arXiv
Abstract
Measuring Stochastic Data Complexity with Boltzmann Influence Functions
Estimating the uncertainty of a model’s predictions at test points is an important part of ensuring reliability and calibration under changing distributions.
The minimum description length approach to this problem uses a predictive normalized maximum likelihood (pNML) distribution, which considers all possible labels for a data point and reduces the confidence of the prediction if other labels also match the model and training data.
In this work, we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as to measure complexity in both labeled and unlabeled settings.
We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, consistently matching or outperforming strong baseline methods.