Jun 10, 2020

Decoding Machine Learning

Machine learning just got interactive. For a long time, automated machine learning has confounded the people who use it. It can seem incomprehensible, and because its model-selection techniques are hidden from users, many have a realistic fear of errors that can’t be seen until it’s too late. MIT compares this kind of system to a “black box”: users can see what goes in and what comes out, but their view of what happens inside is obstructed.

All anyone really wants is to understand just how machine learning works. That would bring it down a notch from its enigma status. After all, how can we trust something that we don’t understand? We might be in luck. Recently, researchers at MIT collaborated with other developers around the world to create a new tool that allows users to see and control how their automated machine learning systems work. The goal of this new interactive tool, called ATMSeer, is to give us all a little more confidence in the automated machine learning systems we use, and hopefully to help us find ways to improve them as well.

Automated Machine Learning vs. Traditional Machine Learning

Designing a traditional machine learning model is a time-consuming, labor-intensive process. Typically, each model is designed to perform one specific task. Real-world examples include classifying images, diagnosing a disease from symptom data, and predicting trends in the stock market.

Traditionally, experts on building machine learning systems first choose one algorithm out of many, which they can later build their model around. After that, they have to manually adjust the hyperparameters that determine the model’s overall structure. Once these have been adjusted, the model is formed and can finally undergo its training.
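The three manual steps above can be sketched in a few lines of Python. This is only a toy illustration: the k-nearest-neighbors algorithm, the dataset, and the value of K are all assumptions chosen for brevity, not anything taken from the MIT work.

```python
# Toy training data: (feature, label) pairs. Purely illustrative.
TRAIN = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
         (8.0, "high"), (8.5, "high"), (9.0, "high")]


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


# Step 1: the expert picks one algorithm out of many (k-nearest neighbors here).
# Step 2: the expert sets the hyperparameter k by hand.
K = 3
# Step 3: the configured model is "trained" (k-NN simply stores the data) and used.
prediction = knn_predict(TRAIN, 2.2, K)
```

Every change to the algorithm or to K means repeating these steps by hand, which is the labor that automated machine learning aims to remove.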

Automated machine learning is a newer, much less laborious approach. Rather than manually setting up each individual step, the people who put the system in place only need to configure it with the algorithms and hyperparameters to consider, and from there the system tests and modifies the algorithm and hyperparameters on its own.

Automated machine learning can be advantageous in this way, because having a computer test and modify the model reduces the chance of human error. Ordinarily, however, users can’t see how the automated machine learning system carries out its processes. This is where ATMSeer comes in.

What ATMSeer Is

ATMSeer has its own customized automated machine learning system, called “Auto-Tuned Models” (hence, ATM), at its core. According to its developers, this new tool puts the control and analysis of automated machine learning processes directly into the hands of the user.

It takes as input an automated machine learning system, a dataset, and some relevant information about the task a user is trying to accomplish. Using these, it visualizes the automated machine learning system’s search process through a user-friendly interface that can present in-depth information about each model’s performance.

How ATM Works

Traditional automated machine learning is selective. ATM, however, fully catalogues all of the search results while it tries to fit pre-existing learning models together with relevant data. This ensures that, even if some of the results aren’t included in groupings of relevant data, they’ll be documented somewhere the user can still access them, rather than just discarded.

To tune the model, then, ATM takes its inputs and randomly selects an algorithm class and the model's hyperparameters. The algorithm class is essentially the method by which the system sorts through data. MIT cites neural networks and decision trees as some of the more common ones.

Hyperparameters help to determine the overall structure of the system. They help to fine-tune the system’s algorithm--to determine the specifics of that algorithm class. So if the algorithm class is a neural network or a decision tree, the hyperparameters would be something like how many layers there are in a neural network or the size of a decision tree.
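To make the distinction concrete, here is a minimal sketch of a search space and of the random selection step. The algorithm names, hyperparameter names, and value ranges are hypothetical and are not ATM's actual configuration.

```python
import random

# Hypothetical search space: each algorithm class maps to the hyperparameters
# that pin down its specifics, with a few allowed values for each.
SEARCH_SPACE = {
    "neural_network": {"num_layers": [1, 2, 3, 4], "units_per_layer": [16, 32, 64]},
    "decision_tree": {"max_depth": [2, 4, 8, 16], "min_samples_leaf": [1, 5, 10]},
}


def sample_configuration(space, rng):
    """Pick an algorithm class at random, then a random value for each of its hyperparameters."""
    algorithm = rng.choice(sorted(space))
    hyperparameters = {
        name: rng.choice(values) for name, values in space[algorithm].items()
    }
    return algorithm, hyperparameters


rng = random.Random(0)  # seeded so the sketch is repeatable
algorithm, hyperparameters = sample_configuration(SEARCH_SPACE, rng)
```

Note that the hyperparameters only make sense once the algorithm class is known: a decision tree has a depth but no layers, so the sampler picks the class first.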

Once these have all been set, the automated machine learning system runs the model against its input dataset, measures its performance, and tunes the hyperparameters again and again until the algorithm class and hyperparameters are as effective as it can make them.

After this, it uses everything it has learned about that model’s performance to select another model. The process repeats itself until the system is able to put out several of the top-performing models to complete the task the user is trying to accomplish.
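The loop described above might look roughly like the following. This is a sketch under loud assumptions: the `evaluate` function returns made-up scores instead of training anything, and a simple "mostly reuse the best algorithm so far, sometimes explore" rule stands in for ATM's real selection strategy, which this article doesn't detail.

```python
import random

# Hypothetical search space for illustration only.
SPACE = {
    "decision_tree": {"max_depth": [2, 4, 8, 16]},
    "neural_network": {"num_layers": [1, 2, 3, 4]},
}


def evaluate(algorithm, params):
    """Stand-in for training and scoring a model; returns a made-up score."""
    if algorithm == "decision_tree":
        return 0.6 + 0.02 * params["max_depth"]
    return 0.5 + 0.1 * params["num_layers"]


def search(space, budget, top_k=3, explore=0.3, seed=0):
    """Run `budget` trials and return (all trials, top_k best trials)."""
    rng = random.Random(seed)
    history = []  # every trial is kept, not only the winners
    for _ in range(budget):
        untried = sorted(set(space) - {alg for alg, _, _ in history})
        if untried:
            algorithm = untried[0]  # try each algorithm class at least once
        elif rng.random() < explore:
            algorithm = rng.choice(sorted(space))  # occasionally explore
        else:
            algorithm = max(history, key=lambda t: t[2])[0]  # reuse the best so far
        params = {name: rng.choice(vals) for name, vals in space[algorithm].items()}
        history.append((algorithm, params, evaluate(algorithm, params)))
    leaderboard = sorted(history, key=lambda t: t[2], reverse=True)[:top_k]
    return history, leaderboard


history, leaderboard = search(SPACE, budget=20)
```

The important part is the feedback: each new choice of algorithm depends on the scores recorded so far, and the loop ends by reporting several top performers rather than a single winner.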

Put simply, each individual model is essentially one data point with a few variables: the algorithm that runs the model, the hyperparameters that decide how the algorithm works, and the model’s measured performance.
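That "one data point per model" idea maps naturally onto a small record type. The class name, field names, and scores below are invented for illustration and are not ATM's own data format.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """One searched model, treated as a single data point."""
    algorithm: str          # which algorithm class runs the model
    hyperparameters: dict   # the settings that decide how the algorithm works
    score: float            # the measured performance (made-up numbers below)


records = [
    ModelRecord("decision_tree", {"max_depth": 4}, 0.81),
    ModelRecord("neural_network", {"num_layers": 2}, 0.86),
    ModelRecord("decision_tree", {"max_depth": 8}, 0.84),
]

# Because every model has the same three variables, comparing them is trivial.
best = max(records, key=lambda record: record.score)
```

Records shaped like this are also easy to plot, since each field can become an axis, a color, or a filter.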

The Creation of ATMSeer

MIT researchers took all of the work and knowledge that went into building Auto-Tuned Models and used it to develop ATMSeer. Building on that previous work, they designed a system that plots the data points and variables onto designated charts and graphs. For the first time, ATMSeer makes automated machine learning not only accessible to users, but also visible in a way that they can comprehend.

To make the visualization process even more efficient, they then developed a technique, separate from the model but applicable to it, that can reconfigure the data in real time as it changes within the automated machine learning system. What makes ATMSeer particularly special is that it lets you modify anything you can visualize.

There are similar visualization tools, but they are designed to analyze only a single machine learning model and allow only very limited customization of the search space. Because of this, they can’t offer much support for the automated machine learning process, which is what makes ATMSeer so valuable.

Where a visualization tool tailored to one model can only analyze that model, ATMSeer supports the analysis of a variety of machine learning models generated with several different algorithms. That makes it well suited to the automated machine learning process, which requires analyzing many searched model configurations.

How ATMSeer Builds Control and Confidence in the Model for the User

When a user looks at the interface for ATMSeer, they’ll see that it’s divided into three separate parts: a control panel, an overview panel, and a leaderboard containing the top-performing models. The control panel allows users to upload their datasets and chosen automated machine learning model, as well as to start and pause the searching process whenever they need to. The overview panel shows the basic statistics of the algorithms and hyperparameters.
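As a rough idea of what the overview panel and leaderboard have to compute, here is a small sketch. The trial data is made up, and the function names and the choice of statistics (count, best, mean) are assumptions rather than ATMSeer's actual implementation.

```python
from collections import defaultdict
from statistics import mean

# Made-up search results: (algorithm, score) for each model tried so far.
trials = [
    ("decision_tree", 0.81),
    ("neural_network", 0.86),
    ("decision_tree", 0.84),
    ("neural_network", 0.78),
]


def overview(trials):
    """Per-algorithm count, best score, and mean score."""
    by_algorithm = defaultdict(list)
    for algorithm, score in trials:
        by_algorithm[algorithm].append(score)
    return {
        algorithm: {"count": len(scores), "best": max(scores), "mean": mean(scores)}
        for algorithm, scores in by_algorithm.items()
    }


def leaderboard(trials, top_k=3):
    """The top_k highest-scoring models, best first."""
    return sorted(trials, key=lambda trial: trial[1], reverse=True)[:top_k]


summary = overview(trials)
top_two = leaderboard(trials, top_k=2)
```

Recomputing these summaries whenever a new trial arrives is what would let a paused or running search always show an up-to-date picture.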

When users see all of this and are able to interact with it accordingly, it helps to build their confidence in the technology they’re using. They’re able to view the exact steps that their automated machine learning is going through to sift through their data, and if they see a bug in that process, they can go directly into their algorithm and hyperparameters to make adjustments as needed.

Testing ATMSeer

Researchers and developers set up studies in which the product was tested by people who might use ATMSeer in their daily professional lives but weren’t yet familiar with automated machine learning.

They gathered together a group of graduate science students who were complete novices to automated machine learning and set them up with the ATMSeer interface. At the end of the study, findings showed that around 85% of the people who used ATMSeer were confident in the models that the tool picked out. Nearly all of the participants said that the tool made them feel comfortable enough with automated machine learning for them to work with it in the future.

Further case studies with machine learning experts who had no experience with automated machine learning found that user control through the interface can improve the performance and efficiency of the automated machine learning process.

Other results showed that factors such as the number of algorithms that were searched, the system runtime, and the ability of users to find the top-performing model determined the customization that individual users applied to their automated machine learning searches. Having this information means that developers can further cater ATMSeer to the individual’s needs in the future.

The Rundown

Designing machine learning models manually is an arduous process. Automated machine learning, a more recent development, helps to alleviate some of the pain of designing and maintaining models by hand, but it’s still not perfect. Automated machine learning systems are designed to function entirely on their own, which can shut out the user. ATMSeer, however, might be the collaborative tool that keeps machine learning experts in the loop while still lightening their workloads.

Approved by

Joey Rahimi