What is AutoML?

AutoML stands for automated machine learning and refers to the process of building machine learning models with limited human intervention.

The traditional way of creating AI software was to have a data scientist manually prepare a dataset, select a model type and architecture, and tune its hyperparameters (settings, such as the learning rate, that the model cannot learn from the data on its own) before training the final model. Because of the complexity involved, tight time constraints, and the amount of manual work, this process can be slow and error-prone, and it often produces suboptimal results. With AutoML, an automated system instead takes care of selecting and tuning the algorithm, which improves data scientists' productivity and often yields better results.

How can companies benefit from AutoML?

AutoML platforms are increasingly viewed as essential in both industry and academia. In industry, companies use AutoML to make their solutions for managing machine learning models in production more trustworthy, scalable, and cost-effective.

Without AutoML, companies must invest significant resources to get the best possible prediction accuracy from their models. Maintaining that accuracy also takes ongoing monitoring and effort, because models tend to degrade over time as production data drifts away from the data they were trained on (an effect known as data drift). AutoML automates these tasks, helping companies reach and maintain the best results.
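
As an illustration of the kind of monitoring check that can be automated, here is a minimal sketch of the population stability index (PSI), one common way to measure how far a feature's production distribution has moved from its training distribution. It is not presented as the method any particular AutoML platform uses. The bin count, the `eps` floor for empty bins, and the 0.1/0.25 warning thresholds are assumptions; the thresholds are a widely quoted rule of thumb, not a standard.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compare one feature's training-time values (expected) with recent
    production values (actual). Higher scores mean more drift.

    Bins are equal-width over the training range; production values that
    fall outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps (an assumed floor) keeps empty bins from causing log(0) or /0
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_status(score, warn=0.1, alert=0.25):
    """Map a PSI score to an action. Thresholds are a rule-of-thumb assumption."""
    if score >= alert:
        return "significant drift: consider retraining"
    if score >= warn:
        return "moderate drift: investigate"
    return "stable"


if __name__ == "__main__":
    training = [float(v) for v in range(100)]
    production = [v + 50.0 for v in training]  # same shape, shifted upward
    score = population_stability_index(training, production)
    print(round(score, 2), drift_status(score))
```

In an automated setup, a check like this would run per feature on a schedule, and a "significant drift" result would trigger retraining rather than a manual investigation.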

Academics, for their part, use AutoML to compare new model architectures and methodologies against properly optimized baselines.

Without AutoML, the question “Could I have gotten better results?” always lingers.

How does AutoML work?

Generating reliable predictions from data takes several steps. They can be summarized as follows, although individual platforms may add steps or perform them in a different order:

  1. Data preparation: Align and join related datasets. Impute missing values if necessary.
  2. Dataset analysis: Obtain descriptive statistics from the dataset. These figures may be used to propose model candidates.
  3. Model proposals: Identify candidate model architectures best suited to the properties of the dataset.
  4. Feature engineering: Enrich and encode the dataset. This makes it easier for a model to learn relevant relationships and puts the data into a form the algorithm can process.
  5. Candidate tuning: Optimize the hyperparameters of each model candidate.
  6. Candidate validation: Test the model candidates against slices of data not used for training.
  7. Deploy model: Select the best model and use it to generate predictions.
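
The numbered steps can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not how any particular AutoML platform works: the dataset, the two candidate families (a mean baseline and k-nearest-neighbours regression), the grid of k values, the 25% holdout, and mean squared error as the selection metric are all hypothetical choices. Joining related datasets (part of step 1) is omitted.

```python
import math
import random
from statistics import mean, pstdev


def fit_column_stats(rows):
    """Dataset analysis: per-column mean and standard deviation, skipping
    missing values (None). Computed on the training slice only."""
    stats = []
    for j in range(len(rows[0][0])):
        present = [x[j] for x, _ in rows if x[j] is not None]
        stats.append((mean(present), pstdev(present) or 1.0))  # 1.0 avoids /0
    return stats


def prepare(rows, stats):
    """Data preparation + feature engineering: impute missing values with
    the training mean, then standardize every column."""
    return [([((m if v is None else v) - m) / s for v, (m, s) in zip(x, stats)], y)
            for x, y in rows]


def mean_baseline(train):
    """Candidate with no hyperparameters: always predict the training mean."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def knn(train, k):
    """Candidate with one hyperparameter, k: k-nearest-neighbours regression."""
    def predict(x):
        nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
        return mean(y for _, y in nearest)
    return predict


# Model proposals: hypothetical candidate families and hyperparameter grids.
SEARCH_SPACE = {
    "mean_baseline": (mean_baseline, [{}]),
    "knn": (knn, [{"k": k} for k in (1, 3, 5)]),
}


def mse(model, rows):
    """Mean squared error of a model's predictions on labelled rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)


def auto_ml(rows, holdout_fraction=0.25, seed=0):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_fraction))
    train_raw, valid_raw = rows[:cut], rows[cut:]
    stats = fit_column_stats(train_raw)
    train, valid = prepare(train_raw, stats), prepare(valid_raw, stats)

    best = None
    for name, (build, grid) in SEARCH_SPACE.items():
        for params in grid:                      # candidate tuning
            model = build(train, **params)
            score = mse(model, valid)            # candidate validation
            if best is None or score < best[0]:
                best = (score, name, params, model)
    return best                                  # the model to deploy


if __name__ == "__main__":
    # Toy data: y = 2 * x0; x1 is a noisy column with some missing values.
    data = [([float(i), None if i % 7 == 0 else float(i % 3)], 2.0 * i)
            for i in range(40)]
    score, name, params, model = auto_ml(data)
    print(name, params, round(score, 2))
```

Real platforms often replace the exhaustive grid with smarter search strategies such as random search or Bayesian optimization, and use cross-validation rather than a single holdout split, but the propose, tune, validate, and select loop has the same shape.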

How does AutoML compare to MLOps?

MLOps refers to the set of practices required to deploy and maintain machine learning models in production, so it is mostly concerned with model serving and monitoring. AutoML, in contrast, is concerned with automatically generating a top-performing ML model from raw data.

Many AutoML platforms also follow MLOps best practices, automating both the creation of the ML model and its deployment to a production environment.