Featuretools is a framework that automates feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.
Compose is a tool that automates prediction engineering. It allows you to create labels for supervised learning and to structure prediction problems.
We’ve seen how Featuretools and Compose make it easy for users to combine multiple tables into transformed and aggregated machine learning features, and to identify supervised machine learning use cases on time series data.
The natural next question was: what comes next? How can Featuretools and Compose users build machine learning models easily and flexibly?
We’re excited to announce the addition of a new open-source project to the Alteryx ecosystem. EvalML is a Python library for automated machine learning (AutoML) and model understanding.
```python
import evalml
from evalml.automl import AutoMLSearch

# obtain features, a target and a problem type for that target
X, y = evalml.demos.load_breast_cancer()
problem_type = 'binary'
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X, y, problem_type=problem_type, test_size=0.2)

# perform a search across multiple pipelines and hyperparameters
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type=problem_type)
automl.search()

# the best pipeline is already refitted on the entire training data
best_pipeline = automl.best_pipeline
```
AutoML search in action with EvalML
EvalML is a unified interface for building machine learning models, and for using those models to produce insights and make accurate predictions. EvalML gives you access to a variety of modeling libraries through a single API. Regression, binary classification, and multiclass classification are among the supervised machine learning problem types EvalML supports. Users can define custom objective functions to express their search for a model in terms of what they value. Above all, we wanted EvalML to be stable and performant, so we run machine learning performance tests with every release.
What’s Cool about EvalML
- Simple Unified Modeling API
EvalML reduces the time and complexity involved in producing an accurate model.
EvalML pipelines include preprocessing and feature engineering steps right out of the box. Once users have identified the target column of the data they’d like to model, EvalML’s AutoML runs a search algorithm to train and score a set of models, letting them pick one or more models from that collection and use those models for insight-driven analysis or prediction.
EvalML was developed to work with Featuretools, which can combine data from multiple tables and generate features that boost model performance, and with Compose, a tool for label engineering and time series aggregation. EvalML users can easily see how each input feature (numeric, categorical, text, datetime, and so on) is handled by EvalML.
Compose and Featuretools can be used with EvalML to build machine learning models.
EvalML models are described by a pipeline data structure, made up of a graph of components. Every operation AutoML applies to your data is recorded in the pipeline. This makes moving from model selection to model deployment a breeze. It is also simple to define custom components, pipelines, and objectives in EvalML, whether for use in AutoML or as stand-alone elements.
- Domain-Specific Objective Functions
EvalML lets you construct custom objective functions that are specific to your data and domain. This enables you to express what makes a model valuable in your domain, and then use AutoML to find models that deliver on that value.
During and after the search process, custom objectives are used to rank models on the AutoML leaderboard. Using a custom objective can help the AutoML search locate the most impactful models. AutoML can also use custom objectives to tune the classification thresholds of binary classification models.
Custom objectives are described in detail in the EvalML documentation, along with examples of how to use them effectively.
- Understanding the Model
EvalML offers access to a wide range of models and tools for model understanding. Feature and permutation importance, partial dependence, precision-recall curves, confusion matrices, ROC curves, prediction explanations, and binary classifier threshold optimization are all supported at the moment.
An example of partial dependence from the EvalML documentation
- Data Checks
EvalML’s data checks can identify common data issues before modeling, preventing model quality problems, unexplained glitches, and stack traces downstream. The current data checks provide a straightforward way to detect target leakage (where the model is given access to information during training that will not be available at prediction time), invalid datatypes, severe class imbalance, highly null columns, constant columns, and columns that are likely an ID and not useful for modeling.
Getting Started Using EvalML
Visit the documentation page for installation instructions, videos that explain how to use EvalML, a user guide covering EvalML’s components and core principles, an API reference, and more. The source code for EvalML can be found at https://github.com/alteryx/evalml. Join the open-source Slack to communicate with the team. We’re regularly contributing to the repository, and we’ll fix any problems you bring up.
Time series modeling, parallel evaluation of pipelines during AutoML, enhancements to the AutoML algorithm, new model types and preprocessing steps, tools for model debugging and model deployment, support for anomaly detection, and much more are all on the EvalML feature roadmap.
Want to learn more? If you’re interested in hearing about updates as the project progresses, take a moment to follow this blog, star the repo on GitHub, and stay tuned for more features and content on the way.