# Model tuning with autoML in auto-sklearn: Part 1

I went to a talk at Bristech on autoML in Azure, and I have to say I came away pretty convinced that autoML is the future. Azure’s autoML is built on sklearn (and therefore it’s a Python-only SDK), so I did a bit of googling, and it turns out that autoML with sklearn is actually open source. You just need to install the ‘auto-sklearn’ package on your computer and you’re good to go.

A few provisos about installing auto-sklearn:

1. You need a Linux environment to run it. My laptop runs Ubuntu, so I didn’t need to do anything, but if you’re not on Linux you’ll need a VM (or a container).
2. The dependencies might clash with the versions of packages you already have installed. For example, I had a different version of pandas or numpy (or something similar), and although I went through the installation steps on the auto-sklearn website (yes, auto-sklearn has a very useful website), it wouldn’t run. The way to solve this is to create a fresh environment (I use conda):

   ```bash
   conda create -n automl python=3.6
   conda activate automl
   ```

   Then follow the installation instructions from the command line, and you’ll get an environment exactly as auto-sklearn wants it. Once that’s done, start a Python session from within your automl environment and run the auto-sklearn example.
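Before running anything heavy, a quick check like the sketch below can confirm that the interpreter you started is on Linux and can locate the packages auto-sklearn relies on. The function `check_environment` and its default package list are my own illustration, not part of auto-sklearn; `importlib.util.find_spec` only checks that a module can be found, not that its version is the one auto-sklearn wants.

```python
import importlib.util
import platform
import sys


def check_environment(required=("autosklearn", "sklearn", "numpy", "pandas")):
    """Report whether this interpreter looks ready to run auto-sklearn (illustrative helper)."""
    report = {
        # auto-sklearn needs a Linux environment (point 1 above)
        "linux": platform.system() == "Linux",
        # the fresh env above was created with python=3.6
        "python": "{}.{}".format(*sys.version_info[:2]),
    }
    for name in required:
        # find_spec returns None when the top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```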

As a side note, since I’m running this example from within an R Markdown document, and the Python engine for R Markdown is provided by the reticulate package, I’ll tell reticulate which conda env to use to run the Python code:

```r
library(reticulate)
use_condaenv(condaenv = "automl")
```
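To double-check from the Python side that reticulate picked up the right interpreter, a small sketch like this compares `sys.prefix` with the env name. The helper `running_in_env` is hypothetical, and it assumes conda’s usual layout, where an env lives in a directory named after it (`.../envs/automl`). On the R side, `reticulate::py_config()` reports the same information.

```python
import sys
from pathlib import Path


def running_in_env(env_name: str, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix looks like the conda env `env_name` (hypothetical helper)."""
    # conda envs usually live in .../envs/<env_name>, so compare the last path component
    return Path(prefix).name == env_name


print("Python prefix:", sys.prefix)
print("Using the automl env:", running_in_env("automl"))
```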

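```python
import autosklearn.classification
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

# load the handwritten-digits data used in the auto-sklearn example
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# let auto-sklearn search over pipelines and hyperparameters with its defaults
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
```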
```
## [WARNING] [2019-03-17 18:25:11,588:smac.intensification.intensification.Intensifier] Challenger was the same as the current incumbent; Skipping challenger
## [WARNING] [2019-03-17 18:25:11,588:smac.intensification.intensification.Intensifier] Challenger was the same as the current incumbent; Skipping challenger
## AutoSklearnClassifier(delete_output_folder_after_terminate=True,
##            delete_tmp_folder_after_terminate=True,
##            disable_evaluator_output=False, ensemble_memory_limit=1024,
##            ensemble_nbest=50, ensemble_size=50, exclude_estimators=None,
##            exclude_preprocessors=None, get_smac_object_callback=None,
##            include_estimators=None, include_preprocessors=None,
##            initial_configurations_via_metalearning=25, logging_config=None,
##            ml_memory_limit=3072, n_jobs=None, output_folder=None,
##            per_run_time_limit=360, resampling_strategy='holdout',
##            resampling_strategy_arguments=None, seed=1, shared_mode=False,
##            tmp_folder=None)
```
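The `train_test_split` call above, with its defaults, holds back 25% of the rows as a test set. The sketch below mimics that arithmetic with the standard library so the split sizes are easy to see. `holdout_split` is a hypothetical helper, its shuffle will not reproduce scikit-learn’s exact row order, and the 1,797 figure assumes the data are scikit-learn’s handwritten-digits set from the auto-sklearn example.

```python
import math
import random


def holdout_split(n_samples, test_size=0.25, seed=1):
    """Shuffle row indices and hold out a test fraction (a sketch, not sklearn's implementation)."""
    # the test count is rounded up, e.g. 1797 * 0.25 = 449.25 -> 450
    n_test = math.ceil(n_samples * test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(1797)  # 1797 samples in the digits data
print(len(train_idx), len(test_idx))  # → 1347 450
```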
```python
# predict on the held-out test set and score the predictions
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))
```

```
## Accuracy score 0.9866666666666667
```
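`accuracy_score` is simply the fraction of test labels the model predicts correctly. A minimal pure-Python version makes that explicit; the name `accuracy` is my own, not a scikit-learn function.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy([3, 1, 4, 1], [3, 1, 4, 7]))  # 3 of 4 correct → 0.75
```

Assuming the test set is the 450-image holdout from the digits data, the score printed above works out to 444 correct predictions out of 450.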
```python
import pickle
filename = "automl_test_model.sav"
with open(filename, "wb") as f:  # the with-block closes the file after saving
    pickle.dump(automl, f)
```
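Since saving and reloading usually come in pairs, here is a small sketch of both halves with the file handles closed properly. `save_model` and `load_model` are hypothetical helpers; reloading only works in an environment where auto-sklearn (and compatible versions of its dependencies) is installed, and `pickle.load` should only ever be pointed at files you trust.

```python
import pickle


def save_model(model, path):
    """Pickle a fitted model to `path` and return the path (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path):
    """Unpickle a model from `path`; only load files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

In a later session this would look like `automl = load_model("automl_test_model.sav")` followed by `automl.predict(X_test)`, with no refitting needed.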