An introduction to and implementation of keras-tuner
Choosing the right set of hyperparameters for your models
When it comes to deep learning, one of the most critical things to get right is the hyperparameters. For those of you who don't know, hyperparameters are the variables that govern the structure and training of your neural network. They include, but are not limited to, the number of layers, the number of neurons, the learning rate, and the number of epochs. Whenever you create a model in deep learning, the initial model is rarely perfect unless you hit a home run, so you optimize it by tweaking the hyperparameters before settling on a final one.
This tutorial is not going to teach you how to build TensorFlow models; instead, it will be a guide to using keras-tuner on your existing models.
Let's take a simple regression problem statement and construct a very basic model with three fully-connected (dense) layers, with 16, 32, and 1 unit(s) in the first, second, and last layer respectively, and relu activations. Let's keep the learning rate at 0.001 for the Adam optimizer and train for 50 epochs. Below is the code for the same if you want to follow along.
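The original code embed is not shown here, so the following is a minimal sketch of such a baseline. The variable names `trainFeatures` and `trainLabels` are stand-ins, and synthetic data is generated just so the snippet runs; swap in your own dataset.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Synthetic stand-in data so the snippet runs; use your own features/labels
trainFeatures = np.random.rand(100, 4).astype('float32')
trainLabels = np.random.rand(100, 1).astype('float32')

# Baseline: three dense layers with 16, 32, and 1 unit(s)
model = Sequential([
    Dense(16, activation='relu', input_shape=trainFeatures.shape[1:]),
    Dense(32, activation='relu'),
    Dense(1),  # linear output for regression
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse')
model.fit(trainFeatures, trainLabels, epochs=50, verbose=0)
```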
Now let's say your model didn't perform very well (in fact, it was terrible) and you want to try different hyperparameters. Instead of manually tweaking everything, we can automate the process and leave it up to keras-tuner to give us the optimal combination.
Before we begin, make sure you're working with TensorFlow 2.0.0 or later, since it ships with keras built in; on lower versions you may have to install Keras manually. Once you have that, you can install keras-tuner:
$ pip install -U keras-tuner
We will be using the RandomSearch class to select the best set of hyperparameters by sampling different combinations picked at random.
When you read the documentation you might come across the term hypermodel: a hypermodel is simply a model that is built for tuning hyperparameters. Let's see how to build one.
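The original `hypermodel.py` embed is missing here, so below is a sketch reconstructed from the snippets discussed in this article. `trainFeatures` is a stand-in name (synthetic data is generated so the shapes resolve), and the `mse` loss and linear output layer are assumptions for a regression problem.

```python
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Stand-in feature matrix so the shapes below resolve; use your own data
trainFeatures = np.random.rand(100, 4).astype('float32')

LOG_DIR = f"logs/{int(time.time())}"  # one log folder per tuning run

def build_model(hp):
    model = Sequential()
    # Tunable width of the input layer: 12 to 256 units
    model.add(Dense(hp.Int('input_units', min_value=12, max_value=256),
                    input_shape=trainFeatures.shape[1:]))
    model.add(Activation('relu'))
    # Tunable depth: 1 to 5 hidden layers, each with a tunable width
    for i in range(hp.Int('n_layers', min_value=1, max_value=5)):
        model.add(Dense(hp.Int(f'dense_{i}_units', 32, 256)))
        model.add(Activation('relu'))
    model.add(Dense(1))  # single linear output for regression (assumption)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Choice('lr', [0.001, 0.01, 0.002, 0.02])),
        loss='mse',
        metrics=['mse'])
    return model
```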
We will create a LOG_DIR to keep a log of all the models that have been tried. In hypermodel.py you may notice that we use hp.Int in two places and hp.Choice once; let's understand them bit by bit.
hp.Int: This function selects an integer at random from the range you provide. If you observe the following code,
model.add(Dense(
    hp.Int('input_units', min_value=12, max_value=256),
    input_shape=trainFeatures.shape[1:])
)
you will notice that the first parameter is a string, in our case input_units, which you can change to whatever label you want for the corresponding hyperparameter, while the min_value and max_value parameters set the range for the number of neurons/units in the Dense layer.
Along similar lines, the next usage of hp.Int is as follows:
for i in range(hp.Int("n_layers", min_value=1, max_value=5)):
    model.add(Dense(hp.Int(f"dense_{i}_units", 32, 256)))
    model.add(Activation('relu'))
This loops over the Dense and Activation layers to decide how many of those layers you want. Instead of input_units, here we have n_layers as the string, and min_value and max_value are set to 1 and 5, meaning there could be anywhere between 1 and 5 hidden layers.
hp.Choice: This function selects a value from the list you provide. If you observe the following code,
optimizer=tf.keras.optimizers.Adam(
    hp.Choice('lr', [0.001, 0.01, 0.002, 0.02])
)
you will notice that the first parameter is again a string, lr, but this time we pass a list of values, [0.001, 0.01, 0.002, 0.02], which means the hypermodel will only pick learning rates from this list.
Instead of hp.Choice, you could also use hp.Float to set the learning rate, but I'll leave it to you to explore that further.
Once we have built the hypermodel, we have to instantiate a tuner for it. There are four tuner classes to choose from: RandomSearch, Hyperband, BayesianOptimization, and Sklearn. We will be using RandomSearch to instantiate our tuner. You could do that as follows.
Let's understand a little more about the parameters we specified while instantiating the tuner.

max_trials: The number of random combinations you want to try out, or more plainly, the number of models you want to create.

executions_per_trial: Builds the same model n number of times. You might want to set it to 3 to average the results and ensure you didn't just hit a home run and that your model is actually performing well.

objective: The metric you want to optimize, i.e. the objective you want to achieve out of the whole process.
Okay, now that we have instantiated our tuner, we need to search for the best model, and for that we make a tuner search call. It essentially takes the same parameters as model.fit.
This is how your output will look once you run the tuner search code block: it displays the best objective value along with the total time elapsed.
Now in case you want to reuse the models or analyze the best ones later, you can save and load the pickle objects as follows.
import pickle
import time

# Save the tuner as a pickle object
with open(f'tuner_{int(time.time())}.pkl', 'wb') as f:
    pickle.dump(tuner, f)

# Load the saved object back
tuner = pickle.load(open('tuner_1610218332.pkl', 'rb'))
Going back to our tuner, we'll try to get some more information from it. Calling tuner.results_summary() gives you the top ten models, whereas tuner.get_best_models()[0].summary() gives you the summary of the best model, which looks exactly like the summary of any other TensorFlow model.
That’s it from my side. Thanks for reading! Hope you learned something new.