Maggy User API

maggy.experiment module

Experiment module used for running asynchronous optimization tasks.

The programming model is that you wrap the code containing the model training inside a wrapper function. Inside that wrapper function, provide all the imports and components that make up your experiment; see the examples below. Whenever a function to run an experiment is invoked, it is also registered in the Experiments service along with the provided information.

maggy.experiment.lagom(map_fun, name='no-name', experiment_type='optimization', hb_interval=1, num_trials=1, searchspace=None, optimizer=None, direction='max', ablation_study=None, ablator=None, es_policy='median', es_interval=300, es_min=10, description='')

Launches a maggy experiment, which, depending on experiment_type, can be either a hyperparameter optimization or an ablation study experiment. Given a search space, an objective and a model training procedure map_fun (a black-box function), an experiment is the whole process of finding the best hyperparameter combination in the search space, i.e. optimizing the black-box function. Currently maggy supports random search and a median stopping rule.

lagom is a Swedish word meaning “just the right amount”.

Parameters:
  • map_fun (function) – User defined experiment containing the model training.
  • experiment_type (str) – Type of Maggy experiment, either ‘optimization’ (default) or ‘ablation’.
  • searchspace (Searchspace) – A maggy Searchspace object from which samples are drawn.
  • optimizer (str, AbstractOptimizer) – The optimizer is the part generating new trials.
  • direction (str) – If set to ‘max’ the highest value returned will correspond to the best solution, if set to ‘min’ the opposite is true.
  • num_trials (int) – The number of trials to evaluate given the search space, each containing a different hyperparameter combination.
  • name (str) – A user defined experiment identifier.
  • hb_interval (int, optional) – The heartbeat interval in seconds from trial executor to experiment driver, defaults to 1
  • es_policy (str, optional) – The earlystopping policy, defaults to ‘median’
  • es_interval (int, optional) – Frequency interval in seconds to check currently running trials for early stopping, defaults to 300
  • es_min (int, optional) – Minimum number of trials finalized before checking for early stopping, defaults to 10
  • description (str, optional) – A longer description of the experiment.
Raises: RuntimeError – An experiment is currently running.
Returns: A dictionary indicating the best trial and the best hyperparameter combination together with its performance metric.
Return type: dict
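The mechanics of such an optimization experiment can be sketched in plain Python. This is a conceptual illustration of random search with direction='max' over continuous intervals only, not maggy's actual implementation; all names in it are made up for the sketch:

```python
import random

# Conceptual sketch: what an 'optimization' experiment with a random-search
# optimizer and direction='max' boils down to.
def random_search(map_fun, searchspace, num_trials, direction='max', seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        # Draw one hyperparameter combination from the feasible intervals.
        params = {name: rng.uniform(lo, hi)
                  for name, (lo, hi) in searchspace.items()}
        metric = map_fun(**params)  # run the black-box training function
        better = best is None or (metric > best['metric'] if direction == 'max'
                                  else metric < best['metric'])
        if better:
            best = {'params': params, 'metric': metric}
    return best

# A toy black-box function standing in for model training:
result = random_search(lambda x: -(x - 0.3) ** 2, {'x': (0.0, 1.0)},
                       num_trials=20)
```

The returned dictionary mirrors the shape described above: the best hyperparameter combination together with its metric.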

maggy.searchspace module

class maggy.Searchspace(**kwargs)

Create an instance of Searchspace from keyword arguments.

The keyword arguments specify name-value pairs for the hyperparameters, where values are tuples of the form (type, list). Type is a string with one of the following values:

  • DOUBLE
  • INTEGER
  • DISCRETE
  • CATEGORICAL

The list in the tuple specifies either exactly two values, the start and end point of the feasible interval for DOUBLE and INTEGER, or the discrete possible values for the types DISCRETE and CATEGORICAL.
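The validation rules described above can be sketched as a small helper. This is an assumed reading of the (type, list) semantics, not maggy's internal validation code:

```python
# Sketch of validating one (type, list) hyperparameter tuple, mirroring the
# description above (assumed semantics, not the maggy implementation).
def validate_param(value):
    ptype, feasible = value
    if ptype in ('DOUBLE', 'INTEGER'):
        # Interval types: exactly two values, the start and end of the range.
        if len(feasible) != 2:
            raise ValueError('interval types need exactly [start, end]')
    elif ptype in ('DISCRETE', 'CATEGORICAL'):
        # Enumerated types: a non-empty list of possible values.
        if not feasible:
            raise ValueError('enumerated types need at least one value')
    else:
        raise ValueError('unknown hyperparameter type: %s' % ptype)
    return ptype, list(feasible)
```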

Sample usage:

>>> # Define Searchspace
>>> from maggy import Searchspace
>>> # The searchspace can be instantiated with parameters
>>> sp = Searchspace(kernel=('INTEGER', [2, 8]), pool=('INTEGER', [2, 8]))
>>> # Or additional parameters can be added one by one
>>> sp.add('dropout', ('DOUBLE', [0.01, 0.99]))

The Searchspace object can also be initialized from a Python dictionary:

>>> sp_dict = sp.to_dict()
>>> sp_new = Searchspace(**sp_dict)

The parameter names are added as attributes of the Searchspace object, so they can be accessed directly with dot notation, i.e. searchspace.<name>.
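This attribute-style access can be illustrated with a minimal stand-in class (a sketch of the behaviour, not the actual Searchspace implementation):

```python
# Minimal sketch of exposing keyword arguments as attributes, the same
# access pattern the Searchspace object provides.
class Space:
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

sp = Space(kernel=('INTEGER', [2, 8]), dropout=('DOUBLE', [0.01, 0.99]))
# The hyperparameters are now reachable as sp.kernel and sp.dropout.
```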

add(name, value)

Adds a {name: value} pair to the hyperparameters.

Parameters:
  • name (str) – Name of the hyperparameter
  • value (tuple) – A tuple of the parameter type and its feasible region
Raises:
  • ValueError – Hyperparameter name is reserved
  • ValueError – Hyperparameter feasible region in wrong format
get(name, default=None)

Returns the value of name if it exists, else default.

get_random_parameter_values(num)

Generate random parameter dictionaries, e.g. to be used for initializing an optimizer.

Parameters: num (int) – Number of random parameter dictionaries to be generated.
Raises: ValueError – num is not an int.
Returns: A list containing parameter dictionaries.
Return type: list
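What such a sampler conceptually returns can be sketched over a plain dictionary of (type, list) tuples. This is an assumed reading of the documented semantics, not maggy's internal code:

```python
import random

# Sketch of generating random parameter dictionaries from a search-space
# description (assumed semantics, not the maggy implementation).
def random_parameter_values(space, num, seed=0):
    if not isinstance(num, int):
        raise ValueError('num is not an int')
    rng = random.Random(seed)
    samples = []
    for _ in range(num):
        params = {}
        for name, (ptype, feasible) in space.items():
            if ptype == 'DOUBLE':
                params[name] = rng.uniform(feasible[0], feasible[1])
            elif ptype == 'INTEGER':
                params[name] = rng.randint(feasible[0], feasible[1])
            else:  # DISCRETE / CATEGORICAL: pick one of the listed values
                params[name] = rng.choice(feasible)
        samples.append(params)
    return samples

space = {'kernel': ('INTEGER', [2, 8]), 'dropout': ('DOUBLE', [0.01, 0.99])}
samples = random_parameter_values(space, 2)
```

Each entry of the returned list is one complete hyperparameter combination, e.g. suitable for seeding an optimizer.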
names()

Returns the dictionary with the names and types of all hyperparameters.

Returns: A dictionary of hyperparameter names, with types as values.
Return type: dict
to_dict()

Return the hyperparameters as a Python dictionary.

Returns: A dictionary with hyperparameter names as keys and the hyperparameter values as values.
Return type: dict

maggy.callbacks module

class maggy.callbacks.KerasBatchEnd(reporter, metric='loss')

A Keras callback reporting a specified metric at the end of the batch to the maggy experiment driver.

loss is always available as a metric, and optionally acc (if accuracy monitoring is enabled, that is, accuracy is added to the Keras model metrics). Validation metrics are not available for the BatchEnd callback, since validating after every batch would be too expensive. The default is the training loss (loss).

Example usage:

>>> from maggy.callbacks import KerasBatchEnd
>>> callbacks = [KerasBatchEnd(reporter, metric='acc')]
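The reporting mechanism behind this callback can be sketched without TensorFlow. The Reporter class below is a made-up stand-in for the reporter object maggy passes to the training function, not the real maggy or Keras implementation:

```python
# Conceptual sketch: a Keras-style callback that forwards one chosen metric
# to a reporter at the end of every training batch.
class Reporter:
    """Stand-in for maggy's reporter: collects broadcast metric values."""
    def __init__(self):
        self.values = []

    def broadcast(self, value):
        self.values.append(value)

class BatchEndReporter:
    def __init__(self, reporter, metric='loss'):
        self.reporter = reporter
        self.metric = metric

    # Keras invokes a hook with this signature after every training batch,
    # passing the current metric values in the logs dictionary.
    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        if self.metric in logs:
            self.reporter.broadcast(logs[self.metric])

# Simulate what the framework would do over three batches:
reporter = Reporter()
callback = BatchEndReporter(reporter, metric='loss')
for batch, loss in enumerate([0.9, 0.7, 0.5]):
    callback.on_batch_end(batch, logs={'loss': loss})
```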
class maggy.callbacks.KerasEpochEnd(reporter, metric='val_loss')

A Keras callback reporting a specified metric at the end of an epoch to the maggy experiment driver.

val_loss is always available as a metric, and optionally val_acc (if accuracy monitoring is enabled, that is, accuracy is added to keras model metrics). Training metrics are available under the names loss and acc. Default is validation loss (val_loss).

Example usage:

>>> from maggy.callbacks import KerasEpochEnd
>>> callbacks = [KerasEpochEnd(reporter, metric='val_acc')]

maggy.ablation module

class maggy.ablation.AblationStudy(training_dataset_name, training_dataset_version, label_name, **kwargs)

The AblationStudy object is the entry point to define an ablation study with maggy. This object can subsequently be passed as an argument when the experiment is launched with experiment.lagom().

Sample usage:

>>> from maggy.ablation import AblationStudy
>>> ablation_study = AblationStudy('titanic_train_dataset', 1,
>>>     label_name='survived')

Define your study by including the layers and features that should be ablated:

>>> ablation_study.features.include('pclass', 'fare')
>>> ablation_study.model.layers.include('my_dense_two',
>>>     'my_dense_three')

You can also add a layer group using a list:

>>> ablation_study.model.layers.include_groups(['my_dense_two',
>>>     'my_dense_four'])

Or add a layer group using a prefix:

>>> ablation_study.model.layers.include_groups(prefix='my_dense')

Next you should define a base model function using the layer and feature names you previously specified:

>>> # you only need to add the `name` parameter to layer initializers
>>> def base_model_generator():
>>>     model = tf.keras.Sequential()
>>>     model.add(tf.keras.layers.Dense(64, activation='relu'))
>>>     model.add(tf.keras.layers.Dense(..., name='my_dense_two', ...))
>>>     model.add(tf.keras.layers.Dense(32, activation='relu'))
>>>     model.add(tf.keras.layers.Dense(..., name='my_dense_sigmoid', ...))
>>>     # output layer
>>>     model.add(tf.keras.layers.Dense(1, activation='linear'))
>>>     return model

Make sure to include the generator function in the study:

>>> ablation_study.model.set_base_model_generator(base_model_generator)

Last but not least you can define your actual training function:

>>> from maggy import experiment
>>> from maggy.callbacks import KerasBatchEnd

>>> def training_function(dataset_function, model_function, reporter):
>>>     import tensorflow as tf
>>>     epochs = 5
>>>     batch_size = 10
>>>     tf_dataset = dataset_function(epochs, batch_size)
>>>     model = model_function()
>>>     model.compile(optimizer=tf.train.AdamOptimizer(0.001),
>>>             loss='binary_crossentropy',
>>>             metrics=['accuracy'])
>>>     ### Maggy REPORTER
>>>     callbacks = [KerasBatchEnd(reporter, metric='acc')]
>>>     history = model.fit(tf_dataset, epochs=epochs, steps_per_epoch=30)
>>>     return float(history.history['acc'][-1])

Lagom the experiment:

>>> result = experiment.lagom(map_fun=training_function,
>>>                         experiment_type='ablation',
>>>                         ablation_study=ablation_study,
>>>                         ablator='loco',
>>>                         name='Titanic-LOCO',
>>>                         hb_interval=5)
__init__(training_dataset_name, training_dataset_version, label_name, **kwargs)

Initializes the ablation study.

Parameters:
  • training_dataset_name (str) – Name of the training dataset in the featurestore.
  • training_dataset_version (int) – Version of the training dataset to be used.
  • label_name (str) – Name of the target prediction label.
to_dict()

Returns the ablation study configuration as a Python dictionary.

Returns: A dictionary with ablation study configuration parameters as keys (i.e. ‘training_dataset_name’, ‘included_features’, etc.)
Return type:dict