edo package

Subpackages

Submodules

edo.family module

The distribution subtype handler.

class edo.family.Family(distribution, max_subtypes=None)

Bases: object

A class for handling all concurrent subtypes of a distribution class. A subtype is an independent copy of the distribution class allowing more of the search space to be explored.

Parameters:

distribution : edo.distributions.Distribution: The distribution class to keep track of. Must be of the same form as those in edo.distributions.
max_subtypes : int: The maximum number of subtypes in the family that are currently being used in a run of the EA. There is no limit by default.

Attributes:

name : str: The name of the family’s distribution followed by Family.
subtype_id : int: A counter that increments when new subtypes are created. Used as an identifier for a given subtype.
subtypes : dict: A dictionary that maps subtype identifiers to their corresponding subtype. This gets updated during a run to those that are currently being used in the population.
all_subtypes : dict: A dictionary of all subtypes that have been created in the family.
random_state : np.random.RandomState: The PRNG associated with this family to be used for the sampling and creation of subtypes.

add_subtype(subtype_name=None, attributes=None): Create a copy of the distribution class that is identical to, and independent of, the original.

classmethod load(distribution, root='.edocache'): Load any existing cached subtype dictionaries for distribution and restore the subtypes along with the family’s random state.

make_instance(random_state): Select an existing subtype at random, or create a new one if there is space available, and return an instance of that subtype.

reset(root=None): Reset the family to have no subtypes and the default numpy PRNG. If root is passed, any cached information about the family is also deleted.

save(root='.edocache'): Save the current subtypes in the family and the family’s random state in the root directory.
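The idea of a subtype as an independent copy of a distribution class can be sketched in plain Python. This is not edo's implementation: Uniform, ToyFamily and the param_limits attribute are hypothetical stand-ins, the standard library's random.Random replaces the numpy PRNG, and the uniform choice between existing subtypes and a new one is an assumption.

```python
import copy
import random


class Uniform:
    """Hypothetical stand-in for a distribution class."""

    param_limits = {"bounds": [0, 1]}

    def sample(self, rng):
        low, high = self.param_limits["bounds"]
        return rng.uniform(low, high)


class ToyFamily:
    """Illustrative only: tracks independent copies ("subtypes") of a class."""

    def __init__(self, distribution, max_subtypes=None):
        self.distribution = distribution
        self.max_subtypes = max_subtypes
        self.name = distribution.__name__ + "Family"
        self.subtype_id = 0
        self.subtypes = {}

    def add_subtype(self):
        # Deep-copy the class attributes so that changing one subtype's
        # limits never affects the original class or its other subtypes.
        attributes = {
            key: copy.deepcopy(value)
            for key, value in vars(self.distribution).items()
            if not key.startswith("__")
        }
        name = f"{self.distribution.__name__}Subtype{self.subtype_id}"
        subtype = type(name, self.distribution.__bases__, attributes)
        self.subtypes[self.subtype_id] = subtype
        self.subtype_id += 1
        return subtype

    def make_instance(self, rng):
        has_space = (
            self.max_subtypes is None or len(self.subtypes) < self.max_subtypes
        )
        # Candidates: every existing subtype, plus a "new" marker if allowed.
        choices = list(self.subtypes.values()) + ([None] if has_space else [])
        subtype = rng.choice(choices)
        if subtype is None:
            subtype = self.add_subtype()
        return subtype()
```

Because each subtype owns a deep copy of the class-level limits, narrowing the limits of one subtype leaves the original class and the other subtypes untouched.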

edo.fitness module

Functions for calculating individual and population fitness.

edo.fitness.get_population_fitness(population, fitness, processes=None, **kwargs): Return the fitness of each individual in the population. This can be done in parallel by specifying a number of cores to use for independent processes.

edo.fitness.write_fitness(fitness, generation, root): Write the generation fitness to file in the root directory.
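Scoring a population is an independent calculation per individual, which is why it parallelises easily. The sketch below shows that pattern with the standard library only. The names population_fitness and total are hypothetical, individuals are plain lists rather than Individual instances, and edo's own scheduling is not reproduced here.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def population_fitness(population, fitness, processes=None, **kwargs):
    """Return fitness(individual, **kwargs) for each individual, in order."""
    scorer = partial(fitness, **kwargs)
    if processes is None:
        # Serial evaluation in the current process.
        return [scorer(individual) for individual in population]
    # Each individual is scored independently, so the work can be spread
    # over worker processes. The fitness function must be picklable, which
    # in practice means it is defined at module level.
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(scorer, population))


def total(individual, scale=1):
    """Hypothetical fitness function: a scaled sum of the values."""
    return scale * sum(individual)
```

Calling population_fitness([[1, 2], [3]], total, scale=2) evaluates serially, while passing processes=2 distributes the same calls over two worker processes.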

edo.individual module

A collection of objects to facilitate an individual representation.

class edo.individual.Individual(dataframe, metadata, random_state=None)

Bases: object

A class to represent an individual in the EA.

Parameters:

dataframe : pd.DataFrame or dd.DataFrame: The dataframe of the individual.
metadata : list: A list of the distributions associated with each respective column of dataframe.
random_state : np.random.RandomState, optional: The PRNG for the individual. If not provided, the default PRNG is used.

Attributes:

fitness : float: The fitness of the individual. Initialises as None.

classmethod from_file(path, distributions, family_root='.edocache', method='pandas'): Create an instance of Individual from the files at path and family_root, using either pandas or dask to read in the individual. pandas is always used as the fallback.

to_file(path, family_root='.edocache'): Write self to file.

edo.individual.create_individual(row_limits, col_limits, families, weights, random_state)

Create an individual within the limits provided.

Parameters:

row_limits : list: Lower and upper bounds on the number of rows a dataset can have.
col_limits : list: Lower and upper bounds on the number of columns a dataset can have. Tuples can be used to indicate limits on the number of columns needed from each family in families.
families : list: A list of edo.Family instances handling the column distributions that can be selected from.
weights : list: A sequence of relative weights with which to sample from families. If None, then sampling is uniform.
random_state : numpy.random.RandomState: The PRNG associated with the individual to use for its random sampling.
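As a rough illustration of how these parameters interact, the sketch below draws a shape within the limits and then picks one family per column using the relative weights. It is an assumption-laden toy: sample_shape_and_families is a hypothetical name, families are represented by strings, only simple integer limits are handled (not the tuple form of col_limits), and random.Random stands in for the numpy PRNG.

```python
import random


def sample_shape_and_families(row_limits, col_limits, families, weights, rng):
    """Pick a number of rows and one family per column within the limits."""
    # randint includes both end points, matching lower and upper bounds.
    nrows = rng.randint(row_limits[0], row_limits[1])
    ncols = rng.randint(col_limits[0], col_limits[1])
    # random.choices treats weights=None as uniform sampling.
    chosen = rng.choices(families, weights=weights, k=ncols)
    return nrows, chosen


rng = random.Random(1)
nrows, chosen = sample_shape_and_families(
    [5, 10], [2, 4], ["normal", "poisson"], [0.9, 0.1], rng
)
```

In a full implementation each chosen family would then supply a distribution instance from which a column of nrows values is sampled.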

edo.optimiser module

The evolutionary dataset optimisation algorithm class.

class edo.optimiser.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)

Bases: object

The (evolutionary) dataset optimiser. A class that generates data for a given fitness function and evolutionary parameters.

Parameters:

fitness : func

Any real-valued function that takes at least an instance of Individual as an argument. Any further arguments should be passed in the fitness_kwargs parameter of the run method.

size : int

The size of the population to create.

row_limits : list

Lower and upper bounds on the number of rows a dataset can have.

col_limits : list

Lower and upper bounds on the number of columns a dataset can have.

Tuples can also be used to specify the minimum and maximum number of columns allowed for each element in families.

families : list

A list of edo.Family instances that handle the distribution classes used to populate the individuals in the EA.

weights : list

A set of relative weights on how to select elements from families. If None, they will be chosen uniformly.

max_iter : int

The maximum number of iterations to be carried out before terminating.

best_prop : float

The proportion of a population from which to select the “best” individuals to be parents.

lucky_prop : float

The proportion of a population from which to sample some “lucky” individuals to be parents. Defaults to 0.

crossover_prob : float

The probability with which to sample dimensions from the first parent over the second in a crossover operation. Defaults to 0.5.

mutation_prob : float

The probability of a particular characteristic of an individual being mutated. If using a dwindle method, this is an initial probability.

shrinkage : float

The relative size to shrink each parameter’s limits by for each distribution in families. Defaults to None; if given, it must be strictly between 0 and 1.

maximise : bool

Determines whether fitness is a function to be maximised or not. Fitness scores are minimised by default.
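The roles of best_prop, lucky_prop and maximise can be illustrated with a small parent-selection sketch. This is one plausible reading of the parameter descriptions above, not edo's code: select_parents is a hypothetical name, the rounding of proportions to counts is an assumption, and random.Random stands in for the numpy PRNG.

```python
import random


def select_parents(scores, best_prop, lucky_prop, maximise, rng):
    """Return the indices of the individuals chosen to be parents."""
    size = len(scores)
    # Rank from best to worst: descending if maximising, else ascending.
    ranked = sorted(range(size), key=lambda i: scores[i], reverse=maximise)
    n_best = max(int(best_prop * size), 1)  # assumed rounding rule
    n_lucky = int(lucky_prop * size)  # assumed rounding rule
    best = ranked[:n_best]
    rest = ranked[n_best:]
    # "Lucky" parents are drawn at random from those not already selected.
    lucky = rng.sample(rest, min(n_lucky, len(rest)))
    return best + lucky


scores = [5.0, 1.0, 3.0, 4.0, 2.0, 9.0, 7.0, 8.0]
minimised = select_parents(scores, 0.25, 0, False, random.Random(0))
```

With the default minimisation, the two lowest scores (indices 1 and 4) are selected from this population of eight. Setting maximise=True reverses the ranking.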

dwindle(**kwargs): A placeholder for a function which can adjust (typically, reduce) the mutation probability over the run of the EA.

run(root=None, random_state=None, processes=None, fitness_kwargs=None, stop_kwargs=None, dwindle_kwargs=None)

Run the evolutionary algorithm under the given constraints.

Parameters:

root : str, optional: The directory in which to write all generations to file. If None, nothing is written to file. Instead, every generation is kept in memory and is returned at the end. If writing to file, one generation is held in memory at a time and everything is returned upon termination as a tuple containing dask objects.
random_state : int or np.random.RandomState, optional: The random seed or state for a particular run of the algorithm. If None, the default PRNG is used.
processes : int, optional: The number of parallel processes to use when calculating the population fitness. If None then a single-thread scheduler is used.
fitness_kwargs : dict, optional: Any additional parameters for the fitness function should be placed here.
stop_kwargs : dict, optional: Any additional parameters for the stop method should be placed here.
dwindle_kwargs : dict, optional: Any additional parameters for the dwindle method should be placed here.

Returns:

pop_history : list: Every individual in each generation as a nested list of Individual instances.
fit_history : pd.DataFrame or dask.dataframe.DataFrame: Every individual’s fitness in each generation.

stop(**kwargs): A placeholder for a function which acts as a stopping condition on the EA.
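Since stop and dwindle are placeholders, they are intended to be overridden in a subclass and fed through stop_kwargs and dwindle_kwargs. The skeleton below shows that pattern on a toy class. Everything in it is hypothetical: ToyOptimiser is not DataOptimiser, and the converged flag, the order of the calls and the loop structure are assumptions made for illustration.

```python
class ToyOptimiser:
    """Hypothetical skeleton of an EA loop with overridable hooks."""

    def __init__(self, max_iter=100, mutation_prob=0.01):
        self.max_iter = max_iter
        self.mutation_prob = mutation_prob
        self.generation = 0
        self.converged = False  # assumed flag set by stop()

    def stop(self, **kwargs):
        """Placeholder: a subclass may set self.converged here."""

    def dwindle(self, **kwargs):
        """Placeholder: a subclass may adjust self.mutation_prob here."""

    def run(self, stop_kwargs=None, dwindle_kwargs=None):
        stop_kwargs = stop_kwargs or {}
        dwindle_kwargs = dwindle_kwargs or {}
        while self.generation < self.max_iter and not self.converged:
            self.generation += 1
            # Selection, crossover and mutation would happen here.
            self.stop(**stop_kwargs)
            self.dwindle(**dwindle_kwargs)
        return self.generation


class HalvingOptimiser(ToyOptimiser):
    def stop(self, floor=1e-3):
        # Stop once the mutation probability has fallen below the floor.
        self.converged = self.mutation_prob < floor

    def dwindle(self, factor=0.5):
        # Reduce the mutation probability after every generation.
        self.mutation_prob *= factor


optimiser = HalvingOptimiser(max_iter=100, mutation_prob=0.01)
generations = optimiser.run(
    stop_kwargs={"floor": 1e-3}, dwindle_kwargs={"factor": 0.5}
)
```

Leaving both hooks as placeholders means the loop simply runs until max_iter is reached.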

edo.population module

Functions for the creation and updating of a population.

edo.population.create_initial_population(row_limits, col_limits, families, weights, random_states)

Create an initial population for the genetic algorithm based on the given parameters.

Parameters:

size : int: The number of individuals in the population.
row_limits : list: Limits on the number of rows a dataset can have.
col_limits : list: Limits on the number of columns a dataset can have.
families : list: A list of edo.Family instances that handle the column distribution classes.
weights : list: Relative weights with which to sample from families. If None, sampling is done uniformly.
random_states : dict: A mapping of the index of the population to a numpy.random.RandomState instance that is to be assigned to the individual at that index in the population.

Returns:

population : list: A population of newly created individuals.
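The random_states mapping gives every position in the population its own PRNG, which keeps a run reproducible however the individuals are later processed. A minimal sketch of building such a mapping from one seed follows. It uses the standard library's random.Random as a stand-in for numpy.random.RandomState, and make_random_states is a hypothetical helper, not part of edo.

```python
import random


def make_random_states(size, seed):
    """Map each population index to its own independently seeded PRNG."""
    master = random.Random(seed)
    # Draw one child seed per individual from the master generator.
    return {
        index: random.Random(master.getrandbits(32)) for index in range(size)
    }


states_a = make_random_states(3, seed=0)
states_b = make_random_states(3, seed=0)
```

Two mappings built from the same seed yield identical streams index by index, so the individual at a given index is created the same way in both runs.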

edo.population.create_new_population(parents, population, crossover_prob, mutation_prob, row_limits, col_limits, families, weights, random_states)

Given a set of potential parents to be carried into the next generation, create offspring from pairs within that set until there are enough individuals.

Parameters:

parents : list: A list of edo.individual.Individual instances used to create new offspring.
population : list: The current population.
crossover_prob : float: The probability with which to sample dimensions from the first parent over the second during crossover.
mutation_prob : float: The probability with which to mutate a component of a newly created individual.
row_limits : list: Limits on the number of rows a dataset can have.
col_limits : list: Limits on the number of columns a dataset can have.
families : list: The edo.Family instances from which to draw distribution instances.
weights : list: Weights used to sample elements from families.
random_states : dict: The PRNGs assigned to each individual in the population.
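The description above, carrying parents forward and creating offspring from pairs until the population is full, can be sketched on toy data. Here individuals are equal-length lists rather than dataframes, mutation is omitted, and the names crossover and fill_population are hypothetical. The sketch only illustrates how crossover_prob favours the first parent.

```python
import random


def crossover(parent1, parent2, crossover_prob, rng):
    """Take each dimension from parent1 with probability crossover_prob."""
    return [
        gene1 if rng.random() < crossover_prob else gene2
        for gene1, gene2 in zip(parent1, parent2)
    ]


def fill_population(parents, size, crossover_prob, rng):
    """Carry the parents forward, then add offspring until size is reached."""
    population = list(parents)
    while len(population) < size:
        # Pick two distinct parents for each new offspring.
        parent1, parent2 = rng.sample(parents, 2)
        population.append(crossover(parent1, parent2, crossover_prob, rng))
    return population


rng = random.Random(0)
parents = [[0, 0, 0], [1, 1, 1]]
new_population = fill_population(parents, 5, 0.5, rng)
```

A crossover_prob of 1 reproduces the first parent exactly and a value of 0 reproduces the second, which is why 0.5 treats both parents symmetrically.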

edo.version module

The current version of the library.

Module contents

Top-level imports for the library.

class edo.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)

The same class as edo.optimiser.DataOptimiser, made available as a top-level import. Its parameters, methods and return values are documented under the edo.optimiser module above.

class edo.Family(distribution, max_subtypes=None)

The same class as edo.family.Family, made available as a top-level import. Its parameters, attributes and methods are documented under the edo.family module above.