edo package

Submodules

edo.family module

The distribution subtype handler.

class edo.family.Family(distribution, max_subtypes=None)

Bases: object

A class for handling all concurrent subtypes of a distribution class. A subtype is an independent copy of the distribution class allowing more of the search space to be explored.

Parameters
distribution : edo.distributions.Distribution

The distribution class to keep track of. Must be of the same form as those in edo.distributions.

max_subtypes : int

The maximum number of subtypes in the family that can be in use at any one time during a run of the EA. There is no limit by default.

Attributes
name : str

The name of the family’s distribution followed by Family.

subtype_id : int

A counter that increments when new subtypes are created. Used as an identifier for a given subtype.

subtypes : dict

A dictionary that maps subtype identifiers to their corresponding subtype. During a run, this is updated to hold only the subtypes currently in use in the population.

all_subtypes : dict

A dictionary of all subtypes that have been created in the family.

random_state : np.random.RandomState

The PRNG associated with this family to be used for the sampling and creation of subtypes.

add_subtype(subtype_name=None, attributes=None)

Create a copy of the distribution class that is identical to, and independent of, the original.
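One way to picture an identical but independent copy is a dynamically created subclass whose class-level attributes are deep-copied. The sketch below is not taken from edo: `Gaussian`, `make_subtype`, and `param_limits` are hypothetical names used only for illustration.

```python
import copy


class Gaussian:
    """Hypothetical stand-in for a distribution class."""

    # Class-level search limits, shared by every instance of the class.
    param_limits = {"mean": [-10, 10], "std": [0, 10]}


def make_subtype(distribution, subtype_id):
    """Return a new class that copies ``distribution`` without sharing state."""
    # Deep-copy the class attributes so that edits to the copy stay local.
    attributes = {
        key: copy.deepcopy(value)
        for key, value in vars(distribution).items()
        if not key.startswith("__")
    }
    name = f"{distribution.__name__}Subtype{subtype_id}"
    return type(name, (distribution,), attributes)


subtype = make_subtype(Gaussian, 0)
subtype.param_limits["mean"] = [0, 1]  # narrows the copy only
```

Because the attributes are deep-copied, changing the limits on the subtype leaves `Gaussian.param_limits` untouched.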

classmethod load(distribution, root='.edocache')

Load any existing cached subtype dictionaries for distribution and restore the subtypes along with the family’s random state.

make_instance(random_state)

Select an existing subtype at random – or create a new one if there is space available – and return an instance of that subtype.
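The selection rule described above can be sketched roughly as follows. This is an illustration rather than edo's implementation: `TinyFamily` is a hypothetical class, `random.Random` stands in for a NumPy PRNG, and treating "create a new subtype" as one extra, equally likely option is an assumption.

```python
import random


class TinyFamily:
    """Hypothetical, simplified family that tracks subtypes by identifier."""

    def __init__(self, max_subtypes=None):
        self.max_subtypes = max_subtypes
        self.subtype_id = 0
        self.subtypes = {}

    def add_subtype(self):
        # Stand-in for copying the distribution class: a fresh class per id.
        subtype = type(f"Subtype{self.subtype_id}", (object,), {})
        self.subtypes[self.subtype_id] = subtype
        self.subtype_id += 1
        return subtype

    def make_instance(self, rng):
        has_space = (
            self.max_subtypes is None
            or len(self.subtypes) < self.max_subtypes
        )
        # Assumption: "new subtype" is one extra, equally likely option.
        choices = list(self.subtypes) + ([None] if has_space else [])
        choice = rng.choice(choices)
        if choice is None:
            subtype = self.add_subtype()
        else:
            subtype = self.subtypes[choice]
        return subtype()


family = TinyFamily(max_subtypes=2)
rng = random.Random(0)
instances = [family.make_instance(rng) for _ in range(20)]
```

Once the family holds `max_subtypes` subtypes, the "new" option disappears and every further instance reuses an existing subtype.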

reset(root=None)

Reset the family to have no subtypes and the default numpy PRNG. If root is passed then any cached information about the family is deleted.

save(root='.edocache')

Save the current subtypes in the family and the family’s random state in the root directory.
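The save and load round trip can be illustrated with a minimal sketch. The file name `family.pkl`, the use of `pickle`, and the helpers `save_state` and `load_state` are all assumptions made for illustration. The actual on-disk layout used by edo is not described here.

```python
import pickle
import random
import tempfile
from pathlib import Path


def save_state(subtypes, rng, root):
    """Pickle a subtype dictionary and a PRNG state under ``root``."""
    path = Path(root)
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "family.pkl", "wb") as handle:
        pickle.dump({"subtypes": subtypes, "state": rng.getstate()}, handle)


def load_state(root):
    """Restore the subtype dictionary and a PRNG in its saved state."""
    with open(Path(root) / "family.pkl", "rb") as handle:
        cached = pickle.load(handle)
    rng = random.Random()
    rng.setstate(cached["state"])
    return cached["subtypes"], rng


rng = random.Random(1)
with tempfile.TemporaryDirectory() as root:
    save_state({0: {"mean": [0, 1]}}, rng, root)
    expected = rng.random()  # next draw from the original PRNG
    subtypes, restored = load_state(root)
```

Restoring the PRNG state alongside the subtypes means that the next draw from `restored` matches the next draw the original generator made after saving.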

edo.fitness module

Functions for calculating individual and population fitness.

edo.fitness.get_population_fitness(population, fitness, processes=None, **kwargs)

Return the fitness of each individual in the population. This can be done in parallel by specifying a number of cores to use for independent processes.
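As a rough sketch of the idea, the function below scores a population serially or across worker processes. It uses the standard library's `multiprocessing.Pool` as a stand-in and does not claim to match the scheduler edo uses. `population_fitness` and `total` are hypothetical names.

```python
from functools import partial
from multiprocessing import Pool


def population_fitness(population, fitness, processes=None, **kwargs):
    """Evaluate ``fitness`` on every individual, optionally in parallel."""
    scorer = partial(fitness, **kwargs)
    if processes is None:
        # Serial evaluation in the calling process.
        return [scorer(individual) for individual in population]
    # Each worker process scores a share of the population independently.
    with Pool(processes) as pool:
        return pool.map(scorer, population)


def total(individual, power=1):
    """Toy fitness: the sum of the values raised to ``power``."""
    return sum(value ** power for value in individual)


scores = population_fitness([[1, 2], [3, 4]], total, power=2)
```

The parallel branch requires `fitness` to be picklable, which in practice means a function defined at module level.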

edo.fitness.write_fitness(fitness, generation, root)

Write the generation fitness to file in the root directory.

edo.individual module

A collection of objects to facilitate an individual representation.

class edo.individual.Individual(dataframe, metadata, random_state=None)

Bases: object

A class to represent an individual in the EA.

Parameters
dataframe : pd.DataFrame or dd.DataFrame

The dataframe of the individual.

metadata : list

A list of the distributions associated with the respective columns of dataframe.

random_state : np.random.RandomState, optional

The PRNG for the individual. If not provided, the default PRNG is used.

Attributes
fitness : float

The fitness of the individual. Initialises as None.

classmethod from_file(path, distributions, family_root='.edocache', method='pandas')

Create an instance of Individual from the files at path and family_root, using either pandas or dask to read in the individual. pandas is always the fallback.

to_file(path, family_root='.edocache')

Write self to file.

edo.individual.create_individual(row_limits, col_limits, families, weights, random_state)

Create an individual within the limits provided.

Parameters
row_limits : list

Lower and upper bounds on the number of rows a dataset can have.

col_limits : list

Lower and upper bounds on the number of columns a dataset can have. Tuples can be used to indicate limits on the number of columns needed from each family in families.

families : list

A list of edo.Family instances handling the column distributions that can be selected from.

weights : list

A sequence of relative weights with which to sample from families. If None, then sampling is uniform.

random_state : numpy.random.RandomState

The PRNG associated with the individual to use for its random sampling.
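How the limits and weights interact can be sketched as below. This is an illustration under simplifying assumptions: both limits are plain `[lower, upper]` integer pairs (the tuple form is not handled), strings stand in for edo.Family instances, and `random.Random` stands in for a NumPy PRNG. `sample_shape_and_families` is a hypothetical helper.

```python
import random


def sample_shape_and_families(row_limits, col_limits, families, weights, rng):
    """Pick a row count, a column count, and one family per column."""
    # Assumption: both limits are plain [lower, upper] integer pairs.
    nrows = rng.randint(*row_limits)
    ncols = rng.randint(*col_limits)
    # Passing weights=None makes random.choices sample uniformly.
    chosen = rng.choices(families, weights=weights, k=ncols)
    return nrows, ncols, chosen


rng = random.Random(0)
nrows, ncols, chosen = sample_shape_and_families(
    [5, 10], [1, 3], ["Normal", "Poisson"], [0.8, 0.2], rng
)
```

Each chosen family would then supply one column of `nrows` sampled values.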

edo.optimiser module

The evolutionary dataset optimisation algorithm class.

class edo.optimiser.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)

Bases: object

The (evolutionary) dataset optimiser. A class that generates data for a given fitness function and evolutionary parameters.

Parameters
fitness : func

Any real-valued function that takes at least an instance of Individual as an argument. Any further arguments should be passed in the fitness_kwargs parameter of the run method.

size : int

The size of the population to create.

row_limits : list

Lower and upper bounds on the number of rows a dataset can have.

col_limits : list

Lower and upper bounds on the number of columns a dataset can have.

Tuples can also be used to specify the minimum and maximum number of columns there can be of each element in families.

families : list

A list of edo.Family instances that handle the distribution classes used to populate the individuals in the EA.

weights : list

A set of relative weights on how to select elements from families. If None, they will be chosen uniformly.

max_iter : int

The maximum number of iterations to be carried out before terminating.

best_prop : float

The proportion of a population from which to select the “best” individuals to be parents.

lucky_prop : float

The proportion of a population from which to sample some “lucky” individuals to be parents. Defaults to 0.

crossover_prob : float

The probability with which to sample dimensions from the first parent over the second in a crossover operation. Defaults to 0.5.

mutation_prob : float

The probability of a particular characteristic of an individual being mutated. If using a dwindle method, this is an initial probability.

shrinkage : float

The relative size to shrink each parameter’s limits by for each distribution in families. Defaults to None. If given, it must be strictly between 0 and 1.

maximise : bool

Determines whether fitness is a function to be maximised or not. Fitness scores are minimised by default.
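The roles of best_prop, lucky_prop, and maximise can be illustrated with a small parent-selection sketch. It is not edo's selection code: `select_parents` is a hypothetical helper, and both the truncation of the proportions to whole numbers and the uniform draw of lucky parents from the remainder are assumptions.

```python
import random


def select_parents(fitnesses, best_prop, lucky_prop, rng, maximise=False):
    """Return the indices of the individuals chosen as parents."""
    size = len(fitnesses)
    # Rank indices from best to worst; smaller is better unless maximise.
    ranked = sorted(range(size), key=fitnesses.__getitem__, reverse=maximise)
    n_best = int(best_prop * size)
    n_lucky = int(lucky_prop * size)
    best, rest = ranked[:n_best], ranked[n_best:]
    # Assumption: lucky parents are drawn uniformly from the remainder.
    lucky = rng.sample(rest, min(n_lucky, len(rest)))
    return best + lucky


fitnesses = [3.0, 0.5, 2.0, 1.0, 4.0, 0.1, 5.0, 2.5]
parents = select_parents(fitnesses, 0.25, 0.125, random.Random(0))
```

With eight individuals, `best_prop=0.25` keeps the two lowest-scoring individuals and `lucky_prop=0.125` adds one more drawn from the rest. Setting `maximise=True` reverses the ranking.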

dwindle(**kwargs)

A placeholder for a function which can adjust (typically, reduce) the mutation probability over the run of the EA.
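A dwindling rule is typically supplied by overriding the placeholder in a subclass. The sketch below uses a hypothetical stand-in class, `Optimiser`, rather than DataOptimiser itself. The `generation` attribute and the `period` argument are assumptions made for illustration.

```python
class Optimiser:
    """Hypothetical stand-in exposing only what the sketch needs."""

    def __init__(self, mutation_prob=0.01):
        self.mutation_prob = mutation_prob
        self.generation = 0

    def dwindle(self, **kwargs):
        """Placeholder: leaves the mutation probability unchanged."""


class HalvingOptimiser(Optimiser):
    def dwindle(self, period=10):
        # Halve the mutation probability every `period` generations.
        if self.generation > 0 and self.generation % period == 0:
            self.mutation_prob /= 2


opt = HalvingOptimiser(mutation_prob=0.4)
for generation in range(1, 21):
    opt.generation = generation
    opt.dwindle(period=10)
```

In this sketch the probability is halved at generations 10 and 20, so the value passed at construction acts only as the initial probability.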

run(root=None, random_state=None, processes=None, fitness_kwargs=None, stop_kwargs=None, dwindle_kwargs=None)

Run the evolutionary algorithm under the given constraints.

Parameters
root : str, optional

The directory in which to write all generations to file. If None, nothing is written to file. Instead, every generation is kept in memory and is returned at the end. If writing to file, one generation is held in memory at a time and everything is returned upon termination as a tuple containing dask objects.

random_state : int or np.random.RandomState, optional

The random seed or state for a particular run of the algorithm. If None, the default PRNG is used.

processes : int, optional

The number of parallel processes to use when calculating the population fitness. If None then a single-thread scheduler is used.

fitness_kwargs : dict, optional

Any additional parameters for the fitness function should be placed here.

stop_kwargs : dict, optional

Any additional parameters for the stop method should be placed here.

dwindle_kwargs : dict, optional

Any additional parameters for the dwindle method should be placed here.

Returns
pop_history : list

Every individual in each generation as a nested list of Individual instances.

fit_history : pd.DataFrame or dask.dataframe.DataFrame

Every individual’s fitness in each generation.

stop(**kwargs)

A placeholder for a function which acts as a stopping condition on the EA.
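A stopping condition can be sketched in the same way, by overriding the placeholder in a subclass. `Optimiser` below is a hypothetical stand-in, and the `pop_fitness` and `converged` attributes, along with the `tol` argument, are assumptions made for illustration.

```python
class Optimiser:
    """Hypothetical stand-in exposing only what the sketch needs."""

    def __init__(self):
        self.pop_fitness = []
        self.converged = False

    def stop(self, **kwargs):
        """Placeholder: never requests termination."""


class ToleranceOptimiser(Optimiser):
    def stop(self, tol=1e-3):
        # Signal convergence once the fitness scores are nearly identical.
        spread = max(self.pop_fitness) - min(self.pop_fitness)
        self.converged = spread < tol


opt = ToleranceOptimiser()
opt.pop_fitness = [0.5001, 0.5002, 0.5000]
opt.stop(tol=1e-3)
```

Here the run would be flagged as converged because the fitness scores differ by less than the tolerance.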

edo.population module

Functions for the creation and updating of a population.

edo.population.create_initial_population(row_limits, col_limits, families, weights, random_states)

Create an initial population for the genetic algorithm based on the given parameters.

Parameters
size : int

The number of individuals in the population.

row_limits : list

Limits on the number of rows a dataset can have.

col_limits : list

Limits on the number of columns a dataset can have.

families : list

A list of edo.Family instances that handle the column distribution classes.

weights : list

Relative weights with which to sample from families. If None, sampling is done uniformly.

random_states : dict

A mapping of the index of the population to a numpy.random.RandomState instance that is to be assigned to the individual at that index in the population.

Returns
population : list

A population of newly created individuals.
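The shape of the random_states mapping can be sketched as a dictionary from population index to an independently seeded PRNG. The sketch uses the standard library's `random.Random` as a stand-in for numpy.random.RandomState. `make_random_states` is a hypothetical helper, not part of edo.

```python
import random


def make_random_states(size, seed):
    """Map each population index to its own seeded PRNG."""
    parent = random.Random(seed)
    # Draw one child seed per individual from the parent generator.
    return {i: random.Random(parent.getrandbits(32)) for i in range(size)}


states = make_random_states(3, seed=0)
repeat = make_random_states(3, seed=0)
```

Deriving every child PRNG from a single seed keeps a run reproducible while giving each individual its own stream.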

edo.population.create_new_population(parents, population, crossover_prob, mutation_prob, row_limits, col_limits, families, weights, random_states)

Given a set of potential parents to be carried into the next generation, create offspring from pairs within that set until there are enough individuals.

Parameters
parents : list

A list of edo.individual.Individual instances used to create new offspring.

population : list

The current population.

crossover_prob : float

The probability with which to sample dimensions from the first parent over the second during crossover.

mutation_prob : float

The probability with which to mutate a component of a newly created individual.

row_limits : list

Limits on the number of rows a dataset can have.

col_limits : list

Limits on the number of columns a dataset can have.

families : list

The edo.Family instances from which to draw distribution instances.

weights : list

Weights used to sample elements from families.

random_states : dict

The PRNGs assigned to each individual in the population.
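The roles of crossover_prob and mutation_prob can be illustrated on a deliberately simplified representation. The sketch assumes individuals are flat lists of equal length, which is much simpler than a dataset with row and column dimensions. `crossover`, `mutate`, `low`, and `high` are hypothetical names.

```python
import random


def crossover(parent1, parent2, crossover_prob, rng):
    """Build an offspring by picking each component from one parent."""
    # With probability crossover_prob take the first parent's component.
    return [
        a if rng.random() < crossover_prob else b
        for a, b in zip(parent1, parent2)
    ]


def mutate(individual, mutation_prob, rng, low=0.0, high=1.0):
    """Resample each component independently with probability mutation_prob."""
    return [
        rng.uniform(low, high) if rng.random() < mutation_prob else value
        for value in individual
    ]


rng = random.Random(0)
child = crossover([1, 1, 1, 1], [2, 2, 2, 2], 0.5, rng)
mutant = mutate(child, 0.01, rng)
```

A crossover_prob of 0.5 treats both parents symmetrically, while values closer to 1 favour the first parent.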

edo.version module

The current version of the library.

Module contents

Top-level imports for the library.

class edo.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)

Bases: object

The (evolutionary) dataset optimiser. Available at the top level of the package; its parameters and methods are documented under edo.optimiser.DataOptimiser above.

class edo.Family(distribution, max_subtypes=None)

Bases: object

A class for handling all concurrent subtypes of a distribution class. Available at the top level of the package; its parameters, attributes, and methods are documented under edo.family.Family above.