edo package
Subpackages
Submodules
edo.family module
The distribution subtype handler.

class edo.family.Family(distribution, max_subtypes=None)
Bases: object
A class for handling all concurrent subtypes of a distribution class. A subtype is an independent copy of the distribution class, allowing more of the search space to be explored.
Parameters:
    distribution : edo.distributions.Distribution
        The distribution class to keep track of. Must be of the same form as those in edo.distributions.
    max_subtypes : int
        The maximum number of subtypes in the family that are currently being used in a run of the EA. There is no limit by default.
Attributes:
    name : str
        The name of the family’s distribution followed by Family.
    subtype_id : int
        A counter that increments when new subtypes are created. Used as an identifier for a given subtype.
    subtypes : dict
        A dictionary that maps subtype identifiers to their corresponding subtype. This gets updated during a run to those that are currently being used in the population.
    all_subtypes : dict
        A dictionary of all subtypes that have been created in the family.
    random_state : np.random.RandomState
        The PRNG associated with this family to be used for the sampling and creation of subtypes.

add_subtype(subtype_name=None, attributes=None)
Create a copy of the distribution class that is identical to, and independent of, the original.

classmethod load(distribution, root='.edocache')
Load in any existing cached subtype dictionaries for distribution and restore the subtype along with the family’s random state.

make_instance(random_state)
Select an existing subtype at random, or create a new one if there is space available, and return an instance of that subtype.
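The idea of a subtype as an independent copy of a distribution class can be illustrated without the library. The following is a minimal sketch and not edo's implementation: Uniform is a hypothetical stand-in class, param_limits is an assumed class-level attribute, and make_subtype is an illustrative helper that builds a subclass carrying its own copies of the class-level attributes.

```python
import copy


class Uniform:
    """Hypothetical stand-in for a distribution class (not edo's own)."""

    param_limits = {"bounds": [0, 1]}


def make_subtype(distribution, subtype_id):
    """Return a subclass holding deep copies of the class-level attributes."""
    attributes = {
        key: copy.deepcopy(value)
        for key, value in vars(distribution).items()
        if not key.startswith("__")
    }
    attributes["subtype_id"] = subtype_id
    name = f"{distribution.__name__}Subtype{subtype_id}"
    return type(name, (distribution,), attributes)


first = make_subtype(Uniform, 0)
second = make_subtype(Uniform, 1)

# Changing one subtype's limits leaves the original and its sibling untouched.
first.param_limits["bounds"] = [0.25, 0.75]
```

Because each subtype owns its attribute copies, narrowing the limits of one subtype does not affect any other, which is what lets several subtypes explore different regions at once.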
edo.fitness module
Functions for calculating individual and population fitness.
edo.individual module
A collection of objects to facilitate an individual representation.

class edo.individual.Individual(dataframe, metadata, random_state=None)
Bases: object
A class to represent an individual in the EA.
Parameters:
    dataframe : pd.DataFrame or dd.DataFrame
        The dataframe of the individual.
    metadata : list
        A list of distributions, each associated with the corresponding column of dataframe.
    random_state : np.random.RandomState, optional
        The PRNG for the individual. If not provided, the default PRNG is used.
Attributes:
    fitness : float
        The fitness of the individual. Initialises as None.

edo.individual.create_individual(row_limits, col_limits, families, weights, random_state)
Create an individual within the limits provided.
Parameters:
    row_limits : list
        Lower and upper bounds on the number of rows a dataset can have.
    col_limits : list
        Lower and upper bounds on the number of columns a dataset can have. Tuples can be used to indicate limits on the number of columns needed from each family in families.
    families : list
        A list of edo.Family instances handling the column distributions that can be selected from.
    weights : list
        A sequence of relative weights with which to sample from families. If None, then sampling is uniform.
    random_state : numpy.random.RandomState
        The PRNG associated with the individual to use for its random sampling.
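How the limits and weights might interact can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: sample_shape_and_families is a hypothetical helper, the families are placeholder strings rather than edo.Family instances, random.Random stands in for the NumPy PRNG, and only plain integer limits are handled (not the tuple form of col_limits).

```python
import random


def sample_shape_and_families(row_limits, col_limits, families, weights, seed):
    """Sample a row count, a column count and one family label per column."""
    rng = random.Random(seed)
    nrows = rng.randint(row_limits[0], row_limits[1])  # bounds are inclusive
    ncols = rng.randint(col_limits[0], col_limits[1])
    # random.choices samples uniformly when weights is None.
    columns = rng.choices(families, weights=weights, k=ncols)
    return nrows, ncols, columns


nrows, ncols, columns = sample_shape_and_families(
    row_limits=[5, 10],
    col_limits=[2, 4],
    families=["normal", "uniform"],  # placeholder labels, not edo.Family objects
    weights=[0.8, 0.2],
    seed=0,
)
```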
edo.optimiser module
The evolutionary dataset optimisation algorithm class.

class edo.optimiser.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)
Bases: object
The (evolutionary) dataset optimiser. A class that generates data for a given fitness function and evolutionary parameters.
Parameters:
    fitness : func
        Any real-valued function that takes at least an instance of Individual as an argument. Any further arguments should be passed in the fitness_kwargs parameter of the run method.
    size : int
        The size of the population to create.
    row_limits : list
        Lower and upper bounds on the number of rows a dataset can have.
    col_limits : list
        Lower and upper bounds on the number of columns a dataset can have. Tuples can also be used to specify the minimum/maximum number of columns there can be of each element in families.
    families : list
        A list of edo.Family instances that handle the distribution classes used to populate the individuals in the EA.
    weights : list
        A set of relative weights on how to select elements from families. If None, they will be chosen uniformly.
    max_iter : int
        The maximum number of iterations to be carried out before terminating.
    best_prop : float
        The proportion of a population from which to select the “best” individuals to be parents.
    lucky_prop : float
        The proportion of a population from which to sample some “lucky” individuals to be parents. Defaults to 0.
    crossover_prob : float
        The probability with which to sample dimensions from the first parent over the second in a crossover operation. Defaults to 0.5.
    mutation_prob : float
        The probability of a particular characteristic of an individual being mutated. If using a dwindle method, this is an initial probability.
    shrinkage : float
        The relative size to shrink each parameter’s limits by for each distribution in families. Defaults to None; if given, it must be between 0 and 1 (exclusive).
    maximise : bool
        Determines whether fitness is a function to be maximised or not. Fitness scores are minimised by default.
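One plausible reading of best_prop, lucky_prop and maximise is sketched below. It is an assumption about how such parameters are commonly used in an EA, not a description of the library's selection routine: select_parents is a hypothetical helper that works on a plain list of fitness scores and returns the indices of the chosen parents.

```python
import math
import random


def select_parents(fitnesses, best_prop, lucky_prop, maximise=False, seed=None):
    """Return indices of the best individuals plus some lucky ones."""
    rng = random.Random(seed)
    size = len(fitnesses)
    # Rank indices from best to worst; lower is better unless maximising.
    ranked = sorted(range(size), key=lambda i: fitnesses[i], reverse=maximise)

    n_best = max(math.ceil(best_prop * size), 1)
    n_lucky = math.ceil(lucky_prop * size)

    best = ranked[:n_best]
    rest = ranked[n_best:]
    # Lucky individuals are drawn at random from those not already selected.
    lucky = rng.sample(rest, min(n_lucky, len(rest)))
    return best + lucky


fitnesses = [4.0, 0.5, 3.0, 1.5, 2.5, 9.0, 7.0, 0.1]
parents = select_parents(fitnesses, best_prop=0.25, lucky_prop=0.25, seed=1)
```

With eight individuals and both proportions set to 0.25, the two lowest-scoring individuals are always kept and two more are drawn at random from the remainder.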

dwindle(**kwargs)
A placeholder for a function which can adjust (typically, reduce) the mutation probability over the run of the EA.
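A dwindling scheme could be supplied by overriding the placeholder in a subclass. The sketch below shows the pattern on a hypothetical stand-in class rather than on DataOptimiser itself; the attribute name mutation_prob is taken from the constructor parameter, while the rate keyword and the geometric decay rule are assumptions made for illustration.

```python
class OptimiserStandIn:
    """Hypothetical stand-in holding only what this sketch needs."""

    def __init__(self, mutation_prob):
        self.mutation_prob = mutation_prob

    def dwindle(self, **kwargs):
        """Placeholder: leave the mutation probability unchanged."""


class GeometricDwindle(OptimiserStandIn):
    def dwindle(self, rate=0.9):
        """Scale the mutation probability by a constant factor on each call."""
        self.mutation_prob *= rate


opt = GeometricDwindle(mutation_prob=0.5)
for _ in range(3):
    # In a real run, keyword arguments like this would come from dwindle_kwargs.
    opt.dwindle(rate=0.5)
```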

run(root=None, random_state=None, processes=None, fitness_kwargs=None, stop_kwargs=None, dwindle_kwargs=None)
Run the evolutionary algorithm under the given constraints.
Parameters:
    root : str, optional
        The directory in which to write all generations to file. If None, nothing is written to file. Instead, every generation is kept in memory and is returned at the end. If writing to file, one generation is held in memory at a time and everything is returned upon termination as a tuple containing dask objects.
    random_state : int or np.random.RandomState, optional
        The random seed or state for a particular run of the algorithm. If None, the default PRNG is used.
    processes : int, optional
        The number of parallel processes to use when calculating the population fitness. If None, then a single-thread scheduler is used.
    fitness_kwargs : dict, optional
        Any additional parameters for the fitness function should be placed here.
    stop_kwargs : dict, optional
        Any additional parameters for the stop method should be placed here.
    dwindle_kwargs : dict, optional
        Any additional parameters for the dwindle method should be placed here.
Returns:
    pop_history : list
        Every individual in each generation as a nested list of Individual instances.
    fit_history : pd.DataFrame or dask.dataframe.DataFrame
        Every individual’s fitness in each generation.
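The way a fitness function receives extra arguments through fitness_kwargs can be shown in isolation. In this sketch the Individual namedtuple is a stand-in for edo.individual.Individual, its dataframe is a plain dictionary of columns rather than a pandas object, and distance_from_target with its target keyword is a hypothetical fitness function.

```python
from collections import namedtuple
from statistics import mean

# Stand-in for edo.individual.Individual, holding columns as plain lists.
Individual = namedtuple("Individual", ["dataframe", "metadata"])


def distance_from_target(individual, target=0.0):
    """Squared distance between the mean of column "x" and a target value."""
    column = individual.dataframe["x"]
    return (mean(column) - target) ** 2


individual = Individual(dataframe={"x": [1.0, 2.0, 3.0]}, metadata=None)

# The dictionary plays the role of the fitness_kwargs argument of run.
fitness_kwargs = {"target": 3.0}
score = distance_from_target(individual, **fitness_kwargs)
```

The individual is always the first positional argument; everything else arrives by keyword, so the same function can be reused with different targets between runs.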
edo.population module
Functions for the creation and updating of a population.

edo.population.create_initial_population(row_limits, col_limits, families, weights, random_states)
Create an initial population for the genetic algorithm based on the given parameters.
Parameters:
    size : int
        The number of individuals in the population.
    row_limits : list
        Limits on the number of rows a dataset can have.
    col_limits : list
        Limits on the number of columns a dataset can have.
    families : list
        A list of edo.Family instances that handle the column distribution classes.
    weights : list
        Relative weights with which to sample from families. If None, sampling is done uniformly.
    random_states : dict
        A mapping of the index of the population to a numpy.random.RandomState instance that is to be assigned to the individual at that index in the population.
Returns:
    population : list
        A population of newly created individuals.

edo.population.create_new_population(parents, population, crossover_prob, mutation_prob, row_limits, col_limits, families, weights, random_states)
Given a set of potential parents to be carried into the next generation, create offspring from pairs within that set until there are enough individuals.
Parameters:
    parents : list
        A list of edo.individual.Individual instances used to create new offspring.
    population : list
        The current population.
    crossover_prob : float
        The probability with which to sample dimensions from the first parent over the second during crossover.
    mutation_prob : float
        The probability with which to mutate a component of a newly created individual.
    row_limits : list
        Limits on the number of rows a dataset can have.
    col_limits : list
        Limits on the number of columns a dataset can have.
    families : list
        The edo.Family instances from which to draw distribution instances.
    weights : list
        Weights used to sample elements from families.
    random_states : dict
        The PRNGs assigned to each individual in the population.
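The role of crossover_prob, sampling from the first parent over the second, can be sketched on simple lists. This is a deliberate simplification and not the library's operator: crossover is a hypothetical helper, both parents are assumed to have the same length, and each element stands for one component of an individual.

```python
import random


def crossover(first, second, crossover_prob, seed=None):
    """Build an offspring by picking each component from one of two parents."""
    rng = random.Random(seed)
    return [
        # Take the first parent's component with probability crossover_prob.
        a if rng.random() < crossover_prob else b
        for a, b in zip(first, second)
    ]


first = ["a0", "a1", "a2", "a3"]
second = ["b0", "b1", "b2", "b3"]
child = crossover(first, second, crossover_prob=0.5, seed=0)
```

Setting crossover_prob to 1 reproduces the first parent and setting it to 0 reproduces the second, so the default of 0.5 treats both parents evenly.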
edo.version module
The current version of the library.
Module contents
Top-level imports for the library.

class edo.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)
The same class as edo.optimiser.DataOptimiser, documented under the edo.optimiser module above.

class edo.Family(distribution, max_subtypes=None)
The same class as edo.family.Family, documented under the edo.family module above.