edo package
Subpackages
Submodules
edo.family module
The distribution subtype handler.
- class edo.family.Family(distribution, max_subtypes=None)
Bases: object
A class for handling all concurrent subtypes of a distribution class. A subtype is an independent copy of the distribution class, which allows more of the search space to be explored.
- Parameters
- distribution : edo.distributions.Distribution
The distribution class to keep track of. Must be of the same form as those in edo.distributions.
- max_subtypes : int
The maximum number of subtypes in the family that can be in use at once during a run of the EA. There is no limit by default.
- Attributes
- name : str
The name of the family’s distribution followed by Family.
- subtype_id : int
A counter that increments when new subtypes are created. Used as an identifier for a given subtype.
- subtypes : dict
A dictionary that maps subtype identifiers to their corresponding subtypes. During a run, it is updated to hold only those subtypes currently in use in the population.
- all_subtypes : dict
A dictionary of all subtypes that have been created in the family.
- random_state : np.random.RandomState
The PRNG associated with this family, used for sampling and creating subtypes.
- add_subtype(subtype_name=None, attributes=None)
Create a copy of the distribution class that is identical to, and independent of, the original.
- classmethod load(distribution, root='.edocache')
Load any existing cached subtype dictionaries for distribution, and restore the subtypes along with the family’s random state.
- make_instance(random_state)
Select an existing subtype at random (or create a new one if there is space available) and return an instance of that subtype.
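The subtype mechanism described above (independent copies created by add_subtype, and make_instance choosing between reuse and creation) can be sketched with plain Python classes. This is a minimal illustration under stated assumptions, not edo's implementation: ToyDistribution and ToyFamily are hypothetical names, the single "mean" parameter limit is invented for the example, and the assumed make_instance policy (create a new subtype whenever the cap has not been reached, otherwise reuse one chosen uniformly at random) may differ from the library's.

```python
import numpy as np


class ToyDistribution:
    """Hypothetical stand-in for an edo distribution class (not edo's API)."""

    name = "Toy"
    param_limits = {"mean": [-10.0, 10.0]}

    def __init__(self, random_state):
        # Each instance draws its parameter from its class's current limits.
        low, high = self.param_limits["mean"]
        self.mean = random_state.uniform(low, high)


class ToyFamily:
    """Minimal sketch of a subtype manager, loosely modelled on edo.Family."""

    def __init__(self, distribution, max_subtypes=None, seed=0):
        self.distribution = distribution
        self.max_subtypes = max_subtypes
        self.name = f"{distribution.name}Family"
        self.subtype_id = 0
        self.subtypes = {}
        self.all_subtypes = {}
        self.random_state = np.random.RandomState(seed)

    def add_subtype(self):
        # type() builds a new subclass; copying param_limits means that changing
        # one subtype's limits leaves the parent class and its siblings untouched.
        limits = {k: list(v) for k, v in self.distribution.param_limits.items()}
        subtype = type(
            f"{self.distribution.name}Subtype{self.subtype_id}",
            (self.distribution,),
            {"subtype_id": self.subtype_id, "param_limits": limits},
        )
        self.subtypes[self.subtype_id] = subtype
        self.all_subtypes[self.subtype_id] = subtype
        self.subtype_id += 1
        return subtype

    def make_instance(self, random_state):
        # Assumed policy: create a subtype while below the cap, else reuse one at random.
        if self.max_subtypes is None or len(self.subtypes) < self.max_subtypes:
            subtype = self.add_subtype()
        else:
            ids = list(self.subtypes)
            subtype = self.subtypes[ids[random_state.randint(len(ids))]]
        return subtype(random_state)
```

With max_subtypes=None every call to make_instance creates a fresh subtype, which matches the "no limit by default" behaviour; copying the limits per subtype is what keeps each copy independent of the original class.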
edo.fitness module
Functions for calculating individual and population fitness.
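As a rough illustration of what "individual and population fitness" means here, the sketch below scores a whole population with a real-valued fitness function that takes an individual, forwarding extra keyword arguments in the way run's fitness_kwargs is described. Both function names and the mean-to-target fitness are hypothetical, SimpleNamespace stands in for edo.individual.Individual, and the evaluation is serial, whereas run accepts a processes argument for parallel evaluation.

```python
import pandas as pd
from types import SimpleNamespace


def mean_target_fitness(individual, target=0.0):
    """Hypothetical fitness: distance between the dataset's overall mean and a target."""
    values = individual.dataframe.to_numpy(dtype=float)
    return abs(values.mean() - target)


def population_fitness(population, fitness, **fitness_kwargs):
    """Serially score every individual, forwarding extra keyword arguments."""
    return [fitness(individual, **fitness_kwargs) for individual in population]


# SimpleNamespace stands in for edo.individual.Individual here.
population = [
    SimpleNamespace(dataframe=pd.DataFrame({"a": [1.0, 3.0]})),
    SimpleNamespace(dataframe=pd.DataFrame({"a": [5.0, 5.0], "b": [5.0, 5.0]})),
]
scores = population_fitness(population, mean_target_fitness, target=2.0)
```

Here the first dataset has mean 2.0 and the second has mean 5.0, so with target=2.0 the scores are 0.0 and 3.0; under the default minimisation the first individual is the better one.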
edo.individual module
A collection of objects to facilitate an individual representation.
- class edo.individual.Individual(dataframe, metadata, random_state=None)
Bases: object
A class to represent an individual in the EA.
- Parameters
- dataframe : pd.DataFrame or dd.DataFrame
The dataframe of the individual.
- metadata : list
A list of distributions, each associated with the respective column of dataframe.
- random_state : np.random.RandomState, optional
The PRNG for the individual. If not provided, the default PRNG is used.
- Attributes
- fitness : float
The fitness of the individual. Initialised as None.
- edo.individual.create_individual(row_limits, col_limits, families, weights, random_state)
Create an individual within the limits provided.
- Parameters
- row_limits : list
Lower and upper bounds on the number of rows a dataset can have.
- col_limits : list
Lower and upper bounds on the number of columns a dataset can have. Tuples can be used to indicate limits on the number of columns needed from each family in families.
- families : list
A list of edo.Family instances handling the column distributions that can be selected from.
- weights : list
A sequence of relative weights with which to sample from families. If None, then sampling is uniform.
- random_state : numpy.random.RandomState
The PRNG associated with the individual to use for its random sampling.
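The parameters of create_individual suggest roughly the following procedure: draw a row count and a column count within the limits, pick a family for each column using the relative weights (uniformly when weights is None), ask that family for a distribution instance, and record the instance as the column's metadata. The sketch below is one possible reading, not the library's code: ToyNormal, ToyNormalFamily, SketchIndividual and create_individual_sketch are hypothetical, the limits are treated as inclusive, and the tuple form of col_limits is not handled.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
import pandas as pd


class ToyNormal:
    """Hypothetical distribution instance: N(mean, 1)."""

    def __init__(self, mean):
        self.mean = mean

    def sample(self, nrows, random_state):
        return random_state.normal(self.mean, 1.0, size=nrows)


class ToyNormalFamily:
    """Hypothetical family: each instance gets a random mean in [-5, 5)."""

    def make_instance(self, random_state):
        return ToyNormal(random_state.uniform(-5.0, 5.0))


@dataclass
class SketchIndividual:
    """Stand-in for edo.individual.Individual: data, per-column metadata, PRNG."""

    dataframe: pd.DataFrame
    metadata: list
    random_state: np.random.RandomState
    fitness: Optional[float] = None  # unset until the individual is scored


def create_individual_sketch(row_limits, col_limits, families, weights, random_state):
    # Inclusive bounds, so add 1 to randint's exclusive upper limit.
    nrows = random_state.randint(row_limits[0], row_limits[1] + 1)
    ncols = random_state.randint(col_limits[0], col_limits[1] + 1)
    # Normalise relative weights into probabilities; None means uniform.
    probs = None if weights is None else np.asarray(weights, float) / np.sum(weights)

    columns, metadata = {}, []
    for col in range(ncols):
        family = families[random_state.choice(len(families), p=probs)]
        instance = family.make_instance(random_state)
        columns[col] = instance.sample(nrows, random_state)
        metadata.append(instance)  # metadata[i] describes column i

    return SketchIndividual(pd.DataFrame(columns), metadata, random_state)
```

Keeping metadata aligned index-for-index with the dataframe's columns is what lets later operations (crossover, mutation) treat each column together with the distribution that generated it.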
edo.optimiser module
The evolutionary dataset optimisation algorithm class.
- class edo.optimiser.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)
Bases: object
The (evolutionary) dataset optimiser. A class that generates data for a given fitness function and evolutionary parameters.
- Parameters
- fitness : func
Any real-valued function that takes at least an instance of Individual as its argument. Any further arguments should be passed via the fitness_kwargs parameter of the run method.
- size : int
The size of the population to create.
- row_limits : list
Lower and upper bounds on the number of rows a dataset can have.
- col_limits : list
Lower and upper bounds on the number of columns a dataset can have. Tuples can also be used to specify the minimum and maximum number of columns drawn from each element of families.
- families : list
A list of edo.Family instances that handle the distribution classes used to populate the individuals in the EA.
- weights : list
A set of relative weights on how to select elements from families. If None, they will be chosen uniformly.
- max_iter : int
The maximum number of iterations to be carried out before terminating. Defaults to 100.
- best_prop : float
The proportion of a population from which to select the “best” individuals to be parents. Defaults to 0.25.
- lucky_prop : float
The proportion of a population from which to sample some “lucky” individuals to be parents. Defaults to 0.
- crossover_prob : float
The probability with which to sample dimensions from the first parent over the second in a crossover operation. Defaults to 0.5.
- mutation_prob : float
The probability of a particular characteristic of an individual being mutated. If using a dwindle method, this is an initial probability. Defaults to 0.01.
- shrinkage : float
The relative size by which to shrink each parameter’s limits for each distribution in families. Defaults to None; if given, it must lie strictly between 0 and 1.
- maximise : bool
Determines whether fitness is a function to be maximised. Fitness scores are minimised by default.
- dwindle(**kwargs)
A placeholder for a function that can adjust (typically, reduce) the mutation probability over the run of the EA.
- run(root=None, random_state=None, processes=None, fitness_kwargs=None, stop_kwargs=None, dwindle_kwargs=None)
Run the evolutionary algorithm under the given constraints.
- Parameters
- root : str, optional
The directory in which to write all generations to file. If None, nothing is written to file; instead, every generation is kept in memory and returned at the end. If writing to file, one generation is held in memory at a time, and everything is returned upon termination as a tuple containing dask objects.
- random_state : int or np.random.RandomState, optional
The random seed or state for a particular run of the algorithm. If None, the default PRNG is used.
- processes : int, optional
The number of parallel processes to use when calculating the population fitness. If None, a single-threaded scheduler is used.
- fitness_kwargs : dict, optional
Any additional parameters for the fitness function should be placed here.
- stop_kwargs : dict, optional
Any additional parameters for the stop method should be placed here.
- dwindle_kwargs : dict, optional
Any additional parameters for the dwindle method should be placed here.
- Returns
- pop_history : list
Every individual in each generation as a nested list of Individual instances.
- fit_history : pd.DataFrame or dask.dataframe.DataFrame
Every individual’s fitness in each generation.
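The best_prop, lucky_prop and maximise parameters together describe how parents are chosen each generation. One plausible reading is sketched below: rank the population by fitness (ascending when minimising, descending when maximising), keep the top best_prop fraction, then draw a lucky_prop fraction at random from everyone else. The function select_parents is hypothetical, and the rounding (int truncation) and sampling without replacement are assumptions rather than documented behaviour of DataOptimiser.

```python
import numpy as np


def select_parents(scores, best_prop, lucky_prop, random_state, maximise=False):
    """Hypothetical parent selection: the top best_prop, plus lucky_prop drawn from the rest."""
    scores = np.asarray(scores, dtype=float)
    size = len(scores)
    # argsort is ascending (minimisation), so negate scores when maximising.
    order = np.argsort(-scores if maximise else scores, kind="stable")
    n_best = int(best_prop * size)
    n_lucky = min(int(lucky_prop * size), size - n_best)
    best, rest = order[:n_best], order[n_best:]
    lucky = random_state.choice(rest, size=n_lucky, replace=False)
    return [int(i) for i in best] + [int(i) for i in lucky]
```

With eight individuals and the default best_prop of 0.25, two parents come from the top of the ranking; the default lucky_prop of 0 adds none, so including some "lucky" parents is an opt-in way to keep diversity in the population.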
edo.population module
Functions for the creation and updating of a population.
- edo.population.create_initial_population(row_limits, col_limits, families, weights, random_states)
Create an initial population for the genetic algorithm based on the given parameters.
- Parameters
- size : int
The number of individuals in the population.
- row_limits : list
Limits on the number of rows a dataset can have.
- col_limits : list
Limits on the number of columns a dataset can have.
- families : list
A list of edo.Family instances that handle the column distribution classes.
- weights : list
Relative weights with which to sample from families. If None, sampling is done uniformly.
- random_states : dict
A mapping from each index in the population to the numpy.random.RandomState instance to be assigned to the individual at that index.
- Returns
- populationlist
A population of newly created individuals.
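The random_states mapping gives each position in the population its own PRNG, so every individual can be generated reproducibly and independently of the others. The sketch below shows one way to build such a mapping and use it; make_random_states and create_initial_population_sketch are hypothetical helpers, seeding through numpy's SeedSequence is a choice made for the example, and in this sketch the population size is simply the number of entries in random_states.

```python
import numpy as np


def make_random_states(size, seed=None):
    """Hypothetical helper: one independently seeded RandomState per population index."""
    seeds = np.random.SeedSequence(seed).generate_state(size)
    return {i: np.random.RandomState(int(s)) for i, s in enumerate(seeds)}


def create_initial_population_sketch(make_individual, random_states):
    """Create one individual per index, each using the PRNG assigned to that index."""
    return [make_individual(random_states[i]) for i in sorted(random_states)]
```

Because each index owns its PRNG, rebuilding the mapping from the same seed reproduces the same population, and the individuals do not depend on the order in which they happen to be evaluated.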
- edo.population.create_new_population(parents, population, crossover_prob, mutation_prob, row_limits, col_limits, families, weights, random_states)
Given a set of potential parents to be carried into the next generation, create offspring from pairs within that set until there are enough individuals.
- Parameters
- parents : list
A list of edo.individual.Individual instances used to create new offspring.
- population : list
The current population.
- crossover_prob : float
The probability with which to sample dimensions from the first parent over the second during crossover.
- mutation_prob : float
The probability with which to mutate a component of a newly created individual.
- row_limits : list
Limits on the number of rows a dataset can have.
- col_limits : list
Limits on the number of columns a dataset can have.
- families : list
The edo.Family instances from which to draw distribution instances.
- weights : list
Weights used to sample elements from families.
- random_states : dict
The PRNGs assigned to each individual in the population.
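The breeding step described above (carry the parents over, then fill the remaining slots with offspring built from pairs of parents by crossover and mutation) can be sketched on a deliberately simplified representation. In this hypothetical sketch each individual is an equal-length vector of real-valued "genes" rather than a dataframe with column metadata, mutation resamples a gene uniformly from an invented range [-5, 5), and row_limits, col_limits, families and weights are omitted; crossover, mutate and create_new_population_sketch are illustrative names, not edo's functions.

```python
import numpy as np


def crossover(parent1, parent2, crossover_prob, random_state):
    """Take each gene from parent1 with probability crossover_prob, otherwise from parent2."""
    mask = random_state.random_sample(len(parent1)) < crossover_prob
    return np.where(mask, parent1, parent2)


def mutate(child, mutation_prob, random_state, low=-5.0, high=5.0):
    """Resample each gene uniformly from [low, high) with probability mutation_prob."""
    mask = random_state.random_sample(len(child)) < mutation_prob
    fresh = random_state.uniform(low, high, size=len(child))
    return np.where(mask, fresh, child)


def create_new_population_sketch(parents, size, crossover_prob, mutation_prob, random_states):
    """Carry the parents over, then breed offspring from random parent pairs until full."""
    carried = [np.asarray(p, dtype=float) for p in parents]
    population = list(carried)
    while len(population) < size:
        # Each child uses the PRNG assigned to its index in the new population.
        rs = random_states[len(population)]
        i, j = rs.choice(len(carried), size=2, replace=False)
        child = crossover(carried[i], carried[j], crossover_prob, rs)
        population.append(mutate(child, mutation_prob, rs))
    return population
```

With crossover_prob=0.5 each gene is equally likely to come from either parent, and with the default mutation_prob of 0.01 only about one gene in a hundred is resampled, so offspring stay close to their parents while still exploring new values.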
edo.version module
The current version of the library.
Module contents
Top-level imports for the library.
- class edo.DataOptimiser: the same class as edo.optimiser.DataOptimiser, documented above.
- class edo.Family: the same class as edo.family.Family, documented above.