edo package
Subpackages
Submodules
edo.family module
The distribution subtype handler.
class edo.family.Family(distribution, max_subtypes=None)
Bases: object
A class for handling all concurrent subtypes of a distribution class. A subtype is an independent copy of the distribution class allowing more of the search space to be explored.
- Parameters
- distribution : edo.distributions.Distribution
The distribution class to keep track of. Must be of the same form as those in edo.distributions.
- max_subtypes : int
The maximum number of subtypes in the family that are currently being used in a run of the EA. There is no limit by default.
- Attributes
- name : str
The name of the family’s distribution followed by Family.
- subtype_id : int
A counter that increments when new subtypes are created. Used as an identifier for a given subtype.
- subtypes : dict
A dictionary that maps subtype identifiers to their corresponding subtype. This gets updated during a run to those that are currently being used in the population.
- all_subtypes : dict
A dictionary of all subtypes that have been created in the family.
- random_state : np.random.RandomState
The PRNG associated with this family to be used for the sampling and creation of subtypes.
add_subtype(subtype_name=None, attributes=None)
Create a copy of the distribution class that is identical to, and independent of, the original.
classmethod load(distribution, root='.edocache')
Load in any existing cached subtype dictionaries for distribution and restore the subtype along with the family’s random state.
make_instance(random_state)
Select an existing subtype at random, or create a new one if there is space available, and return an instance of that subtype.
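The notion of a subtype as an identical but independent copy of a class can be illustrated in plain Python. What follows is a minimal sketch and not the library's implementation: the Normal class, its param_limits attribute, and the make_subtype helper are hypothetical stand-ins introduced only for this example.

```python
import copy


class Normal:
    """Hypothetical stand-in for a distribution class."""

    name = "Normal"
    param_limits = {"mean": [-10, 10], "std": [0, 10]}


def make_subtype(distribution, subtype_id):
    """Create an independent copy of ``distribution`` as a new subclass.

    Class attributes are deep-copied so that changing the subtype's
    limits leaves the original class (and other subtypes) untouched.
    """
    attributes = {
        key: copy.deepcopy(value)
        for key, value in vars(distribution).items()
        if not key.startswith("__")
    }
    attributes["subtype_id"] = subtype_id
    return type(f"{distribution.name}Subtype", (distribution,), attributes)


subtype = make_subtype(Normal, 0)
subtype.param_limits["mean"] = [0, 1]  # only the subtype's limits change
```

Because the attributes are copied rather than shared, each subtype can drift towards a different region of the search space without affecting its siblings.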
edo.fitness module
Functions for calculating individual and population fitness.
edo.individual module
A collection of objects to facilitate an individual representation.
class edo.individual.Individual(dataframe, metadata, random_state=None)
Bases: object
A class to represent an individual in the EA.
- Parameters
- dataframe : pd.DataFrame or dd.DataFrame
The dataframe of the individual.
- metadata : list
A list of distributions that are associated with the respective columns of dataframe.
- random_state : np.random.RandomState, optional
The PRNG for the individual. If not provided, the default PRNG is used.
- Attributes
- fitness : float
The fitness of the individual. Initialises as None.
edo.individual.create_individual(row_limits, col_limits, families, weights, random_state)
Create an individual within the limits provided.
- Parameters
- row_limits : list
Lower and upper bounds on the number of rows a dataset can have.
- col_limits : list
Lower and upper bounds on the number of columns a dataset can have. Tuples can be used to indicate limits on the number of columns needed from each family in families.
- families : list
A list of edo.Family instances handling the column distributions that can be selected from.
- weights : list
A sequence of relative weights with which to sample from families. If None, then sampling is uniform.
- random_state : numpy.random.RandomState
The PRNG associated with the individual to use for its random sampling.
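The way these parameters interact can be sketched without the library. The example below is an illustration under stated assumptions, not the library's code: the standard-library random.Random stands in for numpy.random.RandomState, family names are placeholder strings rather than edo.Family instances, the bounds are treated as inclusive, and the tuple form of col_limits is not handled. The helper name sample_shape_and_families is hypothetical.

```python
import random


def sample_shape_and_families(row_limits, col_limits, families, weights, rng):
    """Sample a row count, a column count, and one family per column."""
    nrows = rng.randint(*row_limits)  # assumed inclusive on both ends
    ncols = rng.randint(*col_limits)
    # weights=None gives uniform sampling over the families
    chosen = rng.choices(families, weights=weights, k=ncols)
    return nrows, ncols, chosen


rng = random.Random(0)
families = ["normal", "poisson"]  # placeholders for family objects
nrows, ncols, chosen = sample_shape_and_families(
    row_limits=[5, 10],
    col_limits=[2, 4],
    families=families,
    weights=None,
    rng=rng,
)
```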
edo.optimiser module
The evolutionary dataset optimisation algorithm class.
class edo.optimiser.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)
Bases: object
The (evolutionary) dataset optimiser. A class that generates data for a given fitness function and evolutionary parameters.
- Parameters
- fitness : func
Any real-valued function that takes at least an instance of Individual as an argument. Any further arguments should be passed in the fitness_kwargs parameter of the run method.
- size : int
The size of the population to create.
- row_limits : list
Lower and upper bounds on the number of rows a dataset can have.
- col_limits : list
Lower and upper bounds on the number of columns a dataset can have. Tuples can also be used to specify the minimum and maximum number of columns there can be of each element in families.
- families : list
A list of edo.Family instances that handle the distribution classes used to populate the individuals in the EA.
- weights : list
A set of relative weights on how to select elements from families. If None, they will be chosen uniformly.
- max_iter : int
The maximum number of iterations to be carried out before terminating.
- best_prop : float
The proportion of a population from which to select the “best” individuals to be parents.
- lucky_prop : float
The proportion of a population from which to sample some “lucky” individuals to be parents. Defaults to 0.
- crossover_prob : float
The probability with which to sample dimensions from the first parent over the second in a crossover operation. Defaults to 0.5.
- mutation_prob : float
The probability of a particular characteristic of an individual being mutated. If using a dwindle method, this is an initial probability.
- shrinkage : float
The relative size to shrink each parameter’s limits by for each distribution in families. Defaults to None; if given, it must be between 0 and 1 (exclusive).
- maximise : bool
Determines whether fitness is a function to be maximised or not. Fitness scores are minimised by default.
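To make the roles of best_prop, lucky_prop and maximise concrete, here is a minimal sketch of one plausible parent-selection step. It is an assumption about how such proportions are typically applied, not a transcription of the library's code: the helper name select_parents is hypothetical, proportions are rounded down to whole individuals, and the standard-library random.Random stands in for a NumPy PRNG.

```python
import random


def select_parents(population, fitnesses, best_prop, lucky_prop, rng,
                   maximise=False):
    """Pick the fittest individuals plus a few random "lucky" ones."""
    size = len(population)
    num_best = int(best_prop * size)
    num_lucky = int(lucky_prop * size)

    # Rank indices from fittest to least fit (ascending when minimising).
    ranked = sorted(range(size), key=lambda i: fitnesses[i], reverse=maximise)
    best, rest = ranked[:num_best], ranked[num_best:]

    # Lucky individuals are drawn without replacement from the remainder.
    lucky = rng.sample(rest, min(num_lucky, len(rest)))
    return [population[i] for i in best + lucky]


population = ["a", "b", "c", "d", "e", "f", "g", "h"]
fitnesses = [3.0, 0.5, 2.0, 4.0, 1.0, 5.0, 0.1, 6.0]
parents = select_parents(
    population, fitnesses, best_prop=0.25, lucky_prop=0.25,
    rng=random.Random(1),
)
```

With eight individuals and both proportions at 0.25, two parents come from the top of the ranking and two are sampled from the rest, which keeps some diversity in the parent pool.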
dwindle(**kwargs)
A placeholder for a function which can adjust (typically, reduce) the mutation probability over the run of the EA.
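One way to use such a placeholder is to override it in a subclass. The sketch below uses a hypothetical stand-in class instead of DataOptimiser so that it runs without the library. It assumes, without confirmation from this page, that dwindle updates a mutation_prob attribute in place and is called once per generation; the rate keyword is also an invented example parameter.

```python
class OptimiserStandIn:
    """Hypothetical stand-in exposing only what this sketch needs."""

    def __init__(self, mutation_prob):
        self.mutation_prob = mutation_prob

    def dwindle(self, **kwargs):
        """Placeholder: leave the mutation probability unchanged."""


class DecayingOptimiser(OptimiserStandIn):
    def dwindle(self, rate=0.9):
        """Multiply the mutation probability by ``rate`` on each call."""
        self.mutation_prob *= rate


opt = DecayingOptimiser(mutation_prob=0.5)
for _ in range(3):  # pretend three generations have passed
    opt.dwindle(rate=0.5)
```

Under the same assumption, a keyword such as rate would be supplied through the dwindle_kwargs argument of run.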
run(root=None, random_state=None, processes=None, fitness_kwargs=None, stop_kwargs=None, dwindle_kwargs=None)
Run the evolutionary algorithm under the given constraints.
- Parameters
- root : str, optional
The directory in which to write all generations to file. If None, nothing is written to file. Instead, every generation is kept in memory and is returned at the end. If writing to file, one generation is held in memory at a time and everything is returned upon termination as a tuple containing dask objects.
- random_state : int or np.random.RandomState, optional
The random seed or state for a particular run of the algorithm. If None, the default PRNG is used.
- processes : int, optional
The number of parallel processes to use when calculating the population fitness. If None, then a single-thread scheduler is used.
- fitness_kwargs : dict, optional
Any additional parameters for the fitness function should be placed here.
- stop_kwargs : dict, optional
Any additional parameters for the stop method should be placed here.
- dwindle_kwargs : dict, optional
Any additional parameters for the dwindle method should be placed here.
- Returns
- pop_history : list
Every individual in each generation as a nested list of Individual instances.
- fit_history : pd.DataFrame or dask.dataframe.DataFrame
Every individual’s fitness in each generation.
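The relationship between a fitness function and fitness_kwargs can be sketched as follows. This is an illustration only: a SimpleNamespace holding a list of rows stands in for an Individual and its dataframe, the column and power keywords are invented for the example, and it is assumed that the extra parameters reach the fitness function as keyword arguments.

```python
from types import SimpleNamespace


def fitness(individual, column=0, power=2):
    """Sum of the chosen column's values raised to ``power``."""
    return sum(row[column] ** power for row in individual.dataframe)


# Stand-in individual: a list of rows plays the role of the dataframe.
individual = SimpleNamespace(
    dataframe=[[1, 10], [2, 20], [3, 30]],
    fitness=None,  # initialises as None, as described for Individual
)

# Extra arguments are collected in a dictionary and unpacked on the call.
fitness_kwargs = {"column": 1, "power": 1}
individual.fitness = fitness(individual, **fitness_kwargs)
```

Keeping the individual as the only required argument means the same function works whether or not additional parameters are supplied.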
edo.population module
Functions for the creation and updating of a population.
edo.population.create_initial_population(row_limits, col_limits, families, weights, random_states)
Create an initial population for the genetic algorithm based on the given parameters.
- Parameters
- size : int
The number of individuals in the population.
- row_limits : list
Limits on the number of rows a dataset can have.
- col_limits : list
Limits on the number of columns a dataset can have.
- families : list
A list of edo.Family instances that handle the column distribution classes.
- weights : list
Relative weights with which to sample from families. If None, sampling is done uniformly.
- random_states : dict
A mapping of the index of the population to a numpy.random.RandomState instance that is to be assigned to the individual at that index in the population.
- Returns
- population : list
A population of newly created individuals.
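A mapping from population index to PRNG, as described for random_states, could be built along these lines. This is a sketch under assumptions: the standard-library random.Random stands in for numpy.random.RandomState, and the helper make_random_states with its seeding scheme is hypothetical, not the library's own.

```python
import random


def make_random_states(size, seed):
    """Map each population index to its own seeded PRNG."""
    master = random.Random(seed)
    # Each individual gets a generator seeded from the master stream.
    return {i: random.Random(master.getrandbits(32)) for i in range(size)}


states = make_random_states(size=4, seed=0)
draws = [states[i].random() for i in range(4)]
```

Giving every individual its own generator keeps results reproducible even when individuals are processed in a different order or in parallel.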
edo.population.create_new_population(parents, population, crossover_prob, mutation_prob, row_limits, col_limits, families, weights, random_states)
Given a set of potential parents to be carried into the next generation, create offspring from pairs within that set until there are enough individuals.
- Parameters
- parents : list
A list of edo.individual.Individual instances used to create new offspring.
- population : list
The current population.
- crossover_prob : float
The probability with which to sample dimensions from the first parent over the second during crossover.
- mutation_prob : float
The probability with which to mutate a component of a newly created individual.
- row_limits : list
Limits on the number of rows a dataset can have.
- col_limits : list
Limits on the number of columns a dataset can have.
- families : list
The edo.Family instances from which to draw distribution instances.
- weights : list
Weights used to sample elements from families.
- random_states : dict
The PRNGs assigned to each individual in the population.
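The meaning of crossover_prob and mutation_prob can be shown on a toy representation. In this sketch an individual is a flat list of numbers, not a dataframe with metadata, so it illustrates the two probabilities only and does not reproduce the library's operators. The function names and the low/high resampling range are hypothetical.

```python
import random


def crossover(parent1, parent2, crossover_prob, rng):
    """Take each component from parent1 with probability crossover_prob."""
    return [
        a if rng.random() < crossover_prob else b
        for a, b in zip(parent1, parent2)
    ]


def mutate(individual, mutation_prob, rng, low=0.0, high=1.0):
    """Resample each component with probability mutation_prob."""
    return [
        rng.uniform(low, high) if rng.random() < mutation_prob else value
        for value in individual
    ]


rng = random.Random(0)
parent1, parent2 = [1, 1, 1, 1], [2, 2, 2, 2]
child = mutate(crossover(parent1, parent2, 0.5, rng), 0.01, rng)
```

With crossover_prob at 0.5 neither parent is favoured, while values closer to 1 make offspring resemble the first parent more strongly.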
edo.version module
The current version of the library.
Module contents
Top-level imports for the library.
class edo.DataOptimiser(fitness, size, row_limits, col_limits, families, weights=None, max_iter=100, best_prop=0.25, lucky_prop=0, crossover_prob=0.5, mutation_prob=0.01, shrinkage=None, maximise=False)
Bases: object
The (evolutionary) dataset optimiser. A class that generates data for a given fitness function and evolutionary parameters. Its parameters and methods are the same as those listed for edo.optimiser.DataOptimiser under the edo.optimiser module.
class edo.Family(distribution, max_subtypes=None)
Bases: object
A class for handling all concurrent subtypes of a distribution class. A subtype is an independent copy of the distribution class allowing more of the search space to be explored. Its parameters, attributes and methods are the same as those listed for edo.family.Family under the edo.family module.