Today, in the world of data mining, business intelligence and analytics, techniques which can learn and provide decision support is gradually gaining in importance in leaps and bound. The three major techniques or groups of algorithms which have gained a lot of visibility in recent times are fuzzy logic, neural networks and genetic algorithms.
A genetic algorithm (GA) is a search based, self learning algorithm (technique) that imitates the theory of natural evolution based on selective screening of results based on fitness of purpose. This self learning algorithm is routinely used to generate solutions to multi-criteria decision making problems, optimization problems and search problems. Genetic algorithms have evolved from the more popular class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques mimicked from that of natural evolution like those of inheritance, selection, mutation and crossover.
The structure of a generic technique developed based on the genetic algorithm would follow the mention sequence given in the diagram below:
First while coding using a genetic algorithm, the first activity is to code the data-sets in such a way so as to be equatable for the fitness of purpose of the problem domain. Then typically, a random sample is chosen from the data set and the fitness of purpose of these data points are evaluated. Now for those points which have a higher fitness score, they are selected for further application of genetic algorithm optimization techniques like those of inheritance, selection, mutation and crossover. The other data points are rejected and the algorithm continues to recreate data-points from this pool of selected data points using the techniques of inheritance, selection, mutation and crossover.
Selection: During each successive iteration, a part of the existing population is selected to form a new generation sample pool. These data points are selected through a fitness-based purpose of usage in the problem domain, where fitter solutions (as measured by a fitness function) are selected preferentially. Some selection algorithms rate the fitness of purpose of each solution and preferentially select the best solutions. Other methods often rate only a random sample of the population, as this process may be very time-consuming.
Reproduction using Cross-over and Mutation
The next step is to generate the next generation population of data points from those selected earlier (using selection), through the operator algorithms called crossover (also called recombination), and/or mutation.
Cross-over: During cross-over, a pair (or triplet) of data points are chosen from the refined selection and they are combined and data-exchange takes place based on the individual coding. After the data exchange occurs multiple times, the new data points created from this process is returned to the selection pool to be operated by the selection operator.
Mutation: The coded data point obtained after the operation of selection on the sample pool of data points, is modified by tweaking one parameter to test its fitness factor and this continues for the entirely set of selected data points. The newly created generation of data-points are then tested using the selection operator again and the entirely process is restarted.
Although Crossover and Mutation are known as the main operators of genetic algorithm, there are also other operators such as regrouping, colonization-extinction, or migration in genetic algorithms.
The application of these operators and re-selection of these new generation data-points continue till the optimal fitness of purpose is attained. Common terminating conditions are often due to a solution satisfying minimum criteria, fixed number of generations being reached, allocated resources (complexity of computation, computation time, money) reached, the highest ranking solution’s fitness level is reaching or a fitness plateau has been reached such that successive iterations no longer produce better results, manual inspection produces better results or even combinations of these.