Evolutionary Parsing Algorithm
An evolutionary parsing algorithm has been developed that performs better than classic chart parsers based on exhaustive search. It does so by maintaining a population in which incomplete constituents are ranked alongside complete ones, and by employing elitism to speed up convergence.
It also uses a conservative crossover operator instead of a speculative one, which significantly reduces the search space.
The Initial Population
The system starts with a random initial population of individuals. Each individual is a parse of a segment of the sentence, that is, a tree obtained by applying the context-free grammar (CFG) to that segment. Each individual records the syntactic category on the left-hand side of the top-level grammar rule that was applied.
The algorithm evaluates each member of the population by computing its raw fitness value, then scales the raw values to a range that is easier to work with; the scaled values are called expectation values. It selects some members, called parents, according to their expectation values and produces children from them.
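The scaling and selection step can be sketched as follows. This is a minimal illustration, not the parser's actual code: the function names, the shift-and-normalise scaling rule, and the use of weighted random sampling are all assumptions, chosen so that the expectation values sum to the population size and still work when raw fitness is negative (as a log-probability would be).

```python
import random

def expectation_values(raw_fitness):
    """Scale raw fitness so the scaled values sum to the population size.

    Assumption: a simple shift-and-normalise rule. The small epsilon keeps
    every weight positive, even for the worst individual.
    """
    low = min(raw_fitness)
    shifted = [f - low + 1e-9 for f in raw_fitness]
    total = sum(shifted)
    n = len(raw_fitness)
    return [n * s / total for s in shifted]

def select_parents(population, expectations, k, rng=random):
    """Draw k parents, each with probability proportional to its expectation."""
    return rng.choices(population, weights=expectations, k=k)

# Hypothetical usage: three individuals with log-probability-like fitness.
population = ["parse_a", "parse_b", "parse_c"]
scaled = expectation_values([-4.0, -2.5, -1.0])
parents = select_parents(population, scaled, k=2)
```

With this rule an individual of average quality gets an expectation near 1, so the value can be read as the expected number of times it is chosen when the number of draws equals the population size.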
Some of the children are incomplete individuals whose subtrees cannot be combined with other members to complete a valid parse. To deal with this issue, the algorithm introduces an aging mechanism, as shown in Figure. As soon as an incomplete individual reaches a certain age, it dies and is discarded from the population. This reduces the size of the search space significantly.
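A minimal sketch of such an aging step is shown here. The `Individual` class, its fields, and the `max_age` parameter are hypothetical names introduced for illustration; the text does not say what age limit is used or whether complete individuals age at all, so this sketch assumes they do not.

```python
from dataclasses import dataclass

@dataclass
class Individual:
    category: str      # syntactic category at the root of the parse
    complete: bool     # True if the constituent is complete
    age: int = 0       # generations survived while incomplete

def age_population(population, max_age):
    """Advance one generation: age incomplete individuals and drop the old ones."""
    survivors = []
    for ind in population:
        if ind.complete:
            survivors.append(ind)   # assumption: complete individuals never expire
            continue
        ind.age += 1
        if ind.age < max_age:       # dies as soon as it reaches max_age
            survivors.append(ind)
    return survivors

# Hypothetical usage: the incomplete VP is one generation away from expiring.
population = [Individual("NP", True), Individual("VP", False, age=2)]
population = age_population(population, max_age=3)
```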
The Genetic Operators
As the name indicates, this algorithm takes a genetic approach to parsing. Each member of the population (an individual) represents a parsed linguistic element. The best individuals, that is, the ones that produce complete constituents with the least time and resources, are selected to form the next generation.
Genetic algorithms are based on the Darwinian principle of survival of the fittest: the individuals that perform better in their environment are passed on to the next generation. New candidates are produced by applying "genetic operators", of which there are typically two: crossover and mutation. Crossover obtains new individuals by mixing, in some problem-dependent way, two existing members of the population (called parents). Mutation randomly changes some properties of an individual, such as the functions or terminals found at a percentage of the nodes in a parse tree.
Conventional genetic programming (GP) has several behavioural problems that limit its performance and scalability. Among them are the inability of its recombination operator, sub-tree crossover, to carry out meaningful recombination, and the fact that it generates large chromosomes (programs) with relatively high frequency.
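As an illustration of mutation applied to a percentage of the nodes, here is a hedged sketch of generic point mutation on a tree. The nested-tuple tree encoding (label first, children after, words as plain strings), the function name, and the `rate`, `labels` and `words` parameters are assumptions made for this example; this is the textbook operator, not necessarily the one used by the parser described here.

```python
import random

def point_mutation(tree, rate, labels, words, rng=random):
    """Return a copy of `tree` in which each node is relabelled with probability `rate`.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    The shape of the tree is preserved; only labels and words change.
    """
    if isinstance(tree, str):
        return rng.choice(words) if rng.random() < rate else tree
    label = rng.choice(labels) if rng.random() < rate else tree[0]
    children = tuple(point_mutation(c, rate, labels, words, rng) for c in tree[1:])
    return (label,) + children

# Hypothetical usage: mutate roughly 10% of the nodes.
tree = ("S", ("NP", "she"), ("VP", "runs"))
mutant = point_mutation(tree, 0.1, ["NP", "VP", "PP"], ["she", "runs", "it"])
```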
The Crossover Operator
The genetic operators in GP work directly on parse trees, but several behavioural problems limit their usefulness. One of them is sub-tree crossover. In this operation, randomly chosen sub-trees are exchanged between the parent trees, but the exchange takes no account of what the sub-trees represent, so the result is not a recombination of logical information.
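The following sketch of standard GP sub-tree crossover makes the problem concrete. The nested-tuple tree encoding and all function names are assumptions for illustration. Note that nothing in the swap checks categories, so a noun phrase can end up where a verb phrase used to be.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indexes) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        # Index 0 holds the label, so children start at index 1.
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the sub-tree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of `tree` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(a, b, rng=random):
    """Swap one randomly chosen sub-tree of `a` with one of `b`."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

# Hypothetical usage with two small parse trees.
a = ("S", ("NP", "she"), ("VP", "runs"))
b = ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats")))
child_a, child_b = subtree_crossover(a, b)
```

The operator only conserves the total number of nodes across the two children; whether either child is still a valid parse under the grammar is left to chance.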
Another issue is that GP tends to produce a large number of incomplete parses. This can slow down the convergence process and requires that an appropriate criterion be used to decide when the process should terminate.
The evolutionary parser uses two types of genetic operators, crossover and mutation. The former combines an existing parse with others present in the population to satisfy a grammar rule. The latter changes a parse by substituting another parse of the same syntactic category for the same sequence of words. Which one to use depends on the kind of evolvability that is required. We have explored the effects of conservative and speculative variants of the operators, and of the length of the sequence of words that is parsed.
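One way to read "combining a parse with others to satisfy a grammar rule" is sketched here. The `Parse` class, the word-index spans, the rule format `(lhs, rhs)` and the left-to-right search are all assumptions made for illustration. The sketch treats "conservative" as meaning that a child is produced only when every category on the right-hand side of a rule is matched by an adjacent parse already in the population; the text does not define the term, so this is an interpretation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Parse:
    category: str
    start: int            # index of the first word covered
    end: int              # index one past the last word covered
    children: tuple = ()

def crossover(parse, population, rules):
    """Try to extend `parse` to the right so that some rule lhs -> rhs is satisfied.

    Returns a new Parse rooted at lhs, or None if no rule can be completed.
    """
    for lhs, rhs in rules:
        if rhs[0] != parse.category:
            continue
        parts = [parse]
        for wanted in rhs[1:]:
            # Look for a parse of the wanted category that starts where the last one ends.
            nxt = next((p for p in population
                        if p.category == wanted and p.start == parts[-1].end), None)
            if nxt is None:
                break
            parts.append(nxt)
        else:
            # Every right-hand-side category was matched.
            return Parse(lhs, parts[0].start, parts[-1].end, tuple(parts))
    return None

# Hypothetical usage on a three-word sentence.
rules = [("S", ("NP", "VP")), ("NP", ("Det", "N"))]
population = [Parse("Det", 0, 1), Parse("N", 1, 2), Parse("VP", 2, 3)]
np = crossover(population[0], population, rules)   # NP covering words 0..2
s = crossover(np, population, rules)               # S covering words 0..3
```

Because a child is only created when a rule is fully satisfied, this variant never adds a partial combination to the population, which is one plausible reason a conservative operator would shrink the search space.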
The Aging of Incomplete Individuals
During evolution, some incomplete individuals may disappear from the population. The aging mechanism consists of replacing them with complete parses that satisfy the grammar rules. This is done using the genetic operators of mutation and cut.
Mutation replaces a subtree of an individual with another one that parses the same sequence in a different way. This creates a new individual, which is added to the population. The cut operator substitutes an individual with a parse tree built by randomly selecting some nodes from it. The new parse tree must be compatible with the nodes that were selected before, and it cannot conflict with other nodes of the individual.
Several experiments have been performed to design an evolutionary algorithm that works with bottom-up parsing. They have shown that the best results are obtained with a fitness function defined as the logarithm of the probability of the grammar rules and with a conservative crossover operator.
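A fitness function of this kind can be sketched as the sum of the log-probabilities of the rules used in a parse, which equals the logarithm of their product. The nested-tuple tree encoding, the function names and the probability table are hypothetical; the probabilities shown are made-up values for illustration, not results from the experiments.

```python
import math

def rule_of(node):
    """Return the (lhs, rhs) rule applied at a node of a nested-tuple tree."""
    rhs = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
    return (node[0], rhs)

def log_fitness(tree, probs):
    """Sum of log-probabilities of every rule used in `tree`.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    """
    if isinstance(tree, str):
        return 0.0                      # words carry no rule of their own
    total = math.log(probs[rule_of(tree)])
    for child in tree[1:]:
        total += log_fitness(child, probs)
    return total

# Hypothetical rule probabilities for a toy grammar.
probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.5,
    ("VP", ("runs",)): 0.25,
}
tree = ("S", ("NP", "she"), ("VP", "runs"))
fitness = log_fitness(tree, probs)      # log(1.0 * 0.5 * 0.25)
```

Working in log space turns the product of rule probabilities into a sum, which avoids numerical underflow on long sentences and lets partial parses be compared by simple addition.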