SKEDSOFT

Automata | Comp. Sc. Engg.

An Illustration of homomorphisms: In parsing a programming language such as C, the parser appeals to a tokenizer that first recognizes the structure of tokens (sometimes called lexemes) such as integers, floating-point numbers, variable declarations, keywords, and such. The tokenizer pattern-matches according to regular expressions, while the parser analyzes the structure of the token stream using a context free grammar (context free grammars are discussed in Chapters 13 and 14; basically, they capture the essential lexical structure of most programming languages, such as nested brackets, begin/end blocks, etc.). Suppose one wants to get rid of the tokenizer and write a context free grammar up to the level of individual characters.

Such a grammar can be obtained by substituting the character stream corresponding to each token in place of each token in a modularfashion, according to a homomorphism. For example, if begin were to be a keyword in the language, instead of treating it as a token keyword, one would introduce additional productions of the form

begin -> b e g i n

Thanks to the modular nature of the substitutions, it can be shown that the resulting grammar would also be context-free.

Prefix-closure: A language L is said to be prefix-closed if for every string x ∈ L, every prefix of x is also in L. If we are interested in every run of a machine, then its language will be prefix-closed. This is because a physically realizable string processing machine must encounter substrings of a string before it encounters the whole string. Prefix closure is an operation that ‘closes’ a set by putting in it all the prefixes of the strings in the set.

For instance, given the set of strings {a, aab, abb}, its prefix closure is

{ε, a, aa, ab, aab, abb}.