
can be described by . Thus, these form the basis for all languages for specifying the lexical analysis. Regular languages can be recognized by finite automata. These terms will now be introduced. In what follows, we shall always assume an underlying alphabet .
In order to be recognized automatically, words of a symbol class have to be described formally. Now we will try to capture the structure of a symbol class formally. As an example, we will examine the symbol class of . Identifiers are formed by a concatenation of letters and digits, where the first character must be a letter.
How can this structure be represented?
a LETTER, then any number of (a LETTER or a DIGIT)
or shorter:
le, then any number of ( le or di )
This informal notation can be represented formally as follows:
So the expression now looks like this: le ( le  di )*. This is a .
Strictly speaking, le and di are themselves :
So the for identifiers is composed of already existing regular expressions by applying CONCATENATION, UNION, and KLEENESTAR to these already existing regular expressions.
Therefore, , and the they describe, are defined inductively:
Regular expression  Regular language  
Basis: 
a Ø 
describes  {a} {} Ø 
Inductive hypothesis:  r_{1
} r_{2} 
describes  R_{1
} R_{2} 
Inductive step:  (r_{1 }r_{2}) (r_{1 } r_{2}) (r_{1})* 
describes  R_{1}R_{2} R_{1}R_{2} R_{1}* 
There are no other regular expressions. 
Remarks on the notation:
Examples for regular expressions and languages:
Regular expression  Regular language described  Elements of the regular language 
ab  {a, b}  a,b 
ab*a  {a}{b}*{a}  aa, aba, abba, Ø 
(ab)*  {ab}*  , ab, abab, Ø 
abba  {abba}  abba 
Copyright © (1999  2001) University of Saarland, Germany
