Automata Theory
Automata theory is a branch of Computer Science that deals with mathematical models of computation
- Finite Automaton
- Deterministic Finite Automaton (DFA)
- Non-deterministic Finite Automaton (NFA)
- Generalised Non-deterministic Finite Automaton (GNFA)
- Finite State Transducer (FST)
- Converting an NFA into a DFA
- Converting an DFA into a GNFA
- Converting a GNFA into a Regular Expression
- Converting a Regular Expression into an NFA
- Converting a DFA into a Regular Expression
- Regular Expressions
- Words
- Regular Languages
- References
Finite Automaton
An abstract machine that can be in exactly one of a finite number of states at any given time. States are connected by transitions, which may be cyclic.
If is the set of all inputs that machine accepts, we say that is the language of machine and conversely that recognises . This is represented as .
Notice that a machine recognises one language and one language only. Even if the machine accepts no input, then its language is . Two machines are equivalent if the recognise the same language.
For any set of states , is the set of all states that can be reached by going only through transitions, including . Consider the following finite automaton:
+-------+
b | | a, b
+---------->| B |----------+
@@ | | | |
@@ | +-------+ |
| | v
| ######### +-------+
| # # empty-transition | |
+--># A # -------------------->| C |
# # | |
######### +-------+
^ |
| a |
+------------------------------+
In this case, . Notice that because is in the argument set, and from we can go to using a transition. Similarly, because there is no transition from .
Deterministic Finite Automaton (DFA)
A finite automaton where each state has unique transitions for any symbol of the alphabet. A DFA does not allow transitions.
It is formally defined as a 5-tuple where is a finite set of states along with an error state , is the set denoting the alphabet, is the transition function , is the start state such that and is the set of accept states such that .
The transition function maps a state and an alphabet symbol to one next state (thus its deterministic). This function is total, and maps unspecified transitions to the error state.
We say that a DFA accepts a sequence of symbols if there exists a sequence of states such that (the first state is the start state), (the final state is an accept state), and (each transition is valid). A machine accepts the word if its initial state is an accept state: .
Consider the following DFA that accepts binary numbers that are multiples of 3:
1 0
@@ +------------+ +-----------+
@@------+ | | | |
v | v | v
######## +------+ +------+
+-----# # | | | | -----+
0| # S1 # | S2 | | S3 | |1
+----># # | | | | <----+
######## +------+ +------+
^ | ^ |
| | | |
+------------+ +-----------+
1 0
This DFA would be formally denoted as , where is .
Intersection
Given machines and , the intersection of both machines is the machine that accepts an input if both or do.
This is accomplished with the same process we use to convert a DFA into an NFA with the difference that the final accept states will be the cartesian product of and accept states instead of the members of the resulting states that contain an accept state from either or .
Non-deterministic Finite Automaton (NFA)
A finite automaton that allows more than one transition on the same input symbol, and that also allows transitions, which can be fulfilled by any symbol without consuming the input.
Basically, given a state and an input symbol, there can be more than one valid outbound edge for such symbol, constructing a tree. An NFA accepts a sequence of symbols if at least one of the possible paths accepts such string.
Nondeterminism is a generalization of determinism, so every DFA is automatically an NFA. Conversely, every NFA can be converted into a DFA.
An NFA is formally defined as a 5-tuple where is a finite set of states, is the set denoting the alphabet, is the transition function , is the start state such that and is the set of accept states such that .
Notice that this 5-tuple is similar to the one from a DFA, with the following exceptions:
- The set of states no longer requires a error state. Invalid transitions can return
- The transition function may accept as a valid symbol in the alphabet, denoted as
- The transition function returns a set of possible next states instead of just one state
Any NFA can be converted to an NFA with just one accept state by creating a new state that is the only accept state, and creating transitions from all previous accept states to the new accept state.
Union
Given machines and , the union of both machines is the machine that accepts an input if either or do. This is accomplished by creating a new start state that has transitions to the old start states.
@@
@@ +------+ ########
| | | # #
+---->| p1 |------># p2 #
| | # #
+------+ ########
@@
@@ +------+ ########
| | | # #
+---->| q1 |------># q2 #
| | # #
+------+ ########
+------+ ########
empty-transition | | # #
@@ +------------------->| p1 |----># p2 #
@@ +------+ | | | # #
| | | | +------+ ########
+---->| n0 |---+
| | | +------+ ########
+------+ | empty-transition | | # #
+------------------->| q1 |----># q2 #
| | # #
+------+ ########
Concatenation
Given machines and , the concatenation of both machines is the machine that accepts an input if accepts a prefix of it, and accepts the rest. This is accomplished by creating transittions between ’s accept states and start state, making ’s start state the only start state, and making ’s accept states the only accept states.
######## ########
# # # #
@@ +--># p2 # @@ +--># q2 #
@@ +------+ | # # @@ +------+ | # #
| | | | ######## | | | | ########
+---->| p1 |---+ +---->| q1 |---+
| | | ######## | | | ########
+------+ | # # +------+ | # #
+--># p3 # +--># q3 #
# # # #
######## ########
######## ########
# # empty-transition # #
@@ +--># p2 #--------------------+ +--># q2 #
@@ +------+ | # # | +------+ | # #
| | | | ######## | | | | ########
+---->| p1 |---+ +---->| q1 |---+
| | | ######## | | | | ########
+------+ | # # empty-transition | +------+ | # #
+--># p3 #--------------------+ +--># q3 #
# # # #
######## ########
Kleene Star
Given machine , is the machine that accepts zero or more iterations of . This is accomplished by creating a new start state that is an accept state (so that the word is accepted, as it is always a member of the star set), and has a transition to the old start state, and creating transitions from previous accept states back to the old start state.
########
# #
@@ +--># p2 #
@@ +------+ | # #
| | | | ########
+---->| p1 |---+
| | | ########
+------+ | # #
+--># p3 #
# #
########
empty-transition
+--------------------+
| |
| ######## |
| # # |
@@ v +--># p2 #--+
@@ ######## +------+ | # #
| # # empty-transition | | | ########
+----># s0 #----------------->| p1 |---+
# # | | | ########
######## +------+ | # #
^ +--># p3 #--+
| # # |
| ######## |
| |
+--------------------+
empty-transition
Generalised Non-deterministic Finite Automaton (GNFA)
A GNFA is a NFA where the transitions can be regular expressions instead of just words from an alphabet, or . Therefore, a GNFA reads blocks of symbols from the input, and not necessarily just one symbol at a time like other NFAs.
A GNFA is formally defined as a 5-tuple where is a finite set of states, is the set denoting the alphabet, is the transition function , where is the set of all regular expressions over , is the start state such that and is the set of accept states such that .
Notice that given and , means that the transition between and is the regular expression .
For convenience when converting GNFAs to regular expressions, GNFAs always have a special form with the following conditions:
The start state has transition arrows going to every other state but no arrows coming in from any state other than itself
There is only a single accept state, and it has arrows coming in from every other state but no arrows going to any other state
The start state is not the same as the accept state
Except for the start and accept states, one transition goes from every state to every other state and also from each state to itself
Notice that we if assume this special form, then the domain of the transition function is . This is because we don’t allow transitions from the accept state to other states, nor transitions going to the start state.
Every GNFA is equivalent to a GNFA with only two states with a transition that includes a regular expression that abstracts away any other states and transitions.
For example, given :
+---------------------------------------+
| b |
| v
| ########
| ab U ba # #
| +----------------------># s3 #
| | # #
+------+ | ########
@@ | | | empty-set ^
@@---->| s0 |--------+------------------+ |
| | | | |
+------+ | v |b*
| +------+ +------+ |
| ab* | | a* | | |
+------>| s1 |---------->| s2 |-----+
+---->| | | |<---+
| +------+ +------+ |
| aa | ^ | | ab|
+-----+ | (aa)* | +-----+
+------------------+
A GNFA accepts a word if it can be partitioned into a set of words where each partition is a member of and there exists a sequence of states such that is the start state, is an accept state (the only one in the case of a GNFA in the special form), and that where is a member of the language such regular expression represents: .
Basically, there is a sequence of states and regular expression transitions that consumes the input in chunks, in order, in a way that such computation ends in an accept state.
Finite State Transducer (FST)
A type of DFA where the output is a word rather than a boolean accept or reject result. This DFA has no accept states, and deals with two different alphabets. A FST translates a word from an alphabet to another alphabet.
A FST is formally defined as a 5-tuple where is a finite set of states, is the input alphabet, is the output alphabet, is the transition function , and is the start state such that .
Basically, the transition function takes a state and a symbol from the input alphabet, and returns the next state along with the translated symbol from the output alphabet.
For example, given and :
b/1
+-------------------------------+
| |
+---------------+ |
| a/1 | |
@@ | v v
@@ +------+ +------+ +------+
| | | | | a/1 | |
+---->| s1 | | s2 |------->| s3 |
| | | | | |
+------+ +------+ +------+
^ | ^ |
| b/0 | | b/1 |
+-------------+ +--------------+
| |
+-------------------------------+
a/0
The transition notation
between
and
means that moving between those state takes
from the input alphabet, and will result in
from the output alphabet. Given the above example, we can
translate aabb
to 1110
. The transition
function is defined as something like this:
.
Converting an NFA into a DFA
Given an NFA with states , then its corresponding DFA will have as states (notice it always includes ). Because the DFA states are the power set of the NFA sets, then given an NFA with states, then its DFA will have states. Notice that we don’t need the error state in this case since the state will serve that purpose.
The DFA alphabet is the same as in the NFA, leaving aside the symbol.
In order to calculate the new transition function, we must go through each of the sets in , and for each of those, go through the possible alphabet symbols (excluding ), and return the union of all the valid transitions from each of the states. We can take transitions and return the resulting states as well, but only after consuming the input symbol.
Given the start state of the NFA is , the start state of the DFA is equal to .
The DFA accept states are all the members of that contain at least one of the NFA accept states.
Finally, we can try to simplify the DFA by discarding states that only have outbound transitions, which means no path can lead into them.
Consider the following NFA:
a
+---+
| |
| v
+-----+
b | | a, b
+--------->| B |---------+
| | | |
@@ | +-----+ v
@@ ####### +-----+
| # # empty-transition | |
+----># A #------------------->| C |
# # | |
####### +-----+
^ |
| a |
+--------------------------+
If we build its corresponding DFA, then its states are , the alphabet is , the start state is , and the accept states are .
The transition function looks like this:
a | b |
---|---|
Notice . From , we can take the transition to , and from there we can take back to , so we count as well.
Also notice that . You might expect that we can follow to , and from there take back to , but transitions can only happen after the input symbol was consumed.
The resulting DFA looks like this:
+--------------------------------------------------------+
| a |
| |
####### ######### |
# # # # +--------+ |
# {A} # # {A,B} #-------+ | a | |
# # # # a | | v |
####### ######### v | ######### |
| | +-------+ | # # |
+------b-----+ | b | | | +--#{A,B,C}# |
| +------>| {B,C} |---+ | # # |
v | | | ######### |
+-----+ +-------+ | | ^ |
| | | ^ | | a | |
+-------->| {B} |---------+ | | | +-----+ |
| b | | b | | +-------+ |
| +-----+ | | b +------+ |
######### | |b | | |
@@ # # v | +------->| {} |<-+
@@---># {A,C} #<-------+ +-----+ | | +----| |----+
# # | a | | | | | +------+ |
######### +---------| {C} |<+ | | ^ ^ |
| ^ | | | | a | | b |
| a | +-----+ | +------+ +------+
+-----+ | |
| b |
+-----------+
Removing states with only outbound transitions results in:
+--------+
| a |
| v
| #########
+-------+ | # #
a | | | +--#{A,B,C}#
+------------>| {B,C} |---+ | # #
| | | | #########
+-----+ +-------+ | | ^
| | | ^ | | a |
+-------->| {B} |---------+ | | | +-----+
| b | | b | | +-------+
| +-----+ | | b +------+
######### | |b | |
@@ # # v | +------->| {} |
@@---># {A,C} #<-------+ +-----+ | | +----| |----+
# # | a | | | | | +------+ |
######### +---------| {C} |<+ | | ^ ^ |
| ^ | | | | a | | b |
| a | +-----+ | +------+ +------+
+-----+ | |
| b |
+-----------+
Converting an DFA into a GNFA
Add a new start state with an transition to the old start state, then add one single new accept state and create transitions to it from all old accept states. At this point, if any transitions have multiple labels or there is more than one transition in the same directory between two states, replace them with a single transition whose label is the union of the previous labels. Finally, add transitions between all states (except the start and final state).
Converting a GNFA into a Regular Expression
This conversion algorithm assumes the GNFA is the special form. A GNFA in this form has states, since it always has a start and acceptt state that are ensured to be different. The idea is that while , we can pick any state that is not the start or the accept state, remove it, and “fix” the broken transitions by abstracting the removed computation as regular expressions. Once , the regular expression in the transition between the start and the final state is the result.
Consider the following automata:
+------+ +------+
| | R1 | |
| s1 |--------------->| s2 |
| | | |
+------+ +------+
| +------+ ^
| R2 | | R3 |
+------>| s3 |--------+
| |
+------+
| ^
| R4 |
+----+
We will remove . The computation between and that went through can be summarized as: , then zero or more , and finally . In regular expression terms, this would be , so:
+------+ +------+
| | R1 | |
| s1 |--------------->| s2 |
| | | |
+------+ +------+
| ^
| R2 R4* R3 |
+-----------------------+
Now we have two transitions from the same states, which we can collapse using the union operator: .
Notice that because of the pumping lemma, we can represent any regular language as , and therefore any sub-tree of a finite automata. Thus, the removal of any state can be resolved with the concatenation of, the transition going to the removed state (), is the transition going from the removed state to itself (), and the transition going from the removed state to the next state. Finally, we calculate the union of this resulting expression and any other transition between the state that goes to the removed state to the state after the removed state.
Formally, assume and where and . The resulting GNFA with is , where and for any and any , where , , , and .
Converting a Regular Expression into an NFA
Consider the regular expression . We will use the 6-clause definition of a regular expression, and discuss them in order.
If where , then the language of the regular expression is the set containing just the symbol: . The resulting NFA is then where and :
+------+ ########
@@ | | a # #
@@----->| s0 |-----># s1 #
| | # #
+------+ ########
If , then , so the resulting NFA is where :
########
@@ # #
@@-----> # s0 #
# #
########
If , then , so the resulting NFA is where :
+------+
@@ | |
@@-----> | s0 |
| |
+------+
For the last 3 cases, where is the union or concatenation between other regular expressions, and where is the Kleene star of a regular expression, we convert each of the operands to NFAs using the definitions described before, and then use the standard way to represent union, concatenation, or stars of NFAs (consult the Operations section).
Consider as a complete example. First, lets convert and into NFAs:
+------+ ########
@@ | | a # #
@@----->| s0 |-----># s1 #
| | # #
+------+ ########
+------+ ########
@@ | | b # #
@@----->| s2 |-----># s3 #
| | # #
+------+ ########
The expression is the concatenation of both previous expressions, which is:
+------+ +------+ +------+ ########
@@ | | a | | empty-transition | | b # #
@@----->| s0 |----->| s1 |----------------->| s2 |-----># s3 #
| | | | | | # #
+------+ +------+ +------+ ########
The expression is the union of the previous NFA and ’s NFA:
+------+ +------+
empty-transition | | a | | empty-transition
+----------->| s1 |----->| s2 |------+
| | | | | | +------+ ########
+------+ +------+ +------+ | | | b # #
@@ | | +----->| s3 |-----># s4 #
@@-->| s0 | | | # #
| | +------+ ########
+------+ +------+ ########
| | | a # #
+----------->| s5 |-----># s6 #
empty-transition | | # #
+------+ ########
Finally, we calculate the Kleene star of the whole previous NFA:
+------+ +------+
empty-transition | | a | | empty-transition
+---->| s2 |------>| s3 |------+
| | | | | | +------+
@@ | +------+ +------+ | | |
@@-+ | +----->| s4 |
v | | |
+------+ +------+ +------+ ######## +------+
| | empty-transition | |empty-transition| | a # # |
| s0 |----------------->| s1 |--------------->| s6 |----># s7 # | b
| | | | | | # # v
+------+ +------+ +------+ ######## ########
^ | # #
| | # s5 #
+------------------------------------+ # #
| empty-transition ########
| |
| |
+---------------------------------------------+
empty-transition
Converting a DFA into a Regular Expression
We have to first convert the DFA into a GNFA, and then convert the GNFA into a regular expression, using the mechanisms described before.
Regular Expressions
is a regular expression if is the language containing one symbol from an alphabet , , , the union or concatenation between two regular expressions, or the Kleene star of a regular expression.
Notice the regular expression denotes the language containing just the empty string, while denotes the language that does not contain any string.
Any regular expression can be converted into a finite automaton that recognizes the language it describes, and vice versa. If a language can be described by a regular expression, then it is a regular language.
Words
A word is a member of language, and consists of symbols that are part of an alphabet.
Cardinality
Given a word from a language , the “length” of is denoted as . Given and , .
Reverse
Given a word , the reverse of the word is equals .
Regular Languages
A language is called a regular language if and only if there exists a non-deterministic finite automaton that recognises it.
Union
Given two regular languages and , the union of both languages is represented as , and it gives us a language containing all the words for both and . The union of two regular languages is a regular language.
For example, given and , then .
For convenience, given words and , is a shorthand for .
Notice , but might not always equal .
Concatenation
Given two regular languages and , the concatenation of both languages is represented as , and it gives us a language containing all possible combinations of words in before words in , and viceversa, like a cartesian product. The concatenation of two regular languages is a regular language.
For example, given and , then .
For convenience, given words and , both and are shorthands for .
Concatenating any set to the empty set yields the empty set: . Also , but might not always equal .
Kleene Star
This is a unary operation that given a language , equals an infinite set of all the possible combinations of words in . It is defined as . For any language, is a member of the star result. The Kleene star of a regular language is a regular language. Notice that .
For example, given , .
For convenience, given a word , is a shorthand for .
Given alphabet , is the language consisting of all strings over such alphabet.
As an extension to the Kleene star operator, we can define . Also, . Notice that given a language , .
Notice that . This means “nothing” zero or more times. The zero part of it comes down to . Nothing two times is , which is .
Reverse
The reverse of a language is the language with all its words reserved. Given , the reverse version of the language is . If is regular, so is .