For my Discrete Mathematics II course at UCF (COT4210), I had to do some implementation with Finite State Machines. My favorite of our tasks (though the most difficult) was to convert a Regular Expression (RE) to an equivalent Deterministic Finite Automata (DFA). And since our professor let us use any language, I tried to branch out from Java & C (which are annoyingly overused in Academia). I decided to teach myself Python. And it turns out, it was a good choice too–considering it’s wonderful built-in functionality for Lists, and the heart of this program is a huge 2D array defining the automata’s transition function. Also, I miss scripting languages–especially when I’m writing a program as a learning experiment as opposed to trying to make it as efficient as possible.
So, without further Ado: here’s my code. It reads a RE in postfix notation from input.txt. Two cautions about postfix REs:
- You must explicitly state concatenation
- The Kleen Star is already a postfix operator in REs, so it doesn’t really work to use a mathematical infix2postfix library, as it treats the kleen star like an infix multiplicative operator. I treat it as an operand and throw it directly into the output rather than placing it on the stack since it’s already in postfix.
#!/usr/bin/python2 ################################################################################ # Author: Michael Altfield # Section: COT 4210-0001 (15902) Discrete Structures II T/R 10:30-11:45 # Created: 2011-03-24 # Updated: 2011-04-07 # Version: 1.2 # Assign#: Project #1, Problem #3 # Purpose: Given a regular expression R, construct a DFA that accepts the # language of R. ################################################################################ ################################################################################ # IMPORTS # ################################################################################ # n/a ################################################################################ # SETTINGS # ################################################################################ DEBUG = False # True enables DEBUG output. False disables. #DEBUG = True # input file describing our DFA inputFileName = 'three.txt' # all legal operators: concatenation, union, and kleen star # (everything else in the regular expression will be considered an input symbol) operators = [ '.', '+', '*' ] # set to False if you see ^[[1m and ^[[0;0m showing up in the output enableBoldOutput = True bold = "33[1m" reset = "33[0;0m" ################################################################################ # FUNCTIONS # ################################################################################ # prints debug info when DEBUG flag set to True def debug( msg ): if( DEBUG ): print "DEBUG: " + msg # returns a string representation of the Transition Table in a technical # human-moar-leik-engineer readable format def transitionTable2Str(): return str(transitionTable) # returns a string representation of the Transition Table in a technical # human-moar-leik-engineer readable format def curSubNfaList2Str(): return str(curSubNfaList) # returns a string representation of the NFA in a human readable format def automata2Str(): #################### # DEFINE VARIABLES # #################### states = '' sigma = '' delta = '' ############### # NFA or DFA? # ############### # do we have epsilon transitions in this Transition Table? print epsilon if len(transitionTable[0]) > epsilon: # we have epsilon transitions in this Transition Table (or at least we # have epsilon transitions to [None] in this Transition Table) printingNfa = True else: printingNfa = False ################################### # BUILD HUMAN-READABLE STATES SET # ################################### states += '{' # LOOP THOUGH ALL ROWS OF THE TRANSITION TABLE counter = 0 while counter < len(transitionTable): # is this our first state? if not states == '{': # this is not our first state, so prepend a comma & space states += ', ' # the name of each state is merely its index in the Transition Table, so # just append the counter to our set of states states += str(counter) counter += 1 states += '}' ##################################### # BUILD HUMAN-READABLE ALPHABET SET # ##################################### sigma += '{' # LOOP THOUGH ALL THE ELEMENTS (SYMBOLS) IN THE ALPHABET for symbol in alphabet: # is this our first symbol? if not sigma == '{': # this is not our first symbol, so prepend a comma & space sigma += ', ' sigma += symbol sigma += '}' ######################################### # BUILD HUMAN-READABLE TRANSITION TABLE # ######################################### delta = ' ' # PRINT TOP ROW OF TABLE (SYMBOLS IN ALPHABET) if enableBoldOutput: delta += bold # loop through all symbols in the alphabet for symbol in alphabet: delta += '{0:^10}'.format( str(symbol) ) + ' ' # are we printing an NFA? if printingNfa == True: # we are priting an NFA; create a column for epsilon transitions delta += '{0:^10}'.format( 'epsilon' ) + ' ' if enableBoldOutput: delta += reset delta += "n" # ITERATE THROUGH EACH FOLLOWING ROW (STATES) counter = 0 for row in transitionTable: # print the first element of the row (the transition-from state) if enableBoldOutput: delta += bold delta += '{0: | left "out" State | # ----------------- ------------------ # ------------------ ------------------- # | right "in" State | ----> | right "out" State | # ------------------ ------------------- # # into this: # ----- ------ ------ ------ # | LiS | ----> | LoS | ----> | RiS | ----> | RoS | # ------ ------ ------ ------ # def concatenateCurSubNfa(): ############################### # ADD ARC TO TRANSITION TABLE # ############################### # we concatenate 2 sub-NFAs by creating an arc from the "left" sub-NFA's # "out" State to the "right" sub-NFA's "in" State # -2 index = "left" sub-NFA # 1 subindex = "out" State leftExitState = curSubNfaList[-2][1] # -1 index = "right" sub-NFA # 0 subindex = "in" State rightEnterState = curSubNfaList[-1][0] debug( "Adding an epsilon transition from state " +str(leftExitState)+ " to state " +str(rightEnterState) ) transitionTable[ leftExitState ][ epsilon ] = [rightEnterState] debug( "tTransition Table = " ) debug( "tt" + transitionTable2Str() ) ##################################################### # MERGE SUB-NFAs TO SINGLE SUB-NFA IN curSubNfaList # ##################################################### debug( "Merging the 2 sub-NFAs in curSubNfaList" ) # -1 index = "right" sub-NFA # 1 subindex = "out" State outState = curSubNfaList[-1][1] # set the "left" sub-NFA's "out" State to be the "right" sub-NFA's "out" # State # -2 index = "left" sub-NFA # 1 subindex = "out" State curSubNfaList[-2][1] = outState # ..and delete the now-orphaned "right" sub-NFA curSubNfaList.pop() debug( "tcurSubNfaList = " ) debug( "tt" + curSubNfaList2Str() ) # turn this: # ---------------- ----------------- # | top "in" State | ----> | top "out" State | # ---------------- ----------------- # ------------------- -------------------- # | bottom "in" State | ----> | bottom "out" State | # ------------------- -------------------- # ------------------ # | new "left" State | # ------------------ # ------------------- # | new "right" State | # ------------------- # # into this: # e ----- ------ # --->| TiS | ----> | ToS | # ----- / ------ ------ e ----- # | NlS | ===> | NrS | # ----- e ----- ------ / e ----- # --->| BiS | ----> | BoS | # ------ ------ # def unionCurSubNfa(): ################################# # ADD 2 STATES TO OUR FINAL NFA # ################################# debug( "tTransition Table = " ) debug( "tt" + transitionTable2Str() ) # get the current number of States from the number of rows of the transition # table. Subtract 1 to start from 0. numStates = len( transitionTable ) - 1 newLeftState = numStates+1 newRightState = numStates+2 debug( 'Adding states ' +str(numStates+1)+' & ' +str(numStates+2)+ ' to the NFA' ) ############################################## # INSERT NEW LEFT STATE INTO TRANSTION TABLE # ############################################## # build a row to add to the Transition Table for our "left" State that has # an epsilon transition to each current sub-NFA's "in" State row = [ [None] for i in range( len(alphabet) ) ] # -2 index = "top" sub-NFA # 0 subindex = "in" State topInState = curSubNfaList[-2][0] # -1 index = "bottom" sub-NFA # 0 subindex = "in" State bottomInState = curSubNfaList[-1][0] debug( "Adding an epsilon transition from state " +str(newLeftState)+ " to " "states " +str(topInState)+ " & " +str(bottomInState) ) # add a epsilon transitions to both current sub-NFAs row.append( [topInState, bottomInState] ) transitionTable.append( row ) ############################################### # INSERT NEW RIGHT STATE INTO TRANSTION TABLE # ############################################### # build a row to add to the Transition Table for our "right" State with no # "out" transitions row = [ [None] for i in range( len(alphabet) ) ] # add a final epsilon transition row.append( [None] ) transitionTable.append( row ) ####################################################### # INSERT ARC TO NEW RIGHT STATE INTO TRANSITION TABLE # ####################################################### # -2 index = "top" sub-NFA # 1 subindex = "out" State topOutState = curSubNfaList[-2][1] # -1 index = "top" sub-NFA # 1 subindex = "out" State bottomOutState = curSubNfaList[-1][1] debug( "Adding epsilon transitions from states " +str(topOutState)+ " & " +str(bottomOutState)+ " to state " +str(newRightState) ) # add an epsilon transition from the top sub-NFA's "out" State to the new # "right" State transitionTable[ topOutState ][ epsilon ] = [newRightState] # add an epsilon transition from the top sub-NFA's "out" State to the new # "right" State transitionTable[ bottomOutState ][ epsilon ] = [newRightState] debug( "tTransition Table = " ) debug( "tt" + transitionTable2Str() ) ##################################################### # MERGE SUB-NFAs TO SINGLE SUB-NFA IN curSubNfaList # ##################################################### # "in" State of merged sub-NFA curSubNfaList[-2][0] = newLeftState # "out" State of merged sub-NFA curSubNfaList[-2][1] = newRightState # ..and delete the now-orphaned "bottom" sub-NFA curSubNfaList.pop() debug( "tcurSubNfaList = " ) debug( "tt" + curSubNfaList2Str() ) # turn this: # ---------------------- ----------------------- # | current "left" State | ----> | current "right" State | # ---------------------- ----------------------- # ------------------ # | new "left" State | # ------------------ # ------------------- # | new "right" State | # ------------------- # # into this: # ----- e ----- ----> ----- e ----- # | NlS | ----> | ClS | e | CrS | ----> | NrS | # ----- e ----- 0): # get the next token from the beginning of the regularExpression string token = regularExpressionList.pop(0) debug( "WORKING WITH TOKEN '" +str(token)+ "' FROM THE REGULAR EXPRESSION" ) if token in operators: # the token is an operation subNfaOperation( token ) else: # the token is a symbol createNfaFromSymbol( token ) ##################### # PRINT EPSILON-NFA # ##################### print "The following epsilon-NFA is equivalent to the supplied Regular " "Expression (" +str(regularExpression)+ "):" print automata2Str() ##################################### # CONVERT THE EPSIOLON-NFA TO A DFA # ##################################### debug( "---BEGIN CONVERTING EPSILON-NFA TO A DFA---" ) # DEFINE VARIABLES newTransitionTable = [] newStates = [] addedStates = [] # following the Subset Construction Algorithm, the first State of our DFA is the # Epsilon Closure of the first state of our epsilon-NFA startState = epsilonClosure( curSubNfaList[0][0] ) newStates.append( startState ) while len(newStates) > 0: newTransitionTable.append( [] ) state = newStates.pop(0) addedStates.append( state ) debug( "Adding State " +str(state)+ " to DFA with new State name = " + str(len(newTransitionTable)-1) ) # loop through all symbols in the alphabet + epsilon transitions counter = 0 while counter = len(alphabet): symbol = '' else: symbol = alphabet[counter] # has this state already been added to the new DFA? if not transitionTo == [] and not transitionTo == [None] and not transitionTo in newStates and not transitionTo in addedStates: # this state has not already been added to the new DFA; add it to the # newStates list so it'll be added in a future iteration newStates.append( transitionTo ) if transitionTo in addedStates: newStateIndex = addedStates.index( transitionTo ) elif transitionTo in newStates: newStateIndex = newStates.index(transitionTo) + len(newTransitionTable) debug( "Adding transition from State " +str(state)+ " = " +str(len(newTransitionTable)-1)+ " to State " +str(transitionTo) +" = " +str(newStateIndex)+ " on input symbol '" +str(symbol)+ "'" ) newTransitionTable[ len(newTransitionTable)-1 ].append( extendedDelta( state, counter ) ) counter += 1 debug( "tTransition Table = " ) debug( "tt" + transitionTable2Str() ) # overwrite the transitionTable with the new one now that we no longer need it # (automata2Str() uses transitionTable) transitionTable = newTransitionTable ############# # PRINT DFA # ############# print "The following DFA is equivalent to the above epsilon-NFA and the " "original Regular Expression (" +str(regularExpression)+ "):" print automata2Str() ################ # EXIT CLEANLY # ################ exit(0)
Of course, this work is mine and not yours. If you manage to get some better understanding of Discrete or Python from it, great–but you don’t have my permission to submit it as your own work.
Related Posts
Hi, I’m Michael Altfield. I write articles about opsec, privacy, and devops ➡
While I would like to comment on this post, it is nearly impossible without proper code formatting try using some pre tags and a javascript syntax highlighter such as http://alexgorbatchev.com/SyntaxHighlighter/
@Sam, I had something similar to this a while back, but a WordPress update broke it, and I haven’t yet had a chance to fix it. Google Code actually has something similar to this, which I have yet to implement.
Hi,where is the result of automata2Str?