Looking Forward in Text Processing with Python
 
A few weeks ago, I designed a tool that analyzes sar and Sun Explorer data. While searching through the data, I needed to grab the text between two lines. I had done this in Perl, but never in Python. The algorithm is basically the same, but Python's list handling is better (in my opinion), which makes the code easier to implement. Below is the module I wrote for this purpose.
 
import re
import os
import sys
 
#lookForward collects the lines found between a start pattern and an end pattern.
def lookForward(searchPattern, endPattern, fileFD):
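   '''lookForward takes 3 args:
   searchPattern is the regex that marks the beginning of a block,
   endPattern is the regex that marks the end of a block,
   fileFD is the file object. The file must be opened before calling this function.
   Whitespace in each line is replaced with commas before matching.
   Returns a list of lists, one inner list per block found.'''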
 
   firstList = []
   returnValue = []
  
   #compile both patterns once, before reading the file
   beginPattern = re.compile(searchPattern)
   finalPattern = re.compile(endPattern)

   for line in fileFD:
      #replace each run of whitespace with a comma
      string = re.sub(r'\s+', ',', line)
      beginMatch = beginPattern.search(string)
 
      #if there is a match for the beginning search pattern, then start parsing until endPattern is found.
      if beginMatch:
 
         #append the text matched by the beginning search pattern (not the whole line)
         firstList.append(beginMatch.group(0))
 
         try:
            #endless loop until endPattern is found
            while True:
 
               #take the next line in the file; next() works in Python 2.6+ and Python 3
               newLine = next(fileFD)
 
               #replace each run of whitespace with a comma
               noSpacesLine = re.sub(r'\s+', ',', newLine)
 
               #append each line to first list
               firstList.append(noSpacesLine)
 
               finalMatch = re.search(finalPattern, noSpacesLine)
 
               #check if newLine is a match for endPattern
               if finalMatch:
                  #append first list to the returnValue list
                  returnValue.append(firstList)
 
                  #null firstList so the data is not duplicated on next loop
                  firstList = []
 
                  #break the inner loop since endPattern was found
                  break
 
         except StopIteration:
            #the file ended before endPattern was found, so the unfinished block is dropped
            continue
 
   return returnValue
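
As a rough sketch of how the function might be called, the example below runs the same look-forward idea against a small in-memory file. Everything in it is an assumption made for illustration: it targets Python 3, look_forward is a condensed restatement rather than the module above, the sample text and the START/END patterns are invented, and a nested for loop over the same file iterator stands in for the while True / next() loop.

```python
import io
import re


def look_forward(search_pattern, end_pattern, file_obj):
    """Condensed restatement of lookForward (hypothetical name, for illustration)."""
    begin_re = re.compile(search_pattern)
    end_re = re.compile(end_pattern)
    blocks = []
    for line in file_obj:
        # Whitespace runs become commas before matching, as in the module above.
        begin_match = begin_re.search(re.sub(r'\s+', ',', line))
        if not begin_match:
            continue
        # A block starts with the matched text, not the whole line.
        block = [begin_match.group(0)]
        # Keep consuming lines from the same iterator until the end pattern.
        for next_line in file_obj:
            cleaned = re.sub(r'\s+', ',', next_line)
            block.append(cleaned)
            if end_re.search(cleaned):
                blocks.append(block)
                break
    return blocks


# Invented sample data; real sar or Explorer output will look different.
sample = io.StringIO(
    "header line\n"
    "START cpu\n"
    "10 20 30\n"
    "40 50 60\n"
    "END\n"
    "noise\n"
    "START disk\n"
    "1 2\n"
    "END\n"
)

blocks = look_forward(r'START,\w+', r'^END', sample)
for block in blocks:
    print(block)
```

Because the trailing newline counts as whitespace, every collected line ends with a comma, and the start pattern has to be written against the comma-separated form of the line (START,cpu rather than START cpu).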