Fer said:
11th exercises
In computing, a regular expression, also referred to as regex or regexp, provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters.
The following examples illustrate a few specifications that could be expressed in a regular expression:
- The sequence of characters "car" appearing consecutively in any context, such as in "car", "cartoon", or "bicarbonate"
- The sequence of characters "car" occurring in that order with other characters between them, such as in "Icelander" or "chandler"
- The word "car" when it appears as an isolated word
- The word "car" when preceded by the word "blue" or "red"
- The word "car" when not preceded by the word "motor"
- A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits (for example, "$100" or "$245.99").
Source: Wikipedia
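As a rough illustration, the specifications listed above could be written as Python patterns along the following lines. This is only a sketch: the dictionary labels and the `matches` helper are made up for this example, and the "preceded by" patterns assume the two words are separated by exactly one space. The code avoids Python-2-only syntax so it should run on both Python 2 and Python 3.

```python
import re

# Illustrative patterns only; the labels are hypothetical names for this sketch.
PATTERNS = {
    'consecutive': r'car',                          # "car" anywhere in the text
    'in_order': r'c.*a.*r',                         # c, a, r in order, anything between
    'isolated_word': r'\bcar\b',                    # "car" as a whole word
    'after_blue_or_red': r'\b(?:blue|red) car\b',   # assumes a single space separator
    'not_after_motor': r'(?<!motor )\bcar\b',       # negative lookbehind, single space
    'dollar_amount': r'^\$[0-9]+(?:\.[0-9]{2})?$',  # whole string must be the amount
}

def matches(label, text):
    """Return True if the pattern stored under `label` is found in `text`."""
    return re.search(PATTERNS[label], text) is not None

# Try a couple of the patterns on sample strings
for label, text in [('isolated_word', 'cartoon'), ('dollar_amount', '$245.99')]:
    print('%s on %r: %s' % (label, text, matches(label, text)))
```

The dollar pattern is anchored with `^` and `$` so that a string such as "$245.9" is rejected instead of matching only its "$245" prefix.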
Python regular expression syntax follows in the Perl lineage. The Python module re provides regular expression functionality. Regular Expressions are a sub-language embedded within the larger Python language.
Chapter 5 - regular expressions
Python.org
Please post regex exercises and questions below. We can help each other learn and explore this robust and slightly difficult aspect of Python.
The exercises of chapter 11 are here: http://pastebin.com/QM5eqJkh
My exercises:
@Link discussion:
I would not change the links.
I think in "Python for Informatics" you get a first impression of re.search() and re.findall().
But in "Dive Into Python 3" there are many details not listed in the first source of information.
For example, how to limit the number of repetitions with {1,3} and how to substitute text with re.sub().
I've updated the link "Dive Into Python 3" since the old link was broken. It still points to a general page, not one specific to regular expressions. I think it is better to link to http://getpython3.com/diveintopython3/regular-expressions.html.
Should this be changed? Or just delete the link and use one source of information (Python for Informatics) to keep it simple?
What do you think?
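To make the difference between the two sources concrete, here is a small sketch of the four features mentioned in this post: re.search(), re.findall(), the {1,3} quantifier, and re.sub(). The sample string is invented for the example, and the code avoids Python-2-only syntax so it should run on both Python 2 and Python 3.

```python
import re

# Made-up sample text for this sketch
text = 'Call 5551234 or 42 before 9'

# re.search() stops at the first match and returns a match object (or None)
first = re.search(r'[0-9]{1,3}', text).group()

# re.findall() returns every non-overlapping match as a list of strings.
# {1,3} (written without a space) means "between one and three repetitions".
groups = re.findall(r'[0-9]{1,3}', text)

# re.sub() replaces every match with the replacement string
masked = re.sub(r'[0-9]+', '#', text)

print(first)
print(groups)
print(masked)
```

Note that the quantifier is written {1,3} with no space after the comma.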
Exercise 11.1 Write a simple program to simulate the operation of the grep command on
UNIX. Ask the user to enter a regular expression and count the number of lines that matched
the regular expression:
import re
fhand = open('fis.txt')
a = raw_input("Please enter the expression that you want to search for in the file: ")
count = 0
for line in fhand:
    line = line.rstrip()
    if re.search(a, line):
        count = count + 1
print "we have", count, "lines which contain", a
Exercise 11.2 Write a program to look for lines of the form
New Revision: 39772
And extract the number from each of the lines using a regular expression and the findall()
method. Compute the average of the numbers and print out the average.
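import re
fhand = open('fis1.txt')
count = 0
_sum = 0
for line in fhand:
    # findall() returns an empty list for lines that do not match
    a = re.findall('^New Revision: ([0-9]+)', line)
    if len(a) > 0:
        count = count + 1
        _sum = _sum + float(a[0])
if count > 0:
    print "Avg=", _sum / count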
Exercise 11.1
import re

mbox = open('mbox.txt', 'r')

def grep(expression, file):
    matches = 0
    for line in file:
        line = line.rstrip()
        if re.search(expression, line):
            matches += 1
    return matches

def main():
    """The main program loop"""
    expression = raw_input("Enter a regular expression: ")
    matches = grep(expression, mbox)
    print "mbox.txt had %d lines that matched %s" % (matches, expression)

if __name__ == '__main__':
    main()
Exercise 11.2
import re

def getAverage(filename):
    f = open(filename, 'r')
    matches = []
    for line in f:
        matches.extend(re.findall('^New Revision: (\d+)', line))
    fltmatches = [float(x) for x in matches]  # convert values to floats
    average = sum(fltmatches) / len(fltmatches)
    return average

def main():
    """Main program loop"""
    filename = raw_input("Enter file: ")
    average = getAverage(filename)
    print average

if __name__ == '__main__':
    main()
My py4inf exercise 11 code
Exercise 11.1
import re

fname = 'mbox.txt'
rexp = raw_input('Enter a regular expression: ')
try:
    fh = open(fname)
except:
    print 'Unable to open', fname
    exit()

match_count = 0
for line in fh:
    if re.search(rexp, line):
        match_count = match_count + 1

print fname, 'had', match_count, 'lines that matched', rexp
Exercise 11.2
import re

# Get the file name and open the file
fname = raw_input('Enter a file name: ')
if 1 > len(fname):
    fname = 'mbox-short.txt'
try:
    fh = open(fname)
except:
    print 'Unable to open', fname
    exit()

rev_list = []
# Read each line and collect the revision numbers
for line in fh:
    rev = re.findall('^New Revision: (\d+)', line)
    if 1 > len(rev):
        continue
    rev_list = rev_list + [int(rev[0])]

# float() avoids integer division in Python 2
print 'Average Revision:', float(sum(rev_list)) / len(rev_list)
There is a small error in exercise 11.1, the last example says:
mbox.txt had 4218 lines that matched java$
But it should be:
mbox.txt had 4175 lines that matched java$
The official documentation for regular expressions in Python is at:
http://docs.python.org/library/re.html
11.1
import re

sre = raw_input('Enter a regular expression: ')
fname = 'mbox.txt'
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()

lm = 0  # number of lines that matched
for line in fhand:
    n = re.findall(sre, line)
    if len(n) > 0:
        lm = lm + 1
fhand.close()

print fname, 'had', lm, 'lines that matched', sre
11.2
import re

fname = raw_input('Enter a file name: ')
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()

total = 0  # renamed from sum to avoid shadowing the built-in
num = 0
for line in fhand:
    n = re.findall(r'\s*New\s+Revision:\s*([0-9]+)', line)
    for i in n:
        total = total + int(i)
        num = num + 1
fhand.close()

if num > 0:
    print 'Average', float(total) / num
else:
    print 'No lines with "New Revision:" found in', fname
Are there exercises to be released, or are we as a team meant to build our own?
Would you show me where to find the mbox.txt file and mbox-short.txt file?
Thanks,
Nate
Sure -
mbox: http://www.py4inf.com/code/mbox.txt
mbox-short: http://www.py4inf.com/code/mbox-short.txt
All of the code in the book can be found on the index list at: http://www.py4inf.com/code/