Task Discussion

oskoff said:

Hi all,

I'm stuck on exercise 8.2. I'm guessing the proper guard is to ensure that words[2] is one of the days of the week. Is this correct, and if so, how do you guard it correctly? I have tried the following, but it results in only blank lines:

#8.2.py
fhand = open('mbox.txt')
count = 0
week = ["Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]
for line in fhand:
    words = line.split()
    if len(words) == 0 : continue
    if words[0] != 'From' : continue
    if words[2] != week[:] : continue
    print words[2]

Thanks for your help.

on Sept. 9, 2011, 7:13 a.m.
Mars83 said:
I'm studying the same topic now. I think the code first has to guard against lines that contain only two words:

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
[...]
From xyz
[...]

Such a line has more than zero words, so it passes the guard, but printing words[2] (the third word) then raises an IndexError.

So I edited the line

if len(words) == 0 : continue

to

if len(words) <= 2 : continue

Now lines with fewer than three words are skipped. You could also catch the error by wrapping the print statement in a try/except:

try: # Additional guard
    print(words[2])
except IndexError:
    continue

To check whether the word is in your list, you could use the following code:

if words[2] not in week : continue
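Putting all of these guards together, a minimal sketch might look like the following. The function name `days_from_lines` is hypothetical, and a list of strings stands in for the open file so the example is self-contained; a file handle from `open('mbox.txt')` can be passed the same way. It uses `print()` with a single argument, which behaves the same in Python 2 and Python 3.

```python
# Day names as they appear in mbox "From " envelope lines (note "Thu")
WEEK = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def days_from_lines(lines):
    """Return the day-of-week word from each mbox 'From ' line."""
    days = []
    for line in lines:
        words = line.split()
        # Guard 1: need at least three words before touching words[2]
        if len(words) < 3:
            continue
        # Guard 2: only envelope lines start with the bare word 'From'
        if words[0] != 'From':
            continue
        # Guard 3: the third word must be a day name
        if words[2] not in WEEK:
            continue
        days.append(words[2])
    return days

sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From xyz",  # would raise IndexError without guard 1
    "From: Stephen Marquard <stephen.marquard@uct.ac.za>",  # 'From:' is not 'From'
    "",
]
print(days_from_lines(sample))  # ['Sat']
```

Ordering matters: the length check has to come before any indexing, because `and`/`continue` short-circuit only protects the lines that follow it.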
on Oct. 7, 2011, 5:53 p.m. in reply to oskoff
oskoff said:

Thanks Mars83, adding your changes worked perfectly!

on Oct. 13, 2011, 11:22 a.m. in reply to Mars83

Sudaraka Wijesinghe said:

My py4int exercise 10 code

Exercise 10.1

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

email_count = {}

# Read each line and find out message count for each email address
for line in fh :
    if not line.startswith('From ') : continue
    
    line_words = line.split()
    email_count[line_words[1]] = email_count.get(line_words[1], 0) + 1

# Find the email address with most messages
sorted_email_count = []
for email, count in email_count.items() :
    sorted_email_count = sorted_email_count + [(count, email)]

# Sort by count, largest first
sorted_email_count.sort(reverse=True)

# Display the result
print sorted_email_count[0][1], sorted_email_count[0][0]
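The loop above builds the `(count, email)` list one element at a time and then sorts it. A more compact sketch of the same decorate-and-sort idea uses `sorted()` on a generator of tuples. The function name `top_sender` is hypothetical, a list of strings stands in for the file, and unlike the original it returns `None` instead of raising `IndexError` when no `From ` lines are found.

```python
def top_sender(lines):
    """Return (email, count) for the most frequent sender, or None."""
    counts = {}
    for line in lines:
        words = line.split()
        # Guard: need 'From' plus an address; 'From:' header lines are skipped
        if len(words) < 2 or words[0] != 'From':
            continue
        counts[words[1]] = counts.get(words[1], 0) + 1
    if not counts:
        return None
    # Count goes first, so tuples sort by count (ties broken by address)
    ranked = sorted(((count, email) for email, count in counts.items()),
                    reverse=True)
    best_count, best_email = ranked[0]
    return best_email, best_count

lines = [
    "From a@x.org Sat Jan  5 09:14:16 2008",
    "From b@y.org Fri Jan  4 18:10:48 2008",
    "From: a@x.org",
    "From a@x.org Fri Jan  4 16:10:39 2008",
]
print(top_sender(lines))  # ('a@x.org', 2)
```

With ties, `reverse=True` picks the address that sorts last alphabetically, since equal counts fall back to comparing the second tuple element.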

Exercise 10.2

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

hour_count = {}

# Read each line and count messages for each hour of the day
for line in fh :
    if not line.startswith('From ') : continue
    
    line_words = line.split()
    time=line_words[5].split(':')
    hour_count[time[0]] = hour_count.get(time[0], 0) + 1

# Build a list of (hour, count) pairs
sorted_hour_count = hour_count.items()

# Sort the pairs by hour
sorted_hour_count.sort()

# Display the result
for hour, count in sorted_hour_count :
    print hour, count
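Calling `.sort()` on `hour_count.items()` relies on Python 2 returning a list; in Python 3, `items()` returns a view that has no `sort()` method. A sketch that uses `sorted()` instead (which accepts either) and also guards against short lines; `messages_per_hour` is a hypothetical name and the sample lines stand in for the file.

```python
def messages_per_hour(lines):
    """Return a list of (hour, count) pairs sorted by hour."""
    counts = {}
    for line in lines:
        words = line.split()
        # Envelope lines: From <address> <day> <month> <date> <time> <year>
        if len(words) < 6 or words[0] != 'From':
            continue
        hour = words[5].split(':')[0]  # '09:14:16' -> '09'
        counts[hour] = counts.get(hour, 0) + 1
    # sorted() works on the items view in Python 3 and the list in Python 2
    return sorted(counts.items())

log_lines = [
    "From a@x.org Sat Jan  5 09:14:16 2008",
    "From b@y.org Fri Jan  4 18:10:48 2008",
    "From c@z.org Fri Jan  4 09:05:01 2008",
]
for hour, count in messages_per_hour(log_lines):
    print(hour + " " + str(count))
```

Because the hours keep their leading zero as strings ('09'), plain string sorting already puts them in clock order.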

Exercise 10.3

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
if 1 > len(fname) :
    fname = 'clown.txt'

try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

letter_count = {}

# Read the entire file and split into letters
#letter_list = list(fh.read())
# Or, to skip whitespace (spaces, tabs, newlines); note that punctuation is still counted
text = fh.read()
words = text.split()
letter_list = []
for word in words :
    letter_list = letter_list + list(word)

# Count frequency
for letter in letter_list :
    letter_count[letter] = letter_count.get(letter, 0) + 1

sorted_letter_count = []
for letter, count in letter_count.items() :
    sorted_letter_count = sorted_letter_count + [(count, letter)]

sorted_letter_count.sort(reverse=True)

for count, letter in sorted_letter_count :
    print letter,
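The `split()`-based loop above removes whitespace but still counts punctuation, and it treats 'A' and 'a' as different letters. A minimal sketch that lowercases the text and keeps only alphabetic characters via `str.isalpha()`; `letter_frequency` is a hypothetical name and a string literal stands in for `fh.read()`.

```python
def letter_frequency(text):
    """Return (count, letter) pairs, most frequent first, letters only."""
    counts = {}
    for ch in text.lower():
        # Skip whitespace, digits and punctuation
        if not ch.isalpha():
            continue
        counts[ch] = counts.get(ch, 0) + 1
    return sorted(((n, ch) for ch, n in counts.items()), reverse=True)

for count, letter in letter_frequency("Hello, World!")[:3]:
    print(letter + " " + str(count))
```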

on July 18, 2011, 3:16 a.m.

Sudaraka Wijesinghe said:

My py4int exercise 9 code

Exercise 9.3

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

day_count = {}

# Read each line
for line in fh :
    if not line.startswith('From ') : continue
    
    line_words = line.split()
    day_count[line_words[2]] = day_count.get(line_words[2], 0) + 1

    
# Display the day count
print day_count

Exercise 9.4

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

email_count = {}

# Read each line and find out message count for each email address
for line in fh :
    if not line.startswith('From ') : continue
    
    line_words = line.split()
    email_count[line_words[1]] = email_count.get(line_words[1], 0) + 1

# Find the email address with most messages
max_email = None
max_count = None
for email, count in email_count.items() :
    if max_count is None or max_count < count :
        max_email = email
        max_count = count
        
# Display the result
print max_email, max_count

Exercise 9.5

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

domain_count = {}

# Read each line
for line in fh :
    if not line.startswith('From ') : continue
    
    line_words = line.split()
    email = line_words[1].split('@')
    domain_count[email[1]] = domain_count.get(email[1], 0) + 1

    
# Display the domain count
print domain_count

on July 18, 2011, 3:13 a.m.

Sudaraka Wijesinghe said:

My py4int exercise 8 code

Exercise 8.4

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

# Store unique words from the file
collected_words = []

# Read each line
for line in fh :
    line_words = line.split()

    # Read each word
    for word in line_words :
        # Check if in the collected list, if not add it
        if not word in collected_words :
            collected_words.append(word)

# Sort the collected word list
collected_words.sort()

# Display the sorted words in our collection
print collected_words

Exercise 8.5

# Open file get the file name and open it
fname = raw_input('Enter a file name: ')
try:
    fh=open(fname)
except:
    print 'Unable to open', fname
    exit()

line_count = 0

# Read each line
for line in fh :
    if not line.startswith('From ') : continue
    
    line_words = line.split()
    line_count = line_count + 1

    # Display the from address
    print line_words[1]
    
# Display the line count
print 'There were', line_count, 'lines in the file with From as the first word'

Exercise 8.6

# Initialize variables
collected_numbers = []

while 1 :
    inp = raw_input ('Enter a number: ')
    if 'done' == inp :
        break

    try:
        collected_numbers = collected_numbers + [float(inp)]
    except:
        print 'Invalid input'

# max() and min() raise ValueError on an empty list
if len(collected_numbers) > 0 :
    print 'Maximum:', max(collected_numbers)
    print 'Minimum:', min(collected_numbers)

on July 18, 2011, 3:09 a.m.

Sudaraka Wijesinghe said:

My py4int exercise 6 code

Exercise 6.5

str = 'X-DSPAM-Confidence: 0.8475'

number_start = str.find(':')
if -1 < number_start :
    number_str = str[number_start+1:].strip()
    
    try:
        number_float = float(number_str)
    except:
        print 'Failed to convert the found string to float'
        quit()

    print 'Number found:', number_float
    
else:
    print 'Unable to detect number starting position'
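The same extraction can be sketched with `str.partition()` (available since Python 2.5), which splits at the first `':'` and reports whether the separator was found. This swaps in `partition` for `find` plus slicing, catches only `ValueError` rather than every exception, and avoids shadowing the built-in `str`; `parse_confidence` is a hypothetical name.

```python
def parse_confidence(line):
    """Return the float after the first ':' in line, or None."""
    label, sep, rest = line.partition(':')
    if not sep:
        return None  # no colon in the line
    try:
        return float(rest.strip())
    except ValueError:
        return None  # text after the colon is not a number

print(parse_confidence('X-DSPAM-Confidence: 0.8475'))
```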

on July 18, 2011, 2:58 a.m.

Vladimir Támara Patiño said:

10.1

fhand = open('mbox-short.txt')
d = {}
for line in fhand:
    words = line.split()
    if len(words) >= 2 and words[0] == 'From': 
        d[words[1]] = d.get(words[1], 0) + 1
fhand.close()
lt = []
for p in d:
    lt.append((d[p], p))
lt.sort(reverse = True)
if len(lt) > 0:
    (c, p) = lt[0]
    print "Person with the most commits: ", p, " with ", c, " commits"

10.2

fhand = open('mbox-short.txt')
d = {}
for line in fhand:
    words = line.split()
    if len(words) >= 6 and words[0] == 'From': 
        h = words[5].split(':')
        d[h[0]] = d.get(h[0], 0) + 1
fhand.close()
lt = d.items()
lt.sort()
for (h, f) in lt:
    print h+" "+str(f)

10.3

def most_frequent(s):
    f = {}
    for i in s:
        f[i] = f.get(i, 0) + 1
    lt = []
    for i in f:
        lt.append((f[i], i))
    lt.sort(reverse = True)
    for (r, i) in lt:
        print i+" "+str(r)

a = raw_input('File to use? ')
fhand = open(a)
s = ''
for line in fhand:
    s = s + " " + line.lower()
fhand.close()
most_frequent(s)

I ran this one with part of http://www.gutenberg.org/cache/epub/12501/pg12501.txt (after recoding it to latin1). The top ten letters are:

e 6741
a 5394
o 4986
s 4452
n 3630
r 3103
l 2927
d 2652
i 2551
u 2321

According to http://en.wikipedia.org/wiki/Letter_frequencies, the most frequent letters in Spanish are "eaosr nidlc", so the result is very similar.

on June 9, 2011, 11:47 p.m.

Vladimir Támara Patiño said:

Link to the "official" dict documentation: http://docs.python.org/library/stdtypes.html#mapping-types-dict

9.1

def file2dict():
    d = dict()
    fhand = open('words.txt')
    for line in fhand:
        words = line.split()
        for word in words:
            d[word] = word
    return d

d = file2dict()
if 'live' in d:
    print "live appears"
else:
    print "live doesn't appear"

9.2

def histogram(s):
    d = dict()
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

print histogram('brontosaurus')
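For comparison, the standard library's `collections.Counter` (Python 2.7 and later) performs the same tallying as the `d.get(c, 0) + 1` loop. This is a swapped-in alternative, not what the exercise asks you to write.

```python
from collections import Counter

word = 'brontosaurus'
# Counter(iterable) counts each element, like the dict.get() loop
counts = Counter(word)
print(counts['o'])             # 2
print(counts['z'])             # 0 (missing keys read as zero, nothing is added)
print(sorted(counts.items()))  # [('a', 1), ('b', 1), ('n', 1), ('o', 2), ('r', 2), ('s', 2), ('t', 1), ('u', 2)]
```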

9.3

fhand = open('mbox-short.txt')
count = 0
d = {}
for line in fhand:
    words = line.split()
    if len(words) >= 3 and words[0] == 'From':
        d[words[2]] = d.get(words[2], 0) + 1

print d

9.4

fn = raw_input('Enter a file name: ')
fhand = open(fn)
count = 0
d = {}
for line in fhand:
    words = line.split()
    if len(words) >= 2 and words[0] == 'From':
        d[words[1]] = d.get(words[1], 0) + 1

indmax = None
for v in d:
    if indmax is None or d[v] > d[indmax]:
        indmax = v

if indmax is not None:
    print indmax, d[indmax]

9.5

fn = raw_input('Enter a file name: ')
fhand = open(fn)
count = 0
d = {}
for line in fhand:
    words = line.split()
    if len(words) >= 2 and words[0] == 'From' and '@' in words[1]:
        dom = words[1][words[1].find('@')+1:]
        d[dom] = d.get(dom, 0) + 1
print d

on June 8, 2011, 7:08 a.m.

Vladimir Támara Patiño said:

I didn't find any problems in chapter 8. For list methods and the standard functions that work on lists, see: http://docs.python.org/tutorial/datastructures.html#more-on-lists

print "8.1"
def chop(t):
    if len(t) >= 1:
        del t[0]
    if len(t) >= 1:
        t.pop()
    return None

def middle(t):
    n = list(t)
    chop(n)
    return n
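The difference between the two functions is that `chop` modifies its argument in place and returns `None`, while `middle` leaves the argument alone and returns a new list. A short sketch that makes this visible; `middle_slice` is a hypothetical slicing-based variant for list arguments (the `list(t)` copy above also accepts other sequences).

```python
def chop(t):
    """Remove the first and last elements of t in place; return None."""
    if t:
        del t[0]
    if t:
        t.pop()

def middle_slice(t):
    """Return a new list without the first and last elements of t."""
    return t[1:-1]

letters = ['a', 'b', 'c', 'd']
print(middle_slice(letters))  # ['b', 'c']
print(letters)                # ['a', 'b', 'c', 'd'] (unchanged)
print(chop(letters))          # None
print(letters)                # ['b', 'c']
```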

print "8.2"
print "The initial program fails, for example, if mbox-short.txt contains the line:"
print "From me"
fhand = open('mbox-short.txt')
count = 0
for line in fhand:
    words = line.split()
    print 'Debug:', words
    if len(words) < 3 : continue
    if words[0] != 'From' : continue
    print words[2]

print "8.3"
fhand = open('mbox-short.txt')
count = 0
for line in fhand:
    words = line.split()
    print 'Debug:', words
    if len(words) >= 3 and words[0] == 'From':
        print words[2]

print "8.4"
fhand = open('romeo.txt')
r = []
for line in fhand:
    words = line.split()
    for w in words:
        if not w in r:
            r.append(w)
r.sort()
print r
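Checking `w not in r` scans the whole list every time, so this loop slows down as the word list grows. A sketch that swaps in a `set` (hash-based membership) and sorts once at the end; `sorted_unique_words` is a hypothetical name and the sample strings stand in for the lines of the file.

```python
def sorted_unique_words(lines):
    """Return the distinct whitespace-separated words in lines, sorted."""
    unique = set()
    for line in lines:
        unique.update(line.split())  # duplicates are ignored by the set
    return sorted(unique)

print(sorted_unique_words(["But soft what light", "what light through"]))
```

Uppercase letters sort before lowercase ones, so 'But' comes first, matching the behavior of `r.sort()` above.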

print "8.5"
fname = raw_input("Enter a file name: ")
fhand = open(fname)
countfromp = 0
for line in fhand:
    words = line.split()
    if len(words) > 0 and words[0] == 'From':
        countfromp = countfromp + 1
        if len(words) >= 2:
            print words[1]
print "There were ", countfromp, "lines in the file with From as the first word"


print "8.6"
t = []
while True:
    n = raw_input("Enter a number: ")
    if n == 'done':
        break
    try:
        nint = float(n)
        t.append(nint)
    except:
        print "Invalid input"
if len(t) > 0 :
    print "Maximum: ", max(t)
    print "Minimum: ", min(t)

on June 7, 2011, 8:07 a.m.

Nathan Day said:

What is the significance of the "&gt"? I see you use it a couple of times, but I don't understand what it's doing. Cheers

on June 7, 2011, 8:43 a.m. in reply to Vladimir Támara Patiño

Vladimir Támara Patiño said:

I see > in your question; however, I don't use that in what I wrote.

If you meant > (and your symbol got escaped), it means "greater than".

Try it in the Python interactive interpreter:

3>2

And it will answer True, but with:

2>3

it will answer False.

It is covered in chapter 3 of the book for this course.

Did I answer your question?

on June 7, 2011, 9:14 a.m. in reply to Nathan Day

Nathan Day said:

Yeah great answer, sorry I haven't responded faster but I get it now. Thanks!

on June 9, 2011, 9:26 a.m. in reply to Vladimir Támara Patiño

Vladimir Támara Patiño said:

On page 90 of chapter 6 there is a link to docs.python.org/library/string.html, but it should point to http://docs.python.org/library/stdtypes.html#string-methods

print "Ex. 6.1"
s = raw_input('String? ')
i = len(s) - 1
while i >= 0:
    print s[i]
    i = i - 1

print "Ex. 6.2"
fruit = 'watermelon'
print fruit, ' is equal to ', fruit[:]

print "Ex. 6.3"
def count(word, letter):
    assert len(letter) == 1
    count = 0
    for l in word:
        if l == letter:
            count = count + 1
    return count

print count('banana', 'a')


print "Ex. 6.4"
# The references to the documentation of string methods should point to
# http://docs.python.org/library/stdtypes.html#string-methods

print 'banana'.count('a')


print "Ex. 6.5"
str = 'X-DSPAM-Confidence: 0.8475'
ps = str.find(':')
f = str[ps+1:]
print float(f.strip())


print "Ex. 6.6"
assert "lower".capitalize() == "Lower"
assert "hi".center(10) == "    hi    "
assert "ababcabcd".count("ab") == 3
assert "example".endswith("le")
assert "example".find("mp") == 3
assert "{0} examples".format(10) == "10 examples"
assert "example".index("mp") == 3
assert "2abc".isalnum()
assert "2abc".isalpha() == False
assert "2abc".isdigit() == False
assert "lower".islower()
assert " ".isspace()
assert "Camel One ".istitle()
assert "EXAMPLE".isupper()
assert " - ".join(['a','b']) == "a - b"
assert "LOWER".lower() == "lower"
assert "xxxxax".lstrip('x') == 'ax'
assert 'yyyyay'.replace('y', 'z') == 'zzzzaz'

on June 4, 2011, 11:40 p.m.

Tyler Cipriani said:

good catch on that link - thanks!

on June 5, 2011, 3:44 a.m. in reply to Vladimir Támara Patiño

Nathan Day said:

Extra Credit!!!

Here are my responses: http://pastie.org/2004458

I like using Geany as my text editor. I started learning on Ubuntu but am now on a Windows machine. What first line of code do I need in the Windows environment so that I can execute my scripts? In Ubuntu it was something like this: #!/usr/bin/env python

on June 1, 2011, 1:40 p.m.

Coffe Bean said:

Hi Nathan,

you'll have to type python some_script.py in a DOS console window to run the script. Alas, Windows doesn't allow you to use the very handy #!/usr/bin/python construction.

The python executable needs to be in your path (for convenience), though I suppose you could always write a little .bat file. There is more than one way to do it, as they say on planet Perl.

Alternatively, if you use, say, Notepad++ or NetBeans or one of the other IDEs, you can run the script from within the IDE/editor. I find that quite handy.

Happy Computing,

Stefan

on June 1, 2011, 5:50 p.m. in reply to Nathan Day

Nathan Day said:

Hey Stefan,

Thanks for explaining that stuff about Windows lacking the #! path. I am not very familiar with writing .bat files (I only have limited bash experience from Ubuntu); is there a good resource out there where I could see how these are constructed? Also, thanks for the suggestions about alternative editors. I just installed Notepad++ and am looking forward to test-driving it this week! Thanks for your help, I appreciate it.

Cheers,

Nate

on June 2, 2011, 9:04 a.m. in reply to Coffe Bean
Anonym said:

here is my script:

word = 'Tardis'
index = -1
while index < len(word):
    letter = word[index]
    index = index - 1
    print letter

I think this works but PyDev still gives me an error

s
i
d
r
a
T
Traceback (most recent call last):
  File "/home/nexus/Python-Programming-101/6.1.py", line 4, in <module>
    letter = word[index]
IndexError: string index out of range

on May 31, 2011, 1:54 p.m.

Tyler Cipriani said:

You've got a couple issues here.

Problem 1 - you've created an infinite loop. Starting at -1 and subtracting 1 on each iteration will always produce a number that is less than the length of the word 'Tardis'.

Problem 2 - In your loop you're eventually going to reach word[-7], which doesn't exist. This causes the IndexError that you're seeing.

To fix these problems you need a loop that actually ends. Note that a string can be iterated over like a list, so 'for letter in word' will work as an iterator. Check out the gist here: https://gist.github.com/1001133 for my solution to your problem. As an alternative to my 'i' variable, you can use an extended slice to reverse the string: http://docs.python.org/release/2.3.5/whatsnew/section-slices.html
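The gist itself isn't reproduced here, but the combination described (reverse with an extended slice, then let `for` walk the string) can be sketched as follows. `reversed_letters` is a hypothetical helper that collects the letters in a list so the result is easy to check.

```python
def reversed_letters(word):
    """Return the letters of word from last to first."""
    letters = []
    # word[::-1] is an extended slice with step -1: a reversed copy
    for letter in word[::-1]:
        letters.append(letter)
    return letters

for letter in reversed_letters('Tardis'):
    print(letter)
```

The `for` loop stops by itself when the string runs out, so there is no index to get wrong and no IndexError.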

Let me know if any of this doesn't make sense.

on May 31, 2011, 3:51 p.m. in reply to Anonym

Anonym said:

The " 'for letter in word' will work as an iterator" does not make sense to me at all. Can you please elaborate?

on May 31, 2011, 7:49 p.m. in reply to Tyler Cipriani
Vladimir Támara Patiño said:
Since index starts with a negative value and you subtract 1 in each iteration, it will always be negative, so the condition index < len(word) will always be true and your loop would never end; however, when index becomes -7 it generates the error (as noted by Tyler).

You will probably get the result you want by changing the condition to:
-index <= len(word)
i.e., first convert index to a positive value, which grows with each iteration until it finally exceeds len(word).
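The corrected condition can be checked with a small sketch. The function name `backwards` is hypothetical; it collects the letters in a list instead of printing them, so the result can be compared with a reversed slice.

```python
def backwards(word):
    """Collect the letters of word last-to-first using negative indices."""
    letters = []
    index = -1
    # -index takes the values 1, 2, ..., so the loop stops once it passes len(word)
    while -index <= len(word):
        letters.append(word[index])
        index = index - 1
    return letters

print(backwards('Tardis') == list('Tardis'[::-1]))  # True
```

For 'Tardis' the loop body runs with index -1 through -6; at -7 the condition is false, so word[-7] is never evaluated.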
on June 1, 2011, 6:47 a.m. in reply to Anonym
Tyler Cipriani said:

Sure.

So with a list, if you wanted to iterate through each item and print it followed by the word "stuff", you'd just say:

items = ['fun', 'new', 'old', 'big', 'little']
for item in items:
    print item, 'stuff' # fun stuff \n new stuff \n old stuff etc.

and for each item in the list it would print the word 'stuff' after it. Strings work the same way, so if you wanted to print the word stuff after each letter of a string you'd type:

word = 'string'
for letter in word:
    print letter, 'stuff' # s stuff \n t stuff \n r stuff \n i stuff etc.

That's why in my gist example, if you reverse the string before passing it to 'for letter in word', you won't need a separate iteration variable, e.g.:

word = 'string'
word = word[::-1] # reverses 'string'
for letter in word:
    print letter

on June 1, 2011, 8:43 a.m. in reply to Anonym

Anonym said:

Thanks Tyler, that explains a lot of things to me.

on June 1, 2011, 9 a.m. in reply to Tyler Cipriani

Tyler Cipriani said:

@Vladimir Támara Patiño Just re-reading this thread - nice solution to this, by the way - much more succinct than the corrections I offered.

on June 5, 2011, 3:50 a.m. in reply to Vladimir Támara Patiño