
Data Types



Python recognizes and differentiates between several kinds of data. Let's look at the following and some of the things we can do with them:

 

Extra Credit

You are designing an inventory and point-of-sale system for a local produce market. Aspects of the system include:

  1. farmer name and booth number
  2. market slogan
  3. produce name and price
  4. names of growing seasons ('spring', 'summer', 'fall', 'winter')
  5. cities where the farmers and participants are located (no state data necessary yet)

Q: How can each aspect be represented in Python data types? Give an example for each.
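One possible answer, as a sketch: the farmer names, booth numbers, slogan, prices, and city names below are made up for illustration, and other choices (for example, a list of tuples instead of a dict) would work just as well.

```python
# 1. Farmer name and booth number: a dict mapping name (str) to booth (int)
booths = {'Ana Lopez': 12, 'Ben Okafor': 7}

# 2. Market slogan: a single str
slogan = 'Fresh from the farm, every Saturday'

# 3. Produce name and price: a dict mapping name (str) to price (float)
prices = {'apples': 1.25, 'kale': 2.50}

# 4. Growing seasons: a tuple, because the set of seasons never changes
seasons = ('spring', 'summer', 'fall', 'winter')

# 5. Cities: a set, because each city only needs to be stored once
cities = set(['Springfield', 'Riverton', 'Springfield'])

print(booths['Ana Lopez'])   # 12
print('summer' in seasons)   # True
print(len(cities))           # 2 (the duplicate is dropped)
```

Floats are convenient for a first version of the prices; a real point-of-sale system would more likely store integer cents, because a float cannot represent most decimal fractions exactly.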

 

You can also have a quick look at: http://pytuts.blogspot.com/2011/10/hi.html

Task Discussion


  • oskoff said:

    Hi all,

    I'm stuck on exercise 8.2. I'm guessing that the proper guard is to ensure that words[2] is one of the days of the week. Is this correct, and if so, how do you write the guard correctly? I have tried the following, but it results in only blank lines:

     

    #8.2.py
    fhand = open('mbox.txt')
    count = 0
    week = ["Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]
    for line in fhand:
        words = line.split()
        if len(words) == 0 : continue
        if words[0] != 'From' : continue
        if words[2] != week[:] : continue
        print words[2]
     
     
    Thanks for your help.
    on Sept. 9, 2011, 7:13 a.m.

    Mars83 said:

    I'm working on the same topic now. I think the code first has to guard against lines that contain only two words:

    From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
    [...]
    From xyz
    [...]
    


    Such a line has more than zero words, so it passes the guard. But printing words[2] (the third word) then raises an IndexError.

     

    So I edited the line

    if len(words) == 0 : continue
    

    to

    if len(words) <= 2 : continue

    Now lines with fewer than three words are skipped. You could also catch the error by wrapping the print statement in a try/except:

    try:                # Additional guard
        print(words[2])
    except IndexError:
        continue


    To check whether the word is in your list, you could use the following code:

    if words[2] not in week : continue
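Putting the three guards together, a version of 8.2 might look like the sketch below. It is written as a function over a list of lines so it can be tried without mbox.txt; the sample lines (and the someone@example.com address) are made up. The day list has all seven days, including 'Thu'.

```python
# Day names as they appear in the third word of an mbox "From " line
WEEK = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def days_sent(lines):
    """Return the day (third word) of each guarded 'From ' line."""
    days = []
    for line in lines:
        words = line.split()
        # Guard 1: skip blank lines and lines with fewer than three words
        if len(words) < 3:
            continue
        # Guard 2: the first word must be exactly 'From' (not 'From:')
        if words[0] != 'From':
            continue
        # Guard 3: only keep the word if it is a known day name
        if words[2] not in WEEK:
            continue
        days.append(words[2])
    return days

sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008',
    'From xyz',
    'From: stephen.marquard@uct.ac.za',
    '',
    'From someone@example.com Thu Jan  3 16:10:39 2008',
]
print(days_sent(sample))  # ['Sat', 'Thu']
```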

    on Oct. 7, 2011, 5:53 p.m. in reply to oskoff

    oskoff said:

    Thanks Mars83, adding your changes worked perfectly!

    on Oct. 13, 2011, 11:22 a.m. in reply to Mars83
  • Sudaraka Wijesinghe said:

    My py4inf exercise 10 code

    Exercise 10.1

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    email_count = {}
    
    # Read each line and find out message count for each email address
    for line in fh :
        if not line.startswith('From ') : continue
        
        line_words = line.split()
        email_count[line_words[1]] = email_count.get(line_words[1], 0) + 1
    
    # Find the email address with the most messages:
    # build (count, email) tuples so that sorting orders them by count
    sorted_email_count = []
    for email, count in email_count.items() :
        sorted_email_count.append((count, email))
    
    # Largest count first
    sorted_email_count.sort(reverse=True)
    
    # Display the result
    print sorted_email_count[0][1], sorted_email_count[0][0]
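The trick in 10.1 is that Python compares tuples element by element, so a list of (count, email) tuples sorts by count first. A sketch of the same idea as a function, with a guard for input that contains no 'From ' lines (the addresses are invented):

```python
def most_common_sender(lines):
    """Return (email, count) for the most frequent sender, or None."""
    counts = {}
    for line in lines:
        if not line.startswith('From '):
            continue
        words = line.split()
        if len(words) < 2:
            continue
        counts[words[1]] = counts.get(words[1], 0) + 1
    if not counts:
        return None  # no 'From ' lines at all
    # (count, email) tuples sort by count first, then by email
    pairs = sorted([(count, email) for email, count in counts.items()],
                   reverse=True)
    best_count, best_email = pairs[0]
    return best_email, best_count

sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008',
    'From bob@example.com Fri Jan  4 18:10:48 2008',
    'From alice@example.com Fri Jan  4 16:10:39 2008',
    'Subject: not a From line',
]
print(most_common_sender(sample))  # ('alice@example.com', 2)
```

On a tie in counts, the reverse sort picks the email that sorts last alphabetically, because the second tuple element breaks the tie.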
    
    

     

    Exercise 10.2

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    hour_count = {}
    
    # Read each line and count the messages sent in each hour
    for line in fh :
        if not line.startswith('From ') : continue
        
        line_words = line.split()
        # line_words[5] is the time, e.g. '09:14:16'; keep only the hour
        hour = line_words[5].split(':')[0]
        hour_count[hour] = hour_count.get(hour, 0) + 1
    
    # Sort the (hour, count) pairs by hour
    sorted_hour_count = sorted(hour_count.items())
    
    # Display the result
    for hour, count in sorted_hour_count :
        print hour, count
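The counting step of 10.2 can be tested without a file by passing a list of lines. This sketch also checks that a line has at least six words before reading words[5]. The sample lines are invented, and the comment about sort order assumes mbox times always use two-digit hours, as in '09:14:16' from the sample line quoted earlier in this thread.

```python
def messages_per_hour(lines):
    """Return a list of (hour, count) pairs sorted by hour."""
    counts = {}
    for line in lines:
        words = line.split()
        # A 'From ' line looks like: From addr Day Mon DD HH:MM:SS YYYY
        if len(words) < 6 or words[0] != 'From':
            continue
        hour = words[5].split(':')[0]
        counts[hour] = counts.get(hour, 0) + 1
    # Assumes zero-padded hours ('09'), so string order matches numeric order
    return sorted(counts.items())

sample = [
    'From a@example.com Sat Jan  5 09:14:16 2008',
    'From b@example.com Fri Jan  4 16:10:39 2008',
    'From c@example.com Fri Jan  4 09:35:08 2008',
]
for hour, count in messages_per_hour(sample):
    print('%s %d' % (hour, count))
# 09 2
# 16 1
```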
    

     

    Exercise 10.3

    # Get the file name (default: clown.txt) and open the file
    fname = raw_input('Enter a file name: ')
    if len(fname) < 1 :
        fname = 'clown.txt'
    
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    letter_count = {}
    
    # Read the entire file and split it into characters
    #letter_list = list(fh.read())
    # Or, to leave out whitespace (spaces, tabs, newlines), split into words first.
    # Digits, punctuation, and upper/lower case are still counted separately.
    text = fh.read()
    words = text.split()
    letter_list = []
    for word in words :
        letter_list.extend(word)
    
    # Count frequency
    for letter in letter_list :
        letter_count[letter] = letter_count.get(letter, 0) + 1
    
    sorted_letter_count = []
    for letter, count in letter_count.items() :
        sorted_letter_count.append((count, letter))
    
    sorted_letter_count.sort(reverse=True)
    
    for count, letter in sorted_letter_count :
        print letter,
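Exercise 10.3 above splits on whitespace, so punctuation, digits, and upper and lower case letters are still counted as separate characters. If only the letters a-z should be counted regardless of case (an assumption about what the exercise wants), the text can be lowercased and filtered against string.ascii_lowercase, as in this sketch:

```python
import string

def letter_frequencies(text):
    """Return (letter, count) pairs for a-z, most frequent first."""
    counts = {}
    for ch in text.lower():
        # Keep only the 26 ASCII letters; skip spaces, digits, punctuation
        if ch in string.ascii_lowercase:
            counts[ch] = counts.get(ch, 0) + 1
    # Sort by count descending, then alphabetically to break ties
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

print(letter_frequencies('Hello, World!'))
# [('l', 3), ('o', 2), ('d', 1), ('e', 1), ('h', 1), ('r', 1), ('w', 1)]
```

Note that accented letters such as 'ñ' or 'á' are not in string.ascii_lowercase, so this version skips them.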
    
    
    on July 18, 2011, 3:16 a.m.
  • Sudaraka Wijesinghe said:

    My py4inf exercise 9 code

    Exercise 9.3

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    day_count = {}
    
    # Read each line
    for line in fh :
        if not line.startswith('From ') : continue
        
        line_words = line.split()
        day_count[line_words[2]] = day_count.get(line_words[2], 0) + 1
    
        
    # Display the day count
    print day_count
    
    

     

    Exercise 9.4

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    email_count = {}
    
    # Read each line and find out message count for each email address
    for line in fh :
        if not line.startswith('From ') : continue
        
        line_words = line.split()
        email_count[line_words[1]] = email_count.get(line_words[1], 0) + 1
    
    # Find the email address with most messages
    max_email = None
    max_count = None
    for email, count in email_count.items() :
        if max_count is None or max_count < count :
            max_email = email
            max_count = count
            
    # Display the result
    print max_email, max_count
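The loop in 9.4 tracks the running maximum by hand. The built-in max() also accepts a key function, so the same search can be written in one call; this is a swapped-in alternative, not what the exercise asks for, and the sample counts below are invented.

```python
def top_sender(counts):
    """Return (email, count) for the largest count in a dict, or None."""
    if not counts:
        return None  # max() raises ValueError on an empty sequence
    # max() iterates over the keys and compares them by their dict value
    best = max(counts, key=counts.get)
    return best, counts[best]

email_count = {'alice@example.com': 3,
               'bob@example.com': 5,
               'carol@example.com': 1}
print(top_sender(email_count))  # ('bob@example.com', 5)
```

If two keys share the largest count, max() returns whichever of them it meets first while iterating over the dict.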
    
    

     

    Exercise 9.5

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    domain_count = {}
    
    # Read each line
    for line in fh :
        if not line.startswith('From ') : continue
        
        line_words = line.split()
        email = line_words[1].split('@')
        domain_count[email[1]] = domain_count.get(email[1], 0) + 1
    
        
    # Display the domain count
    print domain_count
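Exercise 9.5 assumes every address contains an '@'; if one did not, email[1] would raise an IndexError. A sketch with that guard, using str.partition(), which splits at the first '@' and returns an empty separator when there is none (the sample addresses are invented):

```python
def domain_counts(lines):
    """Count 'From ' lines per sender domain (the part after '@')."""
    counts = {}
    for line in lines:
        words = line.split()
        if len(words) < 2 or words[0] != 'From':
            continue
        # partition() returns (before, '@', after), or (text, '', '')
        user, sep, domain = words[1].partition('@')
        if not sep:
            continue  # skip addresses without an '@'
        counts[domain] = counts.get(domain, 0) + 1
    return counts

sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008',
    'From bob@example.org Fri Jan  4 18:10:48 2008',
    'From carol@example.com Fri Jan  4 16:10:39 2008',
    'From nobody',
]
print(domain_counts(sample))
```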
    
    

    on July 18, 2011, 3:13 a.m.
  • Sudaraka Wijesinghe said:

    My py4inf exercise 8 code

    Exercise 8.4

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    # Store unique words from the file
    collected_words = []
    
    # Read each line
    for line in fh :
        line_words = line.split()
    
        # Read each word
        for word in line_words :
            # Check if in the collected list, if not add it
            if word not in collected_words :
                collected_words.append(word)
    
    # Sort the collected word list
    collected_words.sort()
    
    # Display the sorted words in our collection
    print collected_words
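Checking `word in collected_words` scans the whole list every time, so the loop slows down as the list grows. A set does the membership test by hashing instead. This sketch is an alternative approach, using two lines of sample text rather than reading romeo.txt:

```python
def unique_sorted_words(lines):
    """Return the distinct words from the lines, sorted."""
    seen = set()
    for line in lines:
        # update() adds every word; duplicates are ignored automatically
        seen.update(line.split())
    return sorted(seen)

sample = [
    'But soft what light through yonder window breaks',
    'It is the east and Juliet is the sun',
]
print(unique_sorted_words(sample))
```

The default sort compares characters by code, so capitalized words ('But', 'It', 'Juliet') come before all lowercase ones.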
    
    

     

    Exercise 8.5

    # Get the file name and open the file
    fname = raw_input('Enter a file name: ')
    try:
        fh = open(fname)
    except IOError:
        print 'Unable to open', fname
        exit()
    
    line_count = 0
    
    # Read each line
    for line in fh :
        if not line.startswith('From ') : continue
        
        line_words = line.split()
        line_count = line_count + 1
    
        # Display the from address
        print line_words[1]
        
    # Display the line count
    print 'There were', line_count, 'lines in the file with From as the first word'
    
    

     

    Exercise 8.6

    # Initialize variables
    collected_numbers = []
    
    while True :
        inp = raw_input('Enter a number: ')
        if 'done' == inp :
            break
    
        try:
            collected_numbers.append(float(inp))
        except ValueError:
            print 'Invalid input'
    
    # max() and min() fail on an empty list, so check first
    if collected_numbers :
        print 'Maximum:', max(collected_numbers)
        print 'Minimum:', min(collected_numbers)
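max() and min() raise ValueError on an empty list, so a program like 8.6 needs a check for the case where the user types done straight away. A testable sketch that takes its inputs from a list instead of raw_input() (the list is a stand-in for what a user might type):

```python
def max_min(entries):
    """Convert entries to floats until 'done'; return (max, min) or None."""
    numbers = []
    for entry in entries:
        if entry == 'done':
            break
        try:
            numbers.append(float(entry))
        except ValueError:
            # float() raises ValueError for text such as 'abc'
            print('Invalid input')
    if not numbers:
        return None  # nothing valid was entered
    return max(numbers), min(numbers)

# Prints 'Invalid input' for 'abc', ignores '99' after 'done',
# then prints (12.5, -4.0)
print(max_min(['3', '12.5', 'abc', '-4', 'done', '99']))
```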
    
    

    on July 18, 2011, 3:09 a.m.
  • Sudaraka Wijesinghe said:

    My py4inf exercise 6 code

    Exercise 6.5

    text = 'X-DSPAM-Confidence: 0.8475'
    
    # find() returns -1 when ':' is not present
    number_start = text.find(':')
    if number_start != -1 :
        number_str = text[number_start+1:].strip()
        
        try:
            number_float = float(number_str)
        except ValueError:
            print 'Failed to convert the found string to float'
            quit()
    
        print 'Number found:', number_float
        
    else:
        print 'Unable to detect number starting position'
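An alternative sketch for 6.5 uses str.partition(), which splits a string at the first ':' into three parts, so no index arithmetic is needed. Returning None on failure, rather than calling quit(), is a design choice made here so the function can be reused:

```python
def extract_confidence(line):
    """Return the float after the first ':' in line, or None."""
    label, sep, value = line.partition(':')
    if not sep:
        return None  # no ':' in the line
    try:
        return float(value.strip())
    except ValueError:
        return None  # the text after ':' is not a number

print(extract_confidence('X-DSPAM-Confidence: 0.8475'))  # 0.8475
print(extract_confidence('no colon here'))               # None
```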
    
    
    
    
    

    on July 18, 2011, 2:58 a.m.
  • Vladimir Támara Patiño said:

    10.1

    fhand = open('mbox-short.txt')
    d = {}
    for line in fhand:
        words = line.split()
        if len(words) >= 2 and words[0] == 'From': 
            d[words[1]] = d.get(words[1], 0) + 1
    fhand.close()
    lt = []
    for p in d:
        lt.append((d[p], p))
    lt.sort(reverse = True)
    if len(lt) > 0:
        (c, p) = lt[0]
        print "Person with the most commits: ", p, " with ", c, " commits"
    

    10.2

    fhand = open('mbox-short.txt')
    d = {}
    for line in fhand:
        words = line.split()
        if len(words) >= 6 and words[0] == 'From': 
            h = words[5].split(':')
            d[h[0]] = d.get(h[0], 0) + 1
    fhand.close()
    lt = d.items()
    lt.sort()
    for (h, f) in lt:
        print h+" "+str(f)
    

    10.3

    def most_frequent(s):
        # Count each letter; skip spaces, newlines and punctuation
        f = {}
        for c in s:
            if c.isalpha():
                f[c] = f.get(c, 0) + 1
        # Sort (count, letter) pairs from most to least frequent
        lt = []
        for c in f:
            lt.append((f[c], c))
        lt.sort(reverse = True)
        for (r, c) in lt:
            print c+" "+str(r)
    
    a = raw_input('File to use? ')
    fhand = open(a)
    s = fhand.read().lower()
    fhand.close()
    most_frequent(s)
    

    I ran this one with part of http://www.gutenberg.org/cache/epub/12501/pg12501.txt (after recoding it to latin1). The top ten letters are:

    • e 6741
    • a 5394
    • o 4986
    • s 4452
    • n 3630
    • r 3103
    • l 2927
    • d 2652
    • i 2551
    • u 2321
    According to http://en.wikipedia.org/wiki/Letter_frequencies the most frequent letters in Spanish are "eaosr nidlc", so the result is very similar.
    on June 9, 2011, 11:47 p.m.
  • Vladimir Támara Patiño said:

    Link to the "official" dict documentation: http://docs.python.org/library/stdtypes.html#mapping-types-dict

    9.1

    def file2dict():
        d = dict()
        fhand = open('words.txt')
        for line in fhand:
            words = line.split()
            for word in words:
                d[word] = word
        return d
    
    d = file2dict()
    if 'live' in d:
        print "live appears"
    else:
        print "live doesn't appear"
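In 9.1 only the dictionary keys matter: `'live' in d` looks the key up by hashing rather than scanning every word. A minimal sketch that reads from a list of strings instead of words.txt (the sample text is made up); storing True as the value is an arbitrary choice, and a set would work equally well:

```python
def words_to_dict(lines):
    """Store each word as a dict key; the value is never used."""
    d = {}
    for line in lines:
        for word in line.split():
            d[word] = True
    return d

d = words_to_dict(['to live or not to live', 'that is the question'])
print('live' in d)  # True
print('die' in d)   # False
```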
    

    9.2

    def histogram(s):
        d = dict()
        for c in s:
            d[c] = d.get(c, 0) + 1
        return d
    
    print histogram('brontosaurus')
    

    9.3

    fhand = open('mbox-short.txt')
    count = 0
    d = {}
    for line in fhand:
        words = line.split()
        if len(words) >= 3 and words[0] == 'From':
            d[words[2]] = d.get(words[2], 0) + 1
    
    print d
    

    9.4

    fn = raw_input('Enter a file name: ')
    fhand = open(fn)
    count = 0
    d = {}
    for line in fhand:
        words = line.split()
        if len(words) >= 2 and words[0] == 'From':
            d[words[1]] = d.get(words[1], 0) + 1
    
    indmax = None
    for v in d:
        if indmax is None or d[v] > d[indmax]:
            indmax = v
    
    if indmax is not None:
        print indmax, d[indmax]
    

    9.5

    fn = raw_input('Enter a file name: ')
    fhand = open(fn)
    count = 0
    d = {}
    for line in fhand:
        words = line.split()
        if len(words) >= 2 and words[0] == 'From' and '@' in words[1]:
            dom = words[1][words[1].find('@')+1:]
            d[dom] = d.get(dom, 0) + 1
    print d
    
    on June 8, 2011, 7:08 a.m.
  • Vladimir Támara Patiño said:

    I didn't find any problems in chapter 8. For the list methods and the built-in functions that work on lists, see: http://docs.python.org/tutorial/datastructures.html#more-on-lists

    print "8.1"
    def chop(t):
        if len(t) >= 1:
            del t[0]
        if len(t) >= 1:
            t.pop()
        return None
    
    def middle(t):
        n = list(t)
        chop(n)
        return n
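The difference between chop() and middle() is that chop() modifies the list it receives and returns None, while middle() leaves its argument alone and returns a new list. A sketch that shows this, with middle() written using a slice instead of copying and chopping (a swapped-in alternative):

```python
def chop(t):
    """Remove the first and last elements of t in place; return None."""
    if len(t) >= 1:
        del t[0]
    if len(t) >= 1:
        t.pop()
    return None

def middle(t):
    """Return a new list without the first and last elements."""
    return list(t[1:-1])

letters = ['a', 'b', 'c', 'd']
print(middle(letters))  # ['b', 'c']
print(letters)          # ['a', 'b', 'c', 'd'] (unchanged)
print(chop(letters))    # None
print(letters)          # ['b', 'c'] (modified in place)
```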
    
    print "8.2"
    print "The initial program fails, for example, if mbox-short.txt contains the line:"
    print "From me"
    fhand = open('mbox-short.txt')
    count = 0
    for line in fhand:
        words = line.split()
        print 'Debug:', words
        if len(words) < 3 : continue
        if words[0] != 'From' : continue
        print words[2]
    
    print "8.3"
    fhand = open('mbox-short.txt')
    count = 0
    for line in fhand:
        words = line.split()
        print 'Debug:', words
        if len(words) >= 3 and words[0] == 'From':
            print words[2]
    
    print "8.4"
    fhand = open('romeo.txt')
    r = []
    for line in fhand:
        words = line.split()
        for w in words:
            if not w in r:
                r.append(w)
    r.sort()
    print r
    
    print "8.5"
    fname = raw_input("Enter a file name: ")
    fhand = open(fname)
    countfromp = 0
    for line in fhand:
        words = line.split()
        if len(words) > 0 and words[0] == 'From':
            countfromp = countfromp + 1
            if (len(words) >= 2):
                print words[1]
    print "There were ", countfromp, "lines in the file with From as the first word"
    
    
    print "8.6"
    t = []
    while True:
        n = raw_input("Enter a number: ")
        if n == 'done':
            break
        try:
            num = float(n)
            t.append(num)
        except:
            print "Invalid input"
    if len(t) > 0 :
        print "Maximum: ", max(t)
        print "Minimum: ", min(t)
    
    
    

    on June 7, 2011, 8:07 a.m.

    Nathan Day said:

    What is the significance of the "&gt"? I see you use it a couple of times, but I don't understand what it's doing. Cheers

    on June 7, 2011, 8:43 a.m. in reply to Vladimir Támara Patiño

    Vladimir Támara Patiño said:

    I see &gt; in your question, but I don't use that in what I wrote.

    If you meant > (and the symbol got escaped), it means "greater than".

    Try from the python interactive interpreter:

    3>2

    And it will answer True, but with:

    2>3

    it will answer False.

    It is covered in chapter 3 of the course book.

    Did I answer your question?

    on June 7, 2011, 9:14 a.m. in reply to Nathan Day

    Nathan Day said:

    Yeah great answer, sorry I haven't responded faster but I get it now. Thanks!

    on June 9, 2011, 9:26 a.m. in reply to Vladimir Támara Patiño
  • Vladimir Támara Patiño said:

    On page 90 of chapter 6, there is a link to docs.python.org/library/string.html, but it should point to http://docs.python.org/library/stdtypes.html#string-methods

    print "Ex. 6.1"
    s = raw_input('String? ')
    i = len(s) - 1
    while i >= 0:
        print s[i]
        i = i - 1
    
    print "Ex. 6.2"
    fruit = 'watermelon'
    print fruit, ' is equal to ', fruit[:]
    
    print "Ex. 6.3"
    def count(word, letter):
        assert len(letter) == 1
        count = 0
        for l in word:
            if l == letter:
                count = count + 1
        return count
    
    print count('banana', 'a')
    
    
    print "Ex. 6.4"
    # The references to the documentation of string methods, should point to 
    # http://docs.python.org/library/stdtypes.html#string-methods
    
    print 'banana'.count('a')
    
    
    print "Ex. 6.5"
    text = 'X-DSPAM-Confidence: 0.8475'
    ps = text.find(':')
    f = text[ps+1:]
    print float(f.strip())
    
    
    print "Ex. 6.6"
    assert "lower".capitalize() == "Lower"
    assert "hi".center(10) == "    hi    "
    assert "ababcabcd".count("ab") == 3
    assert "example".endswith("le")
    assert "example".find("mp") == 3
    assert "{0} examples".format(10) == "10 examples"
    assert "example".index("mp") == 3
    assert "2abc".isalnum()
    assert "2abc".isalpha() == False
    assert "2abc".isdigit() == False
    assert "lower".islower()
    assert " ".isspace()
    assert "Camel One ".istitle()
    assert "EXAMPLE".isupper()
    assert " - ".join(['a','b']) == "a - b"
    assert "LOWER".lower() == "lower"
    assert "xxxxax".lstrip('x') == 'ax'
    assert 'yyyyay'.replace('y', 'z') == 'zzzzaz'
    
    on June 4, 2011, 11:40 p.m.

    Tyler Cipriani said:

    good catch on that link - thanks!

    on June 5, 2011, 3:44 a.m. in reply to Vladimir Támara Patiño
  • Nathan Day said:

    Extra Credit!!!

    Here are my responses: http://pastie.org/2004458

    I like using Geany as my text editor. I started learning on Ubuntu but am now on a Windows machine. What first line is needed in the Windows environment so that I can execute my scripts? In Ubuntu it was something like #!/usr/bin/env python.

    on June 1, 2011, 1:40 p.m.

    Coffe Bean said:

    Hi Nathan,

    You'll have to type python some_script.py in a DOS console window to run the script. Alas, Windows doesn't let you use the very handy #!/usr/bin/python construction.

    The python executable needs to be in your PATH (for convenience), though I suppose you could always write a little .bat file. There is more than one way to do it, as they say on planet Perl. :)

    Alternatively, if you use, say, Notepad++ or NetBeans or one of the other IDEs, you can run the script from within the IDE/editor. I find that quite handy.

    Happy Computing,

    Stefan

    on June 1, 2011, 5:50 p.m. in reply to Nathan Day

    Nathan Day said:

    Hey Stefan,

    Thanks for explaining that stuff about the non-existent Windows path. I am not very familiar with writing .bat files; I only have limited bash experience from Ubuntu. Is there a good resource out there where I could see how these are constructed? Also, thanks for the suggestions about alternative editors. I just installed Notepad++ and am looking forward to test-driving it this week! Thanks for your help, I appreciate it.

    Cheers,

    Nate

    on June 2, 2011, 9:04 a.m. in reply to Coffe Bean
  • Anonym said:

    Here is my script:

     

    word = 'Tardis'
    index = -1
    while index < len(word):
        letter = word[index]
        index = index - 1
        print letter

    I think this works, but PyDev still gives me an error:

    s
    i
    d
    r
    a
    T
    Traceback (most recent call last):
      File "/home/nexus/Python-Programming-101/6.1.py", line 4, in <module>
        letter = word[index]
    IndexError: string index out of range
     
    on May 31, 2011, 1:54 p.m.

    Tyler Cipriani said:

    You've got a couple of issues here.

    Problem 1 - you've created an infinite loop. Starting at -1 and subtracting 1 on each iteration through the loop will always result in a number that is less than the length of the word 'Tardis'.

    Problem 2 - In your loop you're eventually going to come up with word[-7] which doesn't exist. This creates the IndexError that you're seeing.

    To fix these problems you'll need a loop that has an end. Note that a string can be iterated over like a list, so 'for letter in word' will work as an iterator. Check out the gist here: https://gist.github.com/1001133 for my solution to your problem. As an alternative to using my 'i' variable, you can use extended slice syntax to reverse the string: http://docs.python.org/release/2.3.5/whatsnew/section-slices.html

    Let me know if any of this doesn't make sense.

    on May 31, 2011, 3:51 p.m. in reply to Anonym

    Anonym said:

    The " 'for letter in word' will work as an iterator" does not make sense to me at all. Can you please elaborate?

    on May 31, 2011, 7:49 p.m. in reply to Tyler Cipriani

    Vladimir Támara Patiño said:

    Since index starts with a negative value and you subtract 1 in each iteration, it will always be negative, so the condition index < len(word) will always be true and your loop would never end; however, when index becomes -7 it generates the error (as noted by Tyler).

    You will probably get the result you want by changing the condition to:

    -index <= len(word)
    i.e. negating index gives a positive value that grows with each iteration, so it eventually passes len(word) and the loop stops.
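Applying Vladimir's condition to the original script gives the following sketch. It collects the letters in a list (an addition for easy checking) and prints them on one line instead of one per line:

```python
word = 'Tardis'
letters = []
index = -1
# -index takes the values 1, 2, ..., so the loop ends once it exceeds len(word)
while -index <= len(word):
    letters.append(word[index])  # word[-1] is 's', word[-6] is 'T'
    index = index - 1
print(' '.join(letters))  # s i d r a T
```

The last index used is -6, which is exactly -len(word), so word[-7] is never reached.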

    on June 1, 2011, 6:47 a.m. in reply to Anonym

    Tyler Cipriani said:

    Sure. 

    So if you wanted to iterate through each item in a list and print the word "stuff" after it, you'd just write:

    items = ['fun', 'new', 'old', 'big', 'little']
    for item in items:
      print item, 'stuff'   #  fun stuff \n new stuff \n old stuff etc.

    and for each item in the list it would print the word 'stuff'. Strings work the same way. So if you wanted to print the word stuff after each letter of a string you'd type:

    word = 'string'
    for letter in word:
      print letter, 'stuff'  #  s stuff \n t stuff \n r stuff \n i stuff etc.

    That's why in my gist example, if you reverse the string before passing it to 'for letter in word', you won't need a separate iteration variable, e.g.:

    word = 'string'
    word = word[::-1]  #  reverses 'string'
    for letter in word:
      print letter
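A runnable version of the reversal idea, with the built-in reversed() shown as a second option that iterates backwards without first building a reversed copy of the string:

```python
word = 'string'
# An extended slice with step -1 walks the string from the end to the start
backwards = word[::-1]
print(backwards)  # gnirts

# reversed() yields the same letters in reverse order, one at a time
for letter in reversed(word):
    print(letter)
```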

    on June 1, 2011, 8:43 a.m. in reply to Anonym

    Anonym said:

    Thanks Tyler, that explains a lot of things to me.

    on June 1, 2011, 9 a.m. in reply to Tyler Cipriani

    Tyler Cipriani said:

    @Vladimir Támara Patiño Just re-reading this thread - nice solution to this, by the way - much more succinct than the corrections I offered.

    on June 5, 2011, 3:50 a.m. in reply to Vladimir Támara Patiño