
I am working on a small exercise.

There is a text file with three columns: EmployeeID, First Name and Last Name. Write a program to create a dictionary whose keys are the EmployeeIDs in the text file and whose values are the first and last names combined.

I first tried it without a loop.

f = open('empID.txt','r')
line1 = f.readline().split()
line2 = f.readline().split()
line3 = f.readline().split()
print line1
print line2
print line3
empdict={}
empdict[line1[0]] = line1[1]+" "+line1[2]
empdict[line2[0]] = line2[1]+" "+line2[2]
empdict[line3[0]] = line3[1]+" "+line3[2]
print "The resulting dictionary is \n",empdict
f.close()

This worked fine. Then I tried a loop.

f = open('empID.txt','r')
empdict = {}
for line in f:
        line = f.readline().split()
        print line
        empdict[line[0]] = line[1]+" "+line[2]

print "The resulting dictionary is \n",empdict
f.close()

This threw an error:

Traceback (most recent call last):
  File "empID3.py", line 4, in <module>
    line = f.readline().split()
ValueError: Mixing iteration and read methods would lose data

Somebody advised using a while loop instead in a similar situation, so I tried that.

In place of for line in f:, I used while True:. This time it printed every line and then threw another error instead of printing the dictionary.

Traceback (most recent call last):
  File "empID3.py", line 6, in <module>
    empdict[line[0]]=line[1]+" "+line[2]
IndexError: list index out of range

Can somebody help me get this right? Where am I wrong?

Anand Surampudi

2 Answers


You don't need to use file.readline() at all. Looping over the file (iteration) already gives you the lines, so drop the now-redundant call to file.readline() altogether:

for line in f:
    print line
    line = line.split()
    empdict[line[0]] = line[1]+" "+line[2]

You can use a while loop too, but then you need to check for an end-condition; your file is not endless. file.readline() returns an empty string when you have read the whole file; splitting an empty string results in an empty list, causing your exception.
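Both behaviours are easy to check in isolation. The following is only a minimal sketch: io.StringIO stands in for the real file, and the sample line is made up for illustration.

```python
import io

# io.StringIO stands in for the real file; the sample content is made up.
f = io.StringIO(u"101 John Smith\n")

first = f.readline()     # the only line in the "file"
at_eof = f.readline()    # nothing left to read, so this is an empty string

fields = at_eof.split()  # splitting the empty string gives an empty list
# fields[0] would raise IndexError here, which is the error from the question.
```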

Checking for the file-is-done case:

while True:
    line = f.readline()
    if not line:
        break
    line = line.split()
    print line
    empdict[line[0]] = line[1]+" "+line[2]

This is a lot more verbose than the straight for loop case.

You can limit the splitting to just the first run of whitespace:

empid, name = line.strip().split(None, 1)
empdict[empid] = name

The str.split() call splits on whitespace when using no arguments or None, and a second argument limits the split count. This is especially helpful if the last name contains whitespace:

>>> line = '42 Guido van Rossum\n'
>>> line.split()
['42', 'Guido', 'van', 'Rossum']
>>> line.split(None, 1)
['42', 'Guido van Rossum\n']

You do have to explicitly strip the newline though.

You can also use str.partition(), which is slightly faster than str.split() for the just-one-split case where you can split on a fixed string (rather than arbitrary whitespace):

empid, _, name = line.strip().partition(' ')

The _ variable is just there to catch the separator; the method always returns 3 values (the part before the separator, the separator itself, and the part after it). The _ name is a convention; in this context it means "I'll just ignore that value".
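As a small sketch of what str.partition() hands back (the names sep and only_id are just illustrative): note that it splits on the exact string given, so with ' ' as the separator a line whose columns are divided by tabs or several spaces would not split the way split(None, 1) does.

```python
line = '42 Guido van Rossum\n'

# partition() always returns a 3-tuple: (head, separator, tail).
empid, sep, name = line.strip().partition(' ')

# When the separator is absent, the last two items are empty strings,
# so unpacking into three names still works.
only_id = '42'.partition(' ')
```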

It could be that some of the lines in your file are empty; you'd want to skip those lines, because otherwise you'll still get IndexError or ValueError exceptions:

line = line.strip()
if not line:
    continue  # line is empty

And last but not least, if you use the file object as a context manager, in a with statement, Python will close the file object for you when done reading, whether your code was successful or raised an exception:

empdict = {}
with open('empID.txt','r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        empid, _, name = line.partition(' ')
        empdict[empid] = name

No need to call f.close() at all now.
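To try the same loop without a file on disk, here is a hedged sketch that feeds it made-up sample data through io.StringIO. The build_empdict helper and the sample contents are assumptions for illustration, not part of the original exercise.

```python
import io

# Made-up sample data standing in for empID.txt, including a blank line
# and a last name that contains a space.
sample = u"101 John Smith\n\n42 Guido van Rossum\n"


def build_empdict(lines):
    """Map each employee ID to the rest of its line, skipping blank lines."""
    empdict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        empid, _, name = line.partition(' ')
        empdict[empid] = name
    return empdict


# Any iterable of lines works here, including an open file object.
empdict = build_empdict(io.StringIO(sample))
```

Passing the lines in as an argument keeps the parsing separate from the file handling, so the same function can be called inside a with open(...) block.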

Martijn Pieters
  • Thanks for the explanation. I have deleted my answer. I will be more careful next time – Ganesh Kamath - 'Code Frenzy' Feb 17 '15 at 08:10
  • @MartijnPieters Thanks a lot for your time on this. I tried all the changes you suggested and particularly, the two ideas: list of `[empid, name]` and `split(None, 1)` are really cool and interesting. But unfortunately, all of them gave me either `IndexError` or `ValueError: need more than 0 values to unpack`. I am sorry none of them worked. I am sure I am doing something wrong. I will continue working on it and let you know if got it. Thanks again. – Anand Surampudi Feb 17 '15 at 08:33
  • @Anand: you are still splitting an empty line. You can test for that explicitly with `if line.strip():`. – Martijn Pieters Feb 17 '15 at 09:31

The solution can be very simple when you loop over the file with a for loop:

output_dict = {}
for line in f:
    line = line.strip()
    key, value = line.split(" ", 1)
    output_dict[key] = value

Or, as a one-liner:

output_dict = dict(map(lambda i: i.strip().split(" ", 1), f.read().splitlines()))

PS: I would prefer the first over the second because, in my opinion, it is a lot cleaner.

thiruvenkadam