Why keep on converting from string to list back and forth?

Question

I have a text file containing the names and other information of all the students in my programming course, like this:

Smith, John sj0012@uni.edu smjo0012@student.edu Student  
Lester, Moe mole0025@uni.edu    mole0025@student.edu    Student
Christ, Jesus jech0020@uni.edu    jech@student.edu  Student

...

Some of the lines contain tabs and other unnecessary spaces between the fields: the first and second email addresses are separated by a tab, and sometimes there is also a tab between the second address and 'Student'. All I want is a new text file containing just the Name, Lastname in a nice column. I did manage to get that result, but only by repeatedly converting the text to a list and back to a string. Is there a better way of doing this? (Python 2.7)

peps = open('ppl.txt', 'r')

for line in peps.readlines():
    line = line.strip()                   # Delete space
    line = line.split('\t')               # Split at tab indentation and make a list
    line = map(lambda s: s.strip(), line) # Remove tab indentation
    del line [1:]                         # Delete everything after Name.
    line = ','.join(line)                 # Make Lastname, Name a string at ','
    line = line.split(',')                # Make Lastname, Name a list at ','
    line[0], line[-1] = line[-1], line[0] # Exchange position of Lastname, Name
    line = ', '.join(line)                # Convert to string again and join at ','
    print line

If you split at tabs, you shouldn't have to remove tabs later; they should already be gone. — TehTris, Oct 14 '14 at 23:12
To do "no spaces in between the lines", just don't type blank lines in between them. Also, you should probably either quote them as code or as a quote. I've edited it for you, but please look and see if I've done it right. — abarnert, Oct 14 '14 at 23:23
I'm confused. Can you give us literally three lines from the file? How are we supposed to know where the `firstname` field ends and the `email` field begins? Is there a comma there? — Adam Smith, Oct 14 '14 at 23:29
@AdamSmith: In the original paste (and I hope I preserved it), there's a tab character after the end of the name, and likewise between all of the other columns, so this appears to be a TSV file. (I wish SO made it easier to distinguish tabs and spaces in code blocks…) — abarnert, Oct 14 '14 at 23:31
@abarnert ah ha! I think I missed that in the original paste and thought it was all comma-separated. That invalidates my answer so I deleted. Using the `csv` module is the perfect solution in this case so I defer to your answer :) — Adam Smith, Oct 15 '14 at 00:47
@AdamSmith: The original version _was_ comma-separated; the edited version is tab-separated (except that the first column itself has two values separated by a comma). So, you didn't miss anything. — abarnert, Oct 15 '14 at 00:53
One last note: the `readlines()` is useless here. In fact, it's nearly always useless. `peps` is already an iterable of lines; `peps.readlines()` is an iterable of the same lines, but it has to read the entire file, split it, and store it as a big list in memory before you can start looping over it. — abarnert, Oct 15 '14 at 01:23
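To illustrate the point about `readlines()`: a minimal sketch, written for Python 3 and assuming the fields are tab-separated as in the edited question, that loops over a file-like object directly and avoids the repeated list/string round-trips. `io.StringIO` stands in for `open('ppl.txt')`, and the function name `first_last` is hypothetical.

```python
import io


def first_last(lines):
    """Yield 'First, Last' for each 'Last, First<TAB>...' line.

    `lines` can be any iterable of lines, including an open file object,
    so there is no need to call readlines() first.
    """
    for line in lines:
        name = line.split('\t', 1)[0]         # first tab-separated field
        last, _, first = name.partition(',')  # 'Smith, John' -> 'Smith', ' John'
        yield '{}, {}'.format(first.strip(), last.strip())


# io.StringIO stands in for open('ppl.txt'): both yield one line at a time.
sample = io.StringIO('Smith, John\tsj0012@uni.edu\tsmjo0012@student.edu\tStudent\n'
                     'Lester, Moe\tmole0025@uni.edu\tmole0025@student.edu\tStudent\n')
for result in first_last(sample):
    print(result)
```

Because the loop consumes one line at a time, the whole file never has to be held in memory as a list.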

abarnert · Accepted Answer · 2014-10-15T17:28:37.170

If you're trying to deal with a file where each line is a comma-separated list of values, that's exactly what the csv module is for.

In your updated version, it looks like they're actually a tab-separated list of values… but that's just a dialect of CSV (known as TSV), which the module can also handle just fine:

import csv

peps = open('ppl.txt', 'r')
reader = csv.reader(peps, delimiter='\t')
for row in reader:
    # here, row is a list of column values
    print(row)

You can also use csv.writer to write the rows back out in CSV format. You can even use csv.writer(sys.stdout) if you want to write those rows to the terminal. You never have to deal with splitting and joining; that's all taken care of for you.
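A minimal sketch of that round trip, assuming Python 3: `csv.reader` splits the tab-separated input into lists, and `csv.writer(sys.stdout)` joins them back with commas, with no manual splitting or joining. The function name `tsv_to_csv` and the in-memory sample are illustrative only.

```python
import csv
import io
import sys


def tsv_to_csv(infile, outfile):
    """Copy tab-separated rows from infile to outfile as comma-separated rows."""
    reader = csv.reader(infile, delimiter='\t')  # each row arrives as a list
    writer = csv.writer(outfile)                 # default delimiter is ','
    writer.writerows(reader)                     # no split() or join() needed


sample = io.StringIO('Smith, John\tsj0012@uni.edu\tsmjo0012@student.edu\tStudent\n')
tsv_to_csv(sample, sys.stdout)
```

Because the name field contains a comma, the writer quotes it automatically ("Smith, John",...), which is exactly the kind of detail you would otherwise have to handle by hand.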

However, the first column is itself a "lastname, firstname" pair, which you also need to parse. For that, I'd use either str.split or str.partition (depending on exactly what behavior you want if, say, Cher is in your class). I'm also not sure whether you want to split on ', ' or split on ',' and then strip out the spaces. Either way is easy. For example:

lastname, _, firstname = row[0].partition(',')
writer.writerow((firstname.strip(), lastname.strip()))
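The two splitting options mentioned above behave differently when the spacing after the comma is irregular. A small Python 3 sketch; the sample string with two spaces is hypothetical:

```python
name = 'Smith,  John'  # hypothetical: two spaces after the comma

# Splitting on ', ' consumes exactly one space, so the extra one survives.
on_comma_space = name.split(', ')

# Splitting on ',' and stripping each part handles any amount of whitespace.
on_comma_stripped = [part.strip() for part in name.split(',')]

print(on_comma_space)     # ['Smith', ' John']
print(on_comma_stripped)  # ['Smith', 'John']
```

If there is no space at all ('Smith,John'), split(', ') returns the whole string as a single item, so the strip approach is the more forgiving of the two.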

While we're at it, it's always better to use with statements with files, so let's do that too.

But my intention is just making a new text file containing just the Name, Lastname in a nice column.

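import csv

# Note: the csv docs recommend opening files in binary mode ('rb'/'wb') on
# Python 2, or with newline='' on Python 3; plain text mode can add blank
# lines between rows on Windows.
with open('ppl.txt') as infile, open('names.txt', 'w') as outfile:
    reader = csv.reader(infile, delimiter='\t')  # split each line at tabs
    writer = csv.writer(outfile)                 # join output fields with commas
    for row in reader:
        # row[0] is "Lastname, Firstname"; partition splits at the first comma
        lastname, _, firstname = row[0].partition(',')
        writer.writerow((firstname.strip(), lastname.strip()))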
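To check the behavior without touching the disk, here is the same loop wrapped in a function that takes file-like objects, run on the three sample lines from the question. This assumes Python 3 and that the columns are separated by single tabs; `extract_names` is a hypothetical name.

```python
import csv
import io


def extract_names(infile, outfile):
    """Write 'Firstname,Lastname' rows for each tab-separated input row."""
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile)
    for row in reader:
        lastname, _, firstname = row[0].partition(',')
        writer.writerow((firstname.strip(), lastname.strip()))


sample = io.StringIO(
    'Smith, John\tsj0012@uni.edu\tsmjo0012@student.edu\tStudent\n'
    'Lester, Moe\tmole0025@uni.edu\tmole0025@student.edu\tStudent\n'
    'Christ, Jesus\tjech0020@uni.edu\tjech@student.edu\tStudent\n'
)
result = io.StringIO()
extract_names(sample, result)
print(result.getvalue())
```

Note that csv.writer separates the fields with a bare comma (John,Smith) and ends each row with \r\n by default.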

I'm not entirely sure what your issue is with spaces. If there are spaces after the tabs in some cases and you want to ignore them, you should look at the skipinitialspace option in the csv module. For example:

reader = csv.reader(infile, delimiter='\t', skipinitialspace=True)
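A quick Python 3 check of what skipinitialspace does with a tab delimiter; the messy sample line is made up. Only spaces directly after a delimiter are skipped, and a second tab in a row still produces an empty field.

```python
import csv
import io

line = 'Smith, John\t  sj0012@uni.edu\t\tStudent\n'  # hypothetical messy line

plain = next(csv.reader(io.StringIO(line), delimiter='\t'))
skipping = next(csv.reader(io.StringIO(line), delimiter='\t',
                           skipinitialspace=True))

print(plain)     # ['Smith, John', '  sj0012@uni.edu', '', 'Student']
print(skipping)  # ['Smith, John', 'sj0012@uni.edu', '', 'Student']
```

The space inside 'Smith, John' is kept, because it follows a comma rather than the tab delimiter.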

But if there are tabs and spaces in the middle of the actual columns, and you want to strip those out, you probably want to use str.replace or a regular expression for that. For example:

import re

lastname, _, firstname = row[0].partition(',')
firstname = re.sub(r'\s', '', firstname)  # remove all whitespace, including inner spaces
lastname = re.sub(r'\s', '', lastname)
writer.writerow((firstname, lastname))
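Note that re.sub(r'\s', '', ...) removes every whitespace character, including spaces inside a name. A small Python 3 sketch contrasting it with collapsing runs of whitespace to single spaces via ' '.join(s.split()), an alternative that is not part of the answer above; the two-word name is hypothetical.

```python
import re


def remove_ws(s):
    # What the re.sub lines above do: delete every whitespace character.
    return re.sub(r'\s', '', s)


def collapse_ws(s):
    # Alternative: split on any run of whitespace, rejoin with single spaces.
    return ' '.join(s.split())


messy = ' Mary \tAnn '  # hypothetical first name with stray tab/spaces
print(remove_ws(messy))    # MaryAnn
print(collapse_ws(messy))  # Mary Ann
```

Which one you want depends on whether a space inside a name is meaningful.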

The csv module seems to do the trick, by initially making everything a clear list. Much easier to work from there! Thanks. — Senethys, Oct 14 '14 at 23:55
I see this a lot but am never sure: why use `lastname, _, firstname = row[0].partition(',')` instead of `lastname, firstname = row[0].split(',')`? — Adam Smith, Oct 15 '14 at 00:48
@AdamSmith: As I said, "depending on exactly what behavior you want to get if, say, Cher is in your class". If you do `last, first = 'Cher'.split(',')`, you get a `ValueError` from trying to unpack a list of one value into two variables; `last, _, first = 'Cher'.partition(',')` gives you `'Cher'`, `''`, and `''`. Sometimes one is more appropriate, sometimes the other. When you can be sure it doesn't actually matter, you can try to figure out which one seems "conceptually nicer", or just pick at random. :) — abarnert, Oct 15 '14 at 00:55
@AdamSmith: (There's also a problem for people with two or more commas, but in that case, you can just use `split(',', 1)`, so it's not really an issue.) — abarnert, Oct 15 '14 at 00:55
`writer.writerow(firstname.strip(), lastname.strip())` gives an error. It asks for one argument. — Senethys, Oct 15 '14 at 01:36
@Senethys you need to wrap it in another set of parentheses. Right now you're passing `writer.writerow` two arguments. One is `firstname.strip()` and the other is `lastname.strip()`. Instead you need to pass it ONE argument: the tuple `(firstname.strip(), lastname.strip())` — Adam Smith, Oct 15 '14 at 03:33
@AdamSmith: Thanks for the catch; I made that same typo in 2 out of 4 places. Fixed now. — abarnert, Oct 15 '14 at 17:28
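The split versus partition trade-off discussed in these comments, as a small Python 3 sketch; the helper names and the 'Doe, John, Jr.' example are hypothetical.

```python
def split_name(full):
    # split(',', 1) splits at the first comma only, so extra commas stay in
    # the first-name part; a name with no comma raises ValueError on unpacking.
    last, first = full.split(',', 1)
    return last.strip(), first.strip()


def partition_name(full):
    # partition always returns three parts, so a name with no comma
    # gives an empty first name instead of an error.
    last, _, first = full.partition(',')
    return last.strip(), first.strip()


print(partition_name('Cher'))        # ('Cher', '')
print(split_name('Doe, John, Jr.'))  # ('Doe', 'John, Jr.')
try:
    split_name('Cher')
except ValueError:
    print('split_name: no comma in', repr('Cher'))
```

Which behavior is "right" depends on whether a missing comma should be treated as bad input (split) or tolerated (partition).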

score 1 · Answer 2 · edited May 23 '17 at 12:11


You could use a regex ('(\w+),\W+(\w+)') to get Lastname, Name out of each line.

Something like this:

import re
re.match(r'(\w+(?:-\w+)*),\W+(\w+(?:-\w+)*)', 'Lastname, Name, uniname@uni.edu, uniname@student.edu, Student/Teacher').groups()
# -> ('Lastname', 'Name')

Took help (for the hyphenated regex) from here.
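Applied line by line to the tab-separated file, the hyphen-aware pattern could be used like this. A minimal Python 3 sketch: regex_first_last is a hypothetical helper, the hyphenated sample line is made up, and lines that don't start with "Last, First" return None.

```python
import re

# The answer's pattern, written as a raw string so \w and \W reach re intact.
NAME_RE = re.compile(r'(\w+(?:-\w+)*),\W+(\w+(?:-\w+)*)')


def regex_first_last(line):
    m = NAME_RE.match(line)
    if m is None:
        return None  # the line does not start with "Last, First"
    last, first = m.groups()
    return '{}, {}'.format(first, last)


print(regex_first_last('Smith, John\tsj0012@uni.edu\tsmjo0012@student.edu\tStudent'))
# John, Smith
print(regex_first_last('Smith-Jones, Mary-Ann\tmsj@uni.edu\tStudent'))
# Mary-Ann, Smith-Jones
```

As the comments point out, names with apostrophes or spaces will not match this pattern.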

answered Oct 14 '14 at 23:19 by Bleeding Fingers

Might need to include additional characters for names that include punctuation (such as hyphenated names)? – Peter Gibson Oct 14 '14 at 23:23
@BleedingFingers: That covers hyphens, but it doesn't work for apostrophes, periods, or, maybe more seriously, spaces, and `O'Donnell, Sinead Q.` is a perfectly valid name. Meanwhile, it catches things that can't be part of names, like underscores. I think it's a lot simpler to blacklist separator characters—comma, tab, and nothing else, in the OP's input—instead of whitelisting name characters. But if you want to do it that way, you can't just assume that people's names have the same rules as Python identifiers. – abarnert Oct 15 '14 at 01:21
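A minimal sketch of the blacklist idea from the last comment, assuming Python 3 and the tab-separated layout: the last name is anything up to the first comma or tab, and the first name is anything up to the next tab. The pattern, the blacklist_first_last name, and the sample email are assumptions, not code from the thread.

```python
import re

# Last name: one or more chars that are not ',' or tab; then a comma and any
# whitespace; first name: one or more chars up to the next tab.
NAME_RE = re.compile(r'([^,\t]+),\s*([^\t]+)')


def blacklist_first_last(line):
    m = NAME_RE.match(line)
    if m is None:
        return None  # no comma before the first tab
    last, first = m.groups()
    return '{}, {}'.format(first.strip(), last.strip())


print(blacklist_first_last("O'Donnell, Sinead Q.\tsq0001@uni.edu\tStudent"))
# Sinead Q., O'Donnell
```

Only the separators are spelled out, so apostrophes, periods, and spaces inside names pass through untouched.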

skrrgwasme · Answer 3 · 2014-10-15T03:02:18.243


The other answers here will definitely work for you, but here's a simpler way to accomplish your task:

# we can open both the input and output files at the same time
with open('ppl.txt', 'r') as fi, open('output.txt', 'w') as fo:
    for line in fi:
        split_line = line.split()
        fo.write("{0}, {1}\n".format(split_line[1], split_line[0].strip(',')))
        # numbered fields work in Python 2.7 and 3; "{}, {}\n" (auto-numbering) also works in 2.7+ and 3.1+

If you don't like magic numbers, you can use operator.itemgetter:

import operator
retriever = operator.itemgetter(1, 0)

with open('ppl.txt', 'r') as fi, open('output.txt', 'w') as fo:
    for line in fi:
        f_name, l_name = retriever(line.split())
        fo.write("{0}, {1}\n".format(f_name, l_name.strip(',')))
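Why this copes with the mixed tabs and spaces: str.split() with no argument splits on any run of whitespace. A Python 3 sketch of the itemgetter version taking file-like objects, run on the question's sample lines; swap_names is a hypothetical name. It assumes single-word names, since a first or last name containing a space would shift the columns.

```python
import io
import operator

retriever = operator.itemgetter(1, 0)  # returns (item 1, item 0) as a tuple


def swap_names(infile, outfile):
    for line in infile:
        # split() with no argument splits on any run of spaces and/or tabs
        f_name, l_name = retriever(line.split())
        outfile.write("{0}, {1}\n".format(f_name, l_name.strip(',')))


sample = io.StringIO(
    'Smith, John sj0012@uni.edu smjo0012@student.edu Student  \n'
    'Lester, Moe mole0025@uni.edu    mole0025@student.edu    Student\n'
    'Christ, Jesus\tjech0020@uni.edu\t\tjech@student.edu  Student\n'
)
out = io.StringIO()
swap_names(sample, out)
print(out.getvalue(), end='')
```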

answered Oct 14 '14 at 23:38 by skrrgwasme; edited Oct 15 '14 at 03:02

While this method was quick, it didn't write the names in a column. Also, the last name is fused with the first name, starting with the former: JohnSmith, ChristJesus, LesterMoe – Senethys Oct 15 '14 at 02:00
@Senethys Sorry- that was my mistake. I forgot to put a newline character in the strings. You should be much happier with the output now. – skrrgwasme Oct 15 '14 at 03:03
