Parsing a CSV using python's toolz package

Question

I recently came across the toolz repository and decided to give it a spin.

Unfortunately, I’m having some trouble properly using it, or at least understanding it.

My first simple task was to parse a tab-separated (TSV) file and extract the second column of each row.

For example, given the file foo.tsv:

a    b    c
d    e    f

I’d like to get the values ['b', 'e']. The following code does that, returning them as a tuple:

from toolz.curried import *

with open("foo.tsv", 'r') as f:
    data = pipe(f, map(str.rstrip),
                   map(str.split),
                   map(get(1)),
                   tuple)
    print(data)

However, if I change foo.tsv to use commas instead of tabs as the column delimiter, I can't figure out the best way to adjust the code above. It’s not clear to me how best to pass a "," argument to str.split when using map with either the pipe or thread_first functions.

Is there existing documentation that describes this?

I would use `lambda x: x.split(',')` rather than `map` – R Nar, Nov 12 '15 at 23:59

score 2 · Accepted Answer · answered Nov 13 '15 at 00:01


lambdas

Don't be afraid of using lambdas.

map(lambda s: s.split(','))

It's perhaps a bit less pretty than map(str.split), but it gets the point across.
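If you'd rather avoid a lambda, the standard library's operator.methodcaller builds a callable that invokes a named method with fixed arguments, so methodcaller('split', ',') does the same job as lambda s: s.split(','). A minimal sketch, assuming a small in-memory CSV (io.StringIO stands in for the real file) and using the builtin map so it runs without toolz installed:

```python
import io
from operator import itemgetter, methodcaller

# Hypothetical data; io.StringIO stands in for open("foo.csv")
f = io.StringIO("a,b,c\nd,e,f\n")

# methodcaller("split", ",") behaves like lambda s: s.split(",")
split_commas = methodcaller("split", ",")

# Strip the trailing newline, split on commas, take column 1
rows = map(split_commas, map(str.rstrip, f))
second = tuple(map(itemgetter(1), rows))
print(second)  # ('b', 'e')
```

With toolz.curried the same callable should drop straight in, e.g. map(methodcaller('split', ',')) inside pipe.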

Use pluck

Consider using pluck(...) rather than map(get(...))

map(get(1)) -> pluck(1)
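Roughly speaking, pluck(1) pulls index 1 out of every element of a sequence. As an illustration only, here is a simplified, hypothetical stand-in written with the standard library (the real toolz.pluck also accepts a list of indices and a default value, which this sketch omits):

```python
from operator import itemgetter


def pluck_sketch(ind, seqs):
    """Hypothetical simplified stand-in for toolz.pluck: yields seq[ind] for each seq."""
    return map(itemgetter(ind), seqs)


rows = [["a", "b", "c"], ["d", "e", "f"]]
result = tuple(pluck_sketch(1, rows))
print(result)  # ('b', 'e')
```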

Use Pandas

If you have a CSV file you might consider just using Pandas, which is very fast and highly optimized for this kind of work.
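For comparison, a minimal pandas sketch of the same task, assuming the file has no header row (hence header=None, which gives integer column labels 0, 1, 2) and using io.StringIO in place of the real file:

```python
import io

import pandas as pd

# Hypothetical data standing in for foo.csv; header=None because there is no header row
df = pd.read_csv(io.StringIO("a,b,c\nd,e,f\n"), header=None)

# Column labels are integers, so df[1] is the second column
second = df[1].tolist()
print(second)  # ['b', 'e']
```

For the tab-separated version, passing sep="\t" to read_csv should be the only change.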

answered Nov 13 '15 at 00:01

MRocklin


I see. Thank you for the insight! – indraniel Nov 13 '15 at 00:16

score 0 · Answer 2 · edited May 23 '17 at 11:59


Based on MRocklin's answer above, my CSV parsing code using toolz should look more like this:

from toolz.curried import map, pipe, pluck

with open("foo.tsv", 'r') as f:
    data = pipe(f, map(lambda s: s.rstrip("\n")),
                   map(lambda s: s.split("\t")),
                   pluck(1),
                   tuple)
    print(data)


answered Nov 13 '15 at 00:15

indraniel


score 0 · Answer 3 · answered Dec 25 '15 at 11:41


Your version for the TSV file can be shortened to:

pipe(f, map(str.split), pluck(1), tuple)
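The shortening works because str.split() with no argument splits on runs of any whitespace (tabs included) and discards leading and trailing whitespace, so the trailing newline disappears without a separate rstrip. One caveat worth checking: it also collapses consecutive tabs, so an empty field would shift the columns. A small sketch with hypothetical lines:

```python
line = "a\tb\tc\n"

# No-argument split: splits on whitespace runs and drops the trailing newline
whitespace_split = line.split()
print(whitespace_split)  # ['a', 'b', 'c']

# Explicit tab split: keeps the newline on the last field
tab_split = line.split("\t")
print(tab_split)  # ['a', 'b', 'c\n']

# With an empty field, split() loses the empty column; split("\t") keeps it
empty_field = "a\t\tc\n"
print(empty_field.split())      # ['a', 'c']
print(empty_field.split("\t"))  # ['a', '', 'c\n']
```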

To read a comma separated file, use something like this:

pipe(f, map(lambda s: s.split(',')), pluck(1), map(str.strip), tuple)
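Splitting on ',' is fine for simple files, but it will break on quoted fields that contain commas. If that might happen, the standard library's csv.reader yields already-split rows, and those could then be fed to pluck(1) in the same pipeline. A minimal sketch, assuming hypothetical data with one quoted field and using a generator expression in place of pluck so it runs without toolz:

```python
import csv
import io

# Hypothetical data; the second field of the first row contains a quoted comma
f = io.StringIO('a,"b, quoted",c\nd,e,f\n')

# csv.reader handles quoting and newlines; each row is a list of strings
second = tuple(row[1] for row in csv.reader(f))
print(second)  # ('b, quoted', 'e')
```

For the TSV file, csv.reader(f, delimiter="\t") should work the same way.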

answered Dec 25 '15 at 11:41

Mike Müller

