1

I am writing a script that extracts information from a CSV file. The columns are separated by ";". The output should be a list of strings containing the column I want to extract.

I want to do this with a list comprehension. I would like to write something like:

[ c[1] for c as l.split(";") in for l in lines ]

If you know Python, you'll guess that it doesn't work. How could I achieve something like that?

Of course I could use [ l.split(";") for l in lines ], but I actually need to extract several columns, so splitting each line multiple times isn't the right choice.

The file looks like this:

115239747;darwin;simone;simone@gmail.com;678954312
112658043;de beauvoir;charles;charles@laposte.net;745832259
115831259;ramanujan;godfrey;godfrey@etu.univ.fr;666443810
114873956;hardy;srinivasa;srini@hotmail.com;659332891
114823401;germain;marguerite;marg@etu.univ.fr;768532870
115821145;yourcenar;sophie;sophie@gmail.com;645388521
114560013;harendt;michel;micha@etu.univ.fr;666458200
115702831;foucault;hannah;ha@laposte.net;691337456

I'd like to extract the second and third columns.

Edit: I want to use only Python language features (no csv library) because this is for a beginner course on Python. Thank you.

5 Answers

4

Updated answer due to updated question:

>>> import csv
>>> from operator import itemgetter
>>> 
>>> cols = [1,2] # list all the columns you want here
>>> with open('testfile') as f:
...     ig = itemgetter(*cols)
...     result = [ig(row) for row in csv.reader(f, delimiter=';')]
... 
>>> result
[('darwin', 'simone'), ('de beauvoir', 'charles'), ('ramanujan', 'godfrey'), ('hardy', 'srinivasa'), ('germain', 'marguerite'), ('yourcenar', 'sophie'), ('harendt', 'michel'), ('foucault', 'hannah')]

Without imports:

>>> cols = [1,2] # list all the columns you want here
>>> with open('testfile') as f:
...     split_lines = [line.split(';') for line in f]
...     result = [[line[col] for col in cols] for line in split_lines]
... 
>>> result
[['darwin', 'simone'], ['de beauvoir', 'charles'], ['ramanujan', 'godfrey'], ['hardy', 'srinivasa'], ['germain', 'marguerite'], ['yourcenar', 'sophie'], ['harendt', 'michel'], ['foucault', 'hannah']]
timgeb
  • Sorry, no csv is allowed! It's for a beginner course in Python and we're using only core features... – Nicolas Scotto Di Perto Feb 13 '16 at 16:14
  • @NicolasScottoDiPerto added a solution without imports – timgeb Feb 13 '16 at 16:15
  • I'll take it! Thank you, that's nice. I didn't know I could iterate over lines like that. But I am still looking for a way to alias in a comprehensive list. Is it possible? I find this alternative pretty readable. – Nicolas Scotto Di Perto Feb 13 '16 at 16:19
  • @NicolasScottoDiPerto Your terms are unclear. What do you mean by alias? There's no such thing as a comprehensive list in python. If you mean a list comprehension, the first version uses one and the second version uses two of those (or three if you count the nested one). – timgeb Feb 13 '16 at 16:35
  • @NicolasScottoDiPerto ah, I think your question is whether you can define a name inside a list comprehension - no, that's not possible. – timgeb Feb 13 '16 at 16:38
  • OK, but as you demonstrate, the work can be done upstream. Thank you! – Nicolas Scotto Di Perto Feb 13 '16 at 16:43
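As a hedged aside on binding a name inside a single comprehension: one common idiom is to loop over a one-element list, and Python 3.8 and later also accept an assignment expression (`:=`) inside a comprehension. The sketch below assumes `lines` holds rows like those in the question; the variable names are illustrative only.

```python
# Sample rows copied from the question; `lines` is an illustrative name.
lines = [
    "115239747;darwin;simone;simone@gmail.com;678954312",
    "112658043;de beauvoir;charles;charles@laposte.net;745832259",
]

# Idiom 1: iterate over a one-element list so that `c` names the split result.
# Each line is split exactly once.
pairs = [(c[1], c[2]) for l in lines for c in [l.split(";")]]

# Idiom 2 (Python 3.8+ only): bind the split result with an assignment expression.
# The first tuple element is evaluated first, so `c` is set before `c[2]` is read.
pairs_walrus = [((c := l.split(";"))[1], c[2]) for l in lines]

print(pairs)
```

Both forms split each line once. The two-step version with a separate `split_lines` list is usually easier for beginners to read.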
1

Since this is a CSV file you need to read, why not use the csv module:

import csv

with open('file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=";")
    for row in reader:
        print(row)
alecxe
0

[l.split(";")[1] for l in lines ]

Forge
0

Like this?

text = "1;2;3\n4;5;6\n;7;8;9"

col = 1  # zero-based index 1, i.e. the second field of each row

L = [row.split(";")[col] for row in text.split('\n')]

print(L)
['2', '5', '7']
root-11
  • It's not exactly this; I have edited my post to be clearer. The key feature I am looking for is how to alias line.split in the comprehension list so that I call split only once and can access the result several times. – Nicolas Scotto Di Perto Feb 13 '16 at 16:06
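One possible way to split only once and still read several columns, sketched under the assumption that `lines` is a list of raw rows: feed the comprehension from a generator expression that does the splitting. The names `rows` and `selected`, and the `rstrip` call, are illustrative additions rather than part of the answers here.

```python
# Sample rows copied from the question, with the newline a file would give.
lines = [
    "115239747;darwin;simone;simone@gmail.com;678954312\n",
    "112658043;de beauvoir;charles;charles@laposte.net;745832259\n",
]

# Lazily split each line once; rstrip drops the trailing newline first.
rows = (line.rstrip("\n").split(";") for line in lines)

# Each `row` is already a list of fields, so several columns can be read from it.
selected = [(row[1], row[2], row[3]) for row in rows]
print(selected)
```

Because `rows` is a generator, no intermediate list of split lines is kept in memory.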
0

If you want a subselection of the split data, two approaches are possible:

You may use slice syntax for simple cases.

[l.split(";")[1:3] for l in lines]  # will retrieve data from [1,3) range - effectively 1 and 2

For more complex cases, operator.itemgetter is the way to go.

Return a callable object that fetches item from its operand using the operand’s __getitem__() method. If multiple items are specified, returns a tuple of lookup values. For example:

import operator
[operator.itemgetter(1,2)(l.split(";")) for l in lines]  # you explicitly pick data with indices 1, 2
Łukasz Rogalski
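A small sketch of how the two approaches differ, assuming one sample row from the question: a slice returns a list and only covers contiguous columns, while `operator.itemgetter` returns a tuple for several indices and a bare value for a single index. The variable names are illustrative.

```python
from operator import itemgetter

line = "115239747;darwin;simone;simone@gmail.com;678954312"
fields = line.split(";")

# Slice: contiguous columns only, result is a list.
contiguous = fields[1:3]

# itemgetter with several indices: any columns, result is a tuple.
scattered = itemgetter(1, 3)(fields)

# itemgetter with one index: the bare item, not a one-element tuple.
single = itemgetter(1)(fields)

print(contiguous, scattered, single)
```

The single-index case matters when the list of wanted columns is built dynamically and may contain just one entry.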