
I want to build two lists from a document whose formatting may vary but which should roughly consist of two columns with some separator. Each row looks like this, for example:

"word1"\t"word2"

My lists should be "list_of_word1" and "list_of_word2", and I want to build them both at once. I know I could use pandas, but the script has to work with the standard library only, without third-party imports, so I need to read the document with a plain open().

My attempt was:

list_of_word1=[]
list_of_word2=[]
((list_of_word1.extend(line.split()[0]),list_of_word2.extend(line.split()[1])) for line in open(doc))

The generator serves no purpose, since extend returns None, so it may be seen as bad style to build a structure whose result is never used, or that is unnecessary in the first place. I would also like to know how to avoid calling the split function twice: that is acceptable for two columns, but if I applied the same pattern to more columns, it would become very inefficient.
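For comparison, here is a minimal sketch of the plain-loop version, assuming whitespace-separated columns and using a hypothetical in-memory list `rows` in place of the file. It splits each line exactly once and unpacks the result. It also illustrates two details of the one-liner above: a generator expression does nothing until something iterates over it, and `extend` with a string adds each character separately, whereas `append` adds the whole word.

```python
# Hypothetical in-memory stand-in for the lines of the file.
rows = ["word1\tword2\n", "foo\tbar\n"]

list_of_word1 = []
list_of_word2 = []
for line in rows:
    first, second = line.split()  # split once, unpack into two names
    list_of_word1.append(first)   # append adds the whole word
    list_of_word2.append(second)

print(list_of_word1)  # ['word1', 'foo']
print(list_of_word2)  # ['word2', 'bar']

# extend iterates over its argument, so a string is added character by character.
demo = []
demo.extend("ab")
print(demo)  # ['a', 'b']

# A generator expression is lazy: its body has not run yet.
calls = []
gen = (calls.append(x) for x in range(3))
print(calls)  # []
```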

My attempt to avoid calling split twice was this:

((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for line in open(doc) for (linesplit0,linesplit1) in line.split("\t"))

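But that doesn't work: it doesn't find tuples to unpack, because the inner for iterates over the individual strings returned by split rather than over the pair. I also tried starred unpacking, but that didn't work either. Then I tried: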

((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for linesplit0,linesplit1 in open(doc).readline().split("\n").split("\t"))

But that feels unsatisfactory and too contrived. What do you think?
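A small sketch of why the unpacking in the second attempt fails, and one common workaround. The inner `for` iterates over the individual strings that `split` returns, so Python tries to unpack each string (for example `'apple'`) into two names. Wrapping the split result in a one-element list makes the inner `for` run once per line, unpacking the whole list. The sample lines are hypothetical.

```python
lines = ["apple\torange\n", "one\ttwo\n"]

# Fails: the inner loop visits 'apple', then tries to unpack that string into (a, b).
error = None
try:
    [(a, b) for line in lines for (a, b) in line.split("\t")]
except ValueError as exc:
    error = exc

# Works: the one-element list makes the inner loop run once per line,
# unpacking the whole split result into a and b.
pairs = [(a, b) for line in lines for a, b in [line.rstrip("\n").split("\t")]]

print(pairs)  # [('apple', 'orange'), ('one', 'two')]
```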

Ando Jurai

4 Answers


Maybe this?

lists = [[] for _ in range(number_of_lists)]
for line in open(doc):
    for lst, word in zip(lists, line.split()):
        lst.append(word)

(might need some tweaking)

Sufian Latif

This answer works regardless of how wide the delimiter is, provided it is made of spaces (any number of them).

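with open('temp.txt', 'r') as f:
    data = f.read().strip('\n').split('\n')  # one string per row

# split each row on single spaces, then drop the empty strings left by runs of spaces
dataNoSpace = [filter(lambda a: a != '', i.split(' ')) for i in data]
# zip(*...) transposes rows into columns
list1, list2 = [list(i) for i in zip(*dataNoSpace)]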

For example, if 'temp.txt' is:

word10 word20
word11    word21
word12       word22
word13  word23
word14    word24

We get:

list1
['word10', 'word11', 'word12', 'word13', 'word14']

list2
['word20', 'word21', 'word22', 'word23', 'word24']
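As a side note on this answer, assuming the built-in `str.split` is acceptable: calling `split()` with no argument splits on any run of whitespace (spaces, tabs, newlines) and never yields empty strings, so the `filter` step can be dropped and tab-separated rows work too. A minimal sketch, with the sample text inlined instead of read from `temp.txt`:

```python
# Sample text inlined for illustration (same shape as temp.txt, plus a tab).
text = "word10 word20\nword11    word21\nword12\tword22\n"

# split() with no argument collapses runs of spaces/tabs and drops empty strings.
rows = [line.split() for line in text.splitlines()]
list1, list2 = [list(col) for col in zip(*rows)]

print(list1)  # ['word10', 'word11', 'word12']
print(list2)  # ['word20', 'word21', 'word22']
```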
Robbie
  • That's neat, but you end up looping twice over your cleaned data, which I want to avoid. I would do the same in practice, but I try hard to avoid such things, since looping over data structures repeatedly is computationally inefficient. Also, this form does not build the lists in a single statement, which is less readable (for me). Still, the data preparation is very nice. – Ando Jurai Apr 27 '17 at 11:05
  • Okay, check it out now. Not sure it can get any cleaner! – Robbie Apr 27 '17 at 11:26
  • That's perfect:) – Ando Jurai Apr 27 '17 at 11:35
  • Great! If this has solved your problem you can mark as correct :-) – Robbie Apr 27 '17 at 11:44

You can use zip together with argument unpacking to achieve this.

Example input file data.txt:

1 2 3
apple orange banana
one two three
a b c

Code:

>>> with open('data.txt') as f:
...    list(zip(*(line.split() for line in f)))
... 
[('1', 'apple', 'one', 'a'), ('2', 'orange', 'two', 'b'), ('3', 'banana', 'three', 'c')]
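To get the two named lists the question asks for, the transposed tuples can be unpacked directly and converted with `map(list, ...)`. A sketch assuming exactly two whitespace-separated columns, with hypothetical in-memory lines standing in for the file:

```python
lines = ["word1 word2\n", "foo bar\n", "spam eggs\n"]

# zip(*rows) transposes rows into columns; map(list, ...) turns each tuple into a list.
list_of_word1, list_of_word2 = map(list, zip(*(line.split() for line in lines)))

print(list_of_word1)  # ['word1', 'foo', 'spam']
print(list_of_word2)  # ['word2', 'bar', 'eggs']
```

Note that `zip` stops at the shortest row, so a line with a missing column silently shortens every list; `itertools.zip_longest` pads with a fill value instead.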

See also:

mkrieger1
  • I think it's like my own answer once I sorted it out in my head. But while I like the zip, I wonder about other use cases, like building a list of tuples from selected elements of the split, where the problem of unpacking the split remains. For example, with something like [(line.split()[0],line.split()[-1]) for line in docread], I would like to find a way to split once and use the result for both elements of the tuple instead of calling split() twice. – Ando Jurai Apr 27 '17 at 11:11
  • @AndoJurai Sorry, I don't understand what you mean. You should update your question with a specific example input and desired result. – mkrieger1 Apr 27 '17 at 11:17
  • http://stackoverflow.com/questions/43656861/unpacking-a-split-inside-a-list-comprehension – Ando Jurai Apr 27 '17 at 11:49
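For the case in the comment above (first and last field of each line), one way to split only once per line is a nested generator: the inner generator yields each split result, and the outer comprehension binds it to a name. A sketch with hypothetical sample lines:

```python
lines = ["a b c\n", "1 2 3 4\n", "x y\n"]

# The inner generator calls split() once per line; the outer comprehension
# names the result `parts` and picks the fields it needs.
first_last = [(parts[0], parts[-1]) for parts in (line.split() for line in lines)]

print(first_last)  # [('a', 'c'), ('1', '4'), ('x', 'y')]
```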

Actually, at first I wanted to use zip, hence the generator. But I mixed things up and ended up adding

list_of_word1=[]
list_of_word2=[]

which are useless there. What I should have written is:

list_of_word1, list_of_word2 = zip(*(line.split() for line in open(doc)))

That works like a charm. Still, the fundamental problem remains: although I managed to do what I wanted here, I still don't know how to handle unpacking a split inside a comprehension. Any ideas?
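On the remaining question of reusing a split inside a comprehension: assuming Python 3.8 or later, an assignment expression (`:=`) can bind the split result once and reuse it in the same comprehension. A sketch with hypothetical sample lines; the `if` clause also skips blank lines as a side effect, because an empty list is falsy.

```python
lines = ["word1 word2 word3\n", "\n", "foo bar baz\n"]

# Python 3.8+: `parts := line.split()` splits once per line and binds the result.
# The `if` also skips blank lines, because line.split() returns [] for them.
first_last = [(parts[0], parts[-1]) for line in lines if (parts := line.split())]

print(first_last)  # [('word1', 'word3'), ('foo', 'baz')]
```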

Ando Jurai