-1

How can I loop over a list and change its entries in place? I want the equivalent of `[line.strip() for line in lines]`, except that each result of `line.strip()` is saved back into the `lines` list.

I need this because I am working with a large dataset stored in the list. The dataset will be used to train a classifier for natural language processing with the NLTK library, so duplicating the list is not feasible.

jhtong

4 Answers

1

How about this:

lines = [line.strip() for line in lines]

This builds a second list and then rebinds the name `lines` to it, so the old and new lists are both in memory until the old one is released, which may be a problem with a lot of lines.

Alternatively, you could use a generator expression instead to avoid the potential memory problem and just create the lines on demand:

lines = (line.strip() for line in lines)
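
One caveat worth keeping in mind: a generator expression can be consumed only once, and it supports neither indexing nor `len()`. It also keeps a reference to the original list, so that list stays alive while the generator is in use. A minimal sketch, assuming a small made-up `lines` list for illustration:

```python
# Hypothetical sample data; the real list would hold the dataset.
lines = ["  alpha \n", "beta\n", "\tgamma  "]

# Nothing is stripped yet; each line is stripped only when it is requested.
stripped = (line.strip() for line in lines)

# list() is used here only to show the behaviour; on a large dataset you
# would iterate over the generator directly instead of materialising it.
first_pass = list(stripped)   # consumes the generator
second_pass = list(stripped)  # the generator is already exhausted

print(first_pass)   # ['alpha', 'beta', 'gamma']
print(second_pass)  # []
```

If the data has to be traversed more than once (for example, over several training passes), the generator would need to be recreated for each pass.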
Levon
1

Use a generator expression; then the stripped lines won't all be held in memory at the same time.

glines = (line.strip() for line in lines)
monkut
1

Do you want to store the result in the same `lines` list object, rather than rebinding the name to a new list? If so, use slice assignment:

lines[:] = [line.strip() for line in lines]
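
To illustrate what "the same instance" means here, the sketch below checks that other references to the list see the stripped values after the slice assignment. The sample data and the `alias` name are assumptions made up for the example:

```python
# Hypothetical sample data standing in for the real dataset.
lines = ["  alpha \n", "beta\n", "\tgamma  "]
alias = lines              # a second reference to the same list object
original_id = id(lines)

# Slice assignment replaces the contents but keeps the list object itself.
# The right-hand side still builds a temporary list first.
lines[:] = [line.strip() for line in lines]

print(id(lines) == original_id)  # True
print(alias)                     # ['alpha', 'beta', 'gamma']
```

With a plain `lines = [...]` assignment, `alias` would still point at the old, unstripped list.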
spatar
  • I could be wrong, but doesn't the right hand side of this assignment create a whole new list anyway? – srgerg Jun 01 '12 at 02:20
  • @srgerg, yes, it creates a new list first and then replaces the whole content of `lines` with the contents of this new list. But it is the fastest way to store the result into the same list instance (for lists of arbitrary length). – spatar Jun 01 '12 at 02:25
  • OK, but then why not use a generator expression on the right-hand side to reduce the memory requirements of constructing the whole list? – srgerg Jun 01 '12 at 02:33
  • @srgerg: The statement with a generator expression might create a copy anyway in CPython (see how `list_ass_*` functions are implemented in listobject.c) – jfs Jun 01 '12 at 04:09
1

If I understand you correctly, you're trying to strip the lines from a file in place rather than creating an entirely new list. The problem is that strings in Python are immutable: you can't modify a string in place, so each stripped line has to be a new string object.

As others have suggested, a generator expression will produce the stripped strings on demand rather than storing them all in a new list, which reduces memory demands. If you really want to replace the entries of the existing list one by one, without building a second list, then something like this will do the job:

for i, line in enumerate(lines):
    lines[i] = line.strip()

but it may be that creating a new list would be faster anyway.
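
As a rough sketch, the loop can be wrapped in a small helper. The name `strip_in_place` and the sample data are hypothetical, chosen only for this example. Reassigning entries while enumerating is safe here because the length of the list never changes:

```python
def strip_in_place(lines):
    """Replace each entry of `lines` with its stripped version.

    Mutates the list and returns None, following the convention
    of built-in mutating methods such as list.sort().
    """
    for i, line in enumerate(lines):
        # A new string is created per entry; no second list is built.
        lines[i] = line.strip()


# Hypothetical sample data standing in for the real dataset.
data = ["  alpha \n", "beta\n", "\tgamma  "]
alias = data

strip_in_place(data)

print(data)           # ['alpha', 'beta', 'gamma']
print(alias is data)  # True
```

Only a few string objects exist in addition to the list at any moment, so the extra memory should stay small compared with building a full copy.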

srgerg
  • Thanks. Answered the question. Wanted to avoid creating another duplicate list, because it will contain test datasets used for machine training and the list might be quite large. Thanks! – jhtong Jun 01 '12 at 02:30