40

I am trying to parse about 20 million lines from a text file and am looking for a way to do some further manipulations on lines that do not start with question marks. I would like a solution that does not use regex matching. What I would like to do is something like this:

for line in x:
    header = line.startswith('?')
    if line.startswith() != header:
        DO SOME STUFF HERE

I realize the startswith method takes one argument, but is there a simple way to get all lines from a file that DO NOT start with a question mark?

starball
drbunsen

4 Answers

72

I think a generator expression is the best way:

for line in (line for line in x if not line.startswith('?')):
    DO_STUFF

Or your way:

for line in x:
    if line.startswith("?"):
        continue
    DO_STUFF

Or:

for line in x:
    if not line.startswith("?"):
        DO_STUFF

It really comes down to your programming style. I prefer the first one, though the second may seem simpler. I don't really like the third one because of the extra indentation.
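For reference, here is a minimal runnable sketch of the first variant. The helper name `non_header_lines` and the sample data are made up for illustration, and `io.StringIO` only stands in for the real file. Because a generator expression is lazy, it should handle one line at a time rather than loading all 20 million lines into memory.

```python
import io


def non_header_lines(lines):
    """Lazily yield only the lines that do not start with '?'."""
    return (line for line in lines if not line.startswith('?'))


# io.StringIO stands in for an open file; the contents are invented sample data.
sample = io.StringIO("?header\nfirst\n?comment\nsecond\n")

kept = [line.rstrip('\n') for line in non_header_lines(sample)]
print(kept)  # ['first', 'second']
```

With a real file, the same helper could be called on the handle returned by `open(...)`, since iterating a file object yields its lines one by one.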

utdemir
10

Here is a nice one-liner, which is very close to natural language.

String definition:

StringList = [ '__one', '__two', 'three', 'four' ]

Code which performs the deed:

BetterStringList = [p for p in StringList if not p.startswith('__')]
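If more than one prefix needs to be excluded, `str.startswith` also accepts a tuple of prefixes. A small sketch, with made-up sample data and hypothetical variable names:

```python
# Invented sample data: two '__' entries, one '?' entry, one plain entry.
string_list = ['__one', '__two', '?three', 'four']

# str.startswith returns True if the string starts with ANY prefix in the tuple.
kept = [s for s in string_list if not s.startswith(('__', '?'))]
print(kept)  # ['four']
```

Note that a list comprehension builds the whole result in memory, so for a very large file a generator expression (round brackets instead of square ones) is probably the safer choice.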
WalyKu
2

Something like this is probably what you're after:

with open('myfile.txt') as fh:
  for line in fh:
    if line[0] == '?': # skip lines that start with a question mark; strings can be indexed like lists - they're immutable sequences.
      continue
    # All of the processing here when lines don't start with question marks.
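One caveat with indexing: `line[0]` raises `IndexError` on an empty string. Lines read directly from a file always contain at least the newline, but if the lines have been stripped or come from a list, a slice is safer. A small sketch, where the sample data and the helper name `starts_with_question_mark` are assumptions for illustration:

```python
# Invented sample data, including an empty string.
lines = ['?skip me', '', 'keep me']


def starts_with_question_mark(line):
    # line[:1] is '' for an empty string, so this never raises IndexError.
    return line[:1] == '?'


kept = [line for line in lines if not starts_with_question_mark(line)]
print(kept)  # ['', 'keep me']
```

`line.startswith('?')` behaves the same way on empty strings (it returns `False`), so either form works when empty lines are possible.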
g.d.d.c
0

Similar to utdemir's answer:

from itertools import ifilterfalse  # just "filterfalse" if using Python 3

for line in ifilterfalse(lambda s: s.startswith('?'), lines):
    pass  # DO STUFF

http://docs.python.org/library/itertools.html#itertools.ifilterfalse
http://docs.python.org/dev/py3k/library/itertools.html#itertools.filterfalse
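For Python 3, a runnable sketch of the same idea might look like this (the sample `lines` list is made up for illustration):

```python
from itertools import filterfalse

# Invented sample data standing in for the lines of the file.
lines = ['?header', 'first', '?comment', 'second']

# filterfalse lazily yields the items for which the predicate is false,
# i.e. the lines that do NOT start with '?'.
kept = list(filterfalse(lambda s: s.startswith('?'), lines))
print(kept)  # ['first', 'second']
```

Wrapping the result in `list` is only for display here; in a loop over a large file you would iterate over the `filterfalse` object directly.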

JAB