
I wrote a utility that scans a text file for all space-delimited fields containing alphabetic characters. It works, but it is very slow because I split every line into words and then scan each word character by character. Is there a faster way to do this?

Thanks.

Here is the code:

#!/bin/python
import argparse
import sys
import time

parser = argparse.ArgumentParser(
    description='Find all alpha characters in an input file')
parser.add_argument('file', type=argparse.FileType('r'),
                    help='filename.txt')

args = parser.parse_args()

def letters(input):
    output = []
    for character in input:
        if character.isalpha():
            output = input
    return output

def main(argv):

    start = time.time()
    fname = sys.argv[1]

    f = open(fname)
    for line in f:
        words = line.rstrip().split()
        for word in words:
            alphaWord = letters(word)
            if alphaWord:
                print(alphaWord)
    f.close()

    end = time.time()
    elapsed = end - start
    print("%s secs elapsed" % elapsed)

if __name__ == "__main__":
    main(sys.argv)
iheartcpp

2 Answers


Your program has a bug in letters():

def letters(input):
    output = []
    for character in input:
        if character.isalpha():
            output = input # after we get here we'll keep iterating
                           # even though the result will not change
    return output

What you're doing is iterating over all the characters: as soon as one of them is alphabetic, you save the input into output, but you also keep iterating over the rest of the characters, which doesn't change anything.

Either you meant to check all the characters (in which case the program returns the wrong result), or, if the program already returns the correct result, you probably want to break right after the line output = input.
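A minimal sketch of the early-exit version, assuming the intended behaviour is "return the whole word if it contains at least one alphabetic character, otherwise an empty list" (the same return convention as the original). The parameter is renamed to word here only to avoid shadowing Python's built-in input.

```python
def letters(word):
    output = []
    for character in word:
        if character.isalpha():
            output = word
            break  # the result can't change any more, so stop scanning
    return output
```

The same check can also be written as any(character.isalpha() for character in word), which likewise stops at the first alphabetic character.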

Nir Alfasi

for character in input:
    if character.isalpha():
        output = input
return output

This doesn't collect letters into the array: output = input replaces the empty list with the whole word, so letters() returns either the entire word or an empty list, never just the letters.

Also note that digits are ignored when deciding what counts as a word: something like "12ab34" still counts as a word because it contains letters. If that's your intention, then it's fine.
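To illustrate the difference, here is a small sketch comparing the two possible rules on a few made-up sample fields. contains_alpha is a hypothetical helper name: it keeps any field with at least one letter (what the question's code does), while the built-in str.isalpha() keeps only fields made entirely of letters.

```python
def contains_alpha(word):
    # True if at least one character is alphabetic (the question's rule)
    return any(ch.isalpha() for ch in word)

# Hypothetical sample input, split on whitespace like the question's code
fields = "12ab34 hello 2024 x9 ---".split()

# Fields with at least one letter; digits and punctuation are allowed
print([w for w in fields if contains_alpha(w)])  # ['12ab34', 'hello', 'x9']

# Fields made only of letters
print([w for w in fields if w.isalpha()])  # ['hello']
```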

124697