
I have a very long file (about 2*10^5 rows times 5 columns) filled with numbers (floats).

I have to find the maximum value among the numbers of the first column and then consider the corresponding numbers on the other four columns on the same line.

I thought I might use a dictionary: the keys are the numbers in the first column, and each value is a list containing the other four numbers from the same line. I would then find the maximum among the keys and read the corresponding value.
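For concreteness, the dictionary idea could be sketched roughly as below. The helper name `max_row_via_dict`, the whitespace delimiter, and the sample rows are assumptions made for illustration only, not part of the real data. Note that rows sharing the same first-column value overwrite one another, and that every row stays in memory until the end.

```python
# Sketch of the dictionary approach (hypothetical helper name and sample data).
def max_row_via_dict(lines, delim=None):
    rows = {}
    for line in lines:
        fields = [float(x) for x in line.split(delim)]
        if not fields:
            continue  # skip blank lines
        # key: first column, value: the remaining columns of that row;
        # a repeated key silently replaces the earlier row
        rows[fields[0]] = fields[1:]
    best = max(rows)  # max() over a dict iterates over its keys
    return best, rows[best]

# Made-up sample rows standing in for the real file.
sample = ["1.5 10 20 30 40", "7.25 1 2 3 4", "3.0 5 6 7 8"]
best_key, best_values = max_row_via_dict(sample)
```

Passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.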

Is there a smarter way? That dictionary is going to be very big...

I almost forgot: I use Python 2.6.

mattiav27
  • How many times do you have to repeat this for each input file? If once, why not just scan through the file, retaining the best row? – Hugh Bothwell Feb 27 '14 at 17:03
  • I have to repeat only once. – mattiav27 Feb 27 '14 at 17:05
  • This sounds like the situation that I ran into with a csv file http://stackoverflow.com/questions/21731270/opening-a-large-json-file-in-python-with-no-newlines-for-csv-conversion-python-2 which I then used to develop complete numpy statistics. If you only need to identify the row as it is read and can discard the rest of the data, then it only requires a single pass through the file. – sabbahillel Feb 27 '14 at 17:09

3 Answers

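maxn = -float('inf')
with open(fname) as f:
    for line in f:
        # the values are floats, so parse with float() rather than int()
        first = float(line.split(',')[0])
        if maxn < first:
            # remember the new maximum; without this every row passes the test
            maxn = first
            theLine = line

# do something with that line:
print theLine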
zhangxaochen
# define a key function based on the first number, assuming columns are
# separated by spaces or tabs
f = lambda line: float(line.split()[0])
# an open file in Python is an iterator over its lines, so it can be passed to max() directly
with open('your_input_file') as inf:
    line_with_max_num = max(inf, key=f)
# turn the other four numbers into a list and print them to the screen
# or do whatever you like with them
print [float(_) for _ in line_with_max_num.split()[1:]]
zyxue
INPUT = "myfile.txt"
DELIM = ","

def first_float(s):
    first = s.split(DELIM, 1)[0]
    return float(first)

with open(INPUT) as inf:
    max_line = max(inf, key=first_float)
    max_data = [float(f) for f in max_line.split(DELIM)]
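As a quick check of the same idea without touching a real file, here is a minimal sketch that runs on an in-memory list of lines and separates the maximum from the other four columns. The `max_row` helper and the sample rows are hypothetical, and the comma delimiter is an assumption about the data.

```python
DELIM = ","  # assumed delimiter

def first_float(s):
    # parse only the first field of the line
    return float(s.split(DELIM, 1)[0])

def max_row(lines):
    # lines can be any iterable of text lines, including an open file
    max_line = max(lines, key=first_float)
    values = [float(x) for x in max_line.split(DELIM)]
    # first column is the maximum, the rest are the other columns of that row
    return values[0], values[1:]

# Made-up sample rows standing in for the real file.
sample_lines = ["0.5,1,2,3,4\n", "9.75,5,6,7,8\n", "2.0,9,10,11,12\n"]
max_value, others = max_row(sample_lines)
```

Because `max()` keeps only the best line seen so far, memory use stays constant regardless of how many rows the input has.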
Hugh Bothwell