Keep track of the data in a dictionary of this format:
data = {
    ID: [value, 'string'],
}
As you read each line from the file, check whether that ID is already in the dict. If not, add it; if it is, and the value you just read is bigger than the stored one, replace the entry.
At the end, your dict holds the biggest value seen for each ID.
# init to empty dict
data = {}
# open the input file
with open('file.txt', 'r') as fp:
    # read each line
    for line in fp:
        # grab ID, value, string
        item_id, item_value, item_string = line.split()
        # convert ID and value to integers
        item_id = int(item_id)
        item_value = int(item_value)
        # if ID is not in the dict at all, or if the value we just read
        # is bigger, use the current values
        if item_id not in data or item_value > data[item_id][0]:
            data[item_id] = [item_value, item_string]

for item_id in data:
    print(item_id, data[item_id][0], data[item_id][1])
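To see the replacement logic in action, here's a self-contained sketch that feeds the same loop a few made-up sample lines via io.StringIO instead of a real file (the IDs, values, and strings are invented for illustration):

```python
import io

# simulated input file: two records share ID 1; the one with the
# larger value (7) should win
sample = io.StringIO("1 5 apple\n2 3 banana\n1 7 cherry\n")

data = {}
for line in sample:
    item_id, item_value, item_string = line.split()
    item_id = int(item_id)
    item_value = int(item_value)
    # keep the record with the largest value per ID
    if item_id not in data or item_value > data[item_id][0]:
        data[item_id] = [item_value, item_string]

print(data)  # {1: [7, 'cherry'], 2: [3, 'banana']}
```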
Dictionaries historically didn't enforce any specific ordering of their contents, so at the end of your program when you get the data back out of the dict, it might not be in the same order as the original file (i.e. you might see ID 2 first, followed by ID 1).
If this matters to you, you can use a collections.OrderedDict, which retains the original insertion order of the elements. (In Python 3.7 and later, plain dicts preserve insertion order too.)
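If what you actually want is output sorted by ID rather than by file order, you don't need an ordered dict at all; just sort the keys when printing. A minimal sketch, using made-up data:

```python
# example data in arbitrary insertion order
data = {2: [3, 'banana'], 1: [7, 'cherry']}

# iterate the keys in ascending ID order, regardless of insertion order
for item_id in sorted(data):
    print(item_id, data[item_id][0], data[item_id][1])
```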
(Did you have something specific in mind when you said "read by chunks"? If you meant a specific number of bytes, then you might run into issues if a chunk boundary happens to fall in the middle of a word...)
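For completeness: one common way to read fixed-size chunks without splitting a record is to keep whatever trails the last newline in a buffer and prepend it to the next chunk. This is a sketch under the assumption that records are newline-terminated; the function name and chunk size are mine, not from the question:

```python
import io

def read_lines_in_chunks(fp, chunk_size=4096):
    """Yield complete lines from fp, reading chunk_size characters at a time."""
    leftover = ''
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        # prepend whatever was cut off at the previous chunk boundary
        buf = leftover + chunk
        lines = buf.split('\n')
        # the last piece may be an incomplete line; carry it over
        leftover = lines.pop()
        for line in lines:
            yield line
    if leftover:
        yield leftover

# tiny chunk size to force boundaries to fall mid-record
fp = io.StringIO("1 5 apple\n2 3 banana\n1 7 cherry\n")
print(list(read_lines_in_chunks(fp, chunk_size=4)))
# ['1 5 apple', '2 3 banana', '1 7 cherry']
```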