I am trying to put together a Python script to compare two text files of tab-separated values. The smaller file has one field per record: a key value, much like a database primary key. The larger file has a key in its first field, followed by up to thousands of further fields per record, and holds tens of thousands of records.
I want to select from the larger file only the records whose key also appears in the smaller file, and write those records to a new text file. In both files the key is the first field of each record.
I have hit a wall. I have been trying for loops, and thus far have had minimal success. I got the script to display the key values of each file, which is a small victory!
I may be a glutton for punishment, as I am bent on using Python (2.7) to solve this rather than importing the data into something SQL-based; I will never learn otherwise!
UPDATE: I have the following code thus far. Is the forward slash in the path of the final open() call (the one opened for writing) correct?
# Defining some counters, and setting them to zero.
counter_one = 0
counter_two = 0
counter_three = 0
counter_four = 0
# Defining a couple arrays for sorting purposes.
array_one = []
array_two = []
# This module opens the list of records to be selected.
# A raw string (r"...") keeps the backslash in the Windows path literal.
with open(r"c:\lines_to_parse.txt") as f0:
    LTPlines = f0.readlines()
    for i, line in enumerate(LTPlines):
        returned_line = line.split()
        array_one.append(returned_line)
    for line in array_one:
        counter_one = counter_one + 1
# This module opens the file to be trimmed as an array.
# Raw string: in a normal string literal, "\t" would be read as a tab character.
with open(r'c:\target_data.txt') as f1:
    targetlines = f1.readlines()
    for i, line in enumerate(targetlines):
        array_two.append(line.split())
    for line in array_two:
        counter_two = counter_two + 1
# The last module performs a logical check
# of the data and writes to a tertiary file.
with open("c:/research/results", 'w') as f2:
    # The second condition stops the loop before counter_four runs past the end of array_two.
    while counter_three <= 3 and counter_four < len(array_two): #****Arbitrarily set, to test if the program will work.
        if array_one[counter_three][0] == array_two[counter_four][0]:
            # Write the fields back as one tab-separated line rather than a list repr.
            f2.write("\t".join(array_two[counter_four]) + "\n")
            counter_three = (counter_three + 1)
            counter_four = (counter_four + 1)
        else:
            counter_four = (counter_four + 1)
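
For comparison, the selection described at the top could also be done with a set lookup instead of index counters, which does not depend on both files being in the same order. The following is only a minimal sketch under assumptions: the function name `filter_by_key`, the UTF-8 encoding, the sample keys (`K1` to `K4`) and the temporary file locations are all hypothetical and not part of my real data. It uses `io.open` so that it should behave the same on Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def filter_by_key(keys_path, data_path, out_path):
    """Copy records from data_path whose first field is listed in keys_path.

    Hypothetical helper: assumes tab-separated, UTF-8 encoded text files.
    Returns the number of records written.
    """
    # Load the keys into a set so each membership test is cheap.
    with io.open(keys_path, encoding="utf-8") as keys_file:
        keys = set(line.strip() for line in keys_file if line.strip())

    matched = 0
    with io.open(data_path, encoding="utf-8") as data_file, \
            io.open(out_path, "w", encoding="utf-8") as out_file:
        for line in data_file:
            # Split on the first tab only: just the key is needed here.
            key = line.rstrip("\n").split("\t", 1)[0]
            if key in keys:
                out_file.write(line)  # the record is written unchanged
                matched += 1
    return matched


# Hypothetical sample data, written to a temporary directory for the demo.
demo_dir = tempfile.mkdtemp()
keys_path = os.path.join(demo_dir, "lines_to_parse.txt")
data_path = os.path.join(demo_dir, "target_data.txt")
out_path = os.path.join(demo_dir, "results.txt")

with io.open(keys_path, "w", encoding="utf-8") as f:
    f.write(u"K2\nK4\n")
with io.open(data_path, "w", encoding="utf-8") as f:
    f.write(u"K1\ta\tb\nK2\tc\td\nK3\te\tf\nK4\tg\th\n")

matched = filter_by_key(keys_path, data_path, out_path)
with io.open(out_path, encoding="utf-8") as f:
    result = f.read()
```

Because the large file is read one line at a time, only the keys are held in memory, which seems reasonable for tens of thousands of records with thousands of fields each. Building the paths with `os.path.join` also sidesteps the question of forward slashes versus backslashes.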