I am working on a Python script that I can't seem to get right. It takes two inputs:
- data file
- stop file
The data file consists of four tab-separated columns and is sorted. The stop file is a sorted list of words, one per line.
The goal of the script is:
- If the string in column 1 of the data file matches a word in the stop file, delete the entire line.
Here is an example of the data file:
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
Here is an example of the stop file:
apple-n
banana-n
cake-n
pigeon-n
Here is the code that I have so far:
with open("input1", "rb") as oIndexFile:
    for line in oIndexFile:
        lemma = line.split()
        #print lemma
with open ("input2", "rb") as oSenseFile:
    with open("output", "wb") as oOutFile:
        for line in oSenseFile:
            concept, slot, filler, freq = line.split()
            nounsInterest = [concept, slot, filler, freq]
            #print concept
            if concept != lemma:
                outstring = '\t'.join(nounsInterest)
                oOutFile.write(outstring + '\n')
            else:
                pass
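One reason the first version never drops a line: lemma is the list returned by line.split(), and a string never compares equal to a list, so concept != lemma is always True. A small illustration, assuming Python 3 and using values from the sample data:

```python
# Why "concept != lemma" never filters anything: split() returns a list.
stop_line = "cake-n\n"             # one line of the stop file
lemma = stop_line.split()          # ['cake-n'], a list, not a string

data_line = "cake-n\tagainst+n-the+vg\trest-v\t1\n"
concept = data_line.split()[0]     # 'cake-n', a string

print(type(lemma).__name__, type(concept).__name__)  # list str
print(concept != lemma)            # True: a str is never equal to a list
print(concept != lemma[0])         # False: str compared with str
print(concept in lemma)            # True: membership test against the list
```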
The desired output is the following:
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
As of now, the output I get is the following, which is just the data file unchanged (the cake-n line is still there):
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
Some of the things I have tried that still do not work:
Instead of "if concept != lemma:", I first tried "if concept not in lemma:", which produces the same output as above.
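Switching to not in does not help much on its own, because lemma is reassigned on every pass through the stop file: by the time the data lines are checked, it holds only the last stop word read, ['pigeon-n']. A minimal sketch using the sample stop words shows the difference between overwriting a variable in a loop and accumulating all the words into one set:

```python
stop_lines = ["apple-n\n", "banana-n\n", "cake-n\n", "pigeon-n\n"]

# Overwriting: only the value from the LAST pass survives the loop.
for stop_line in stop_lines:
    lemma = stop_line.split()      # replaced on every pass, never accumulated

print(lemma)                       # ['pigeon-n']
print("cake-n" not in lemma)       # True: 'cake-n' was overwritten long ago

# Accumulating every stop word into one set makes the membership test meaningful.
stop_words = set()
for stop_line in stop_lines:
    stop_words.update(stop_line.split())

print("cake-n" not in stop_words)  # False: this line would now be dropped
```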
I also suspected that the first input file was not being used at all, so I nested the rest of the code inside the loop over it:
with open ("input2", "rb") as oSenseFile:
    with open("tinput1", "rb") as oIndexFile:
        for line in oIndexFile:
            lemma = line.split()
            with open("out", "wb") as oOutFile:
                for line in oSenseFile:
                    concept, slot, filler, freq = line.split()
                    nounsInterest = [concept, slot, filler, freq]
                    if concept not in lemma:
                        outstring = '\t'.join(nounsInterest)
                        oOutFile.write(outstring + '\n')
                    else:
                        pass
which produces a blank output file.
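Assuming the output file is opened inside the loop over the stop file (which matches the blank result), two effects combine here: a file object can be iterated only once, so the data file is fully consumed during the pass for the first stop word, and reopening the output in "wb" mode for each stop word truncates whatever the earlier pass wrote. The sketch below reproduces this without touching the disk, assuming Python 3; io.StringIO stands in for the data file and a list named written stands in for the output file:

```python
import io

# io.StringIO stands in for the data file: like a real file object,
# it can be iterated only once unless you seek back to the start.
sense_file = io.StringIO(
    "abandonment-n\tafter+n-the+n-a-j\tstop-n\t1\n"
    "cake-n\tagainst+n-the+vg\trest-v\t1\n"
)
stop_words = ["apple-n", "cake-n", "pigeon-n"]

for word in stop_words:
    written = []                   # like open("out", "wb"): starts empty on every pass
    for line in sense_file:        # yields lines only on the first pass
        if line.split()[0] != word:
            written.append(line)
    print(word, len(written))      # apple-n 2, then cake-n 0, then pigeon-n 0

print(written)                     # [] -> the last pass wrote nothing: a blank file
```

Calling sense_file.seek(0) at the top of each pass would avoid the exhaustion, but loading the stop words into a set first removes the need for nested loops over the two files.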
I have also tried a different approach as found here:
filename = "input1.txt"
filename2 = "input2.txt"
filename3 = "output1"

def fixup(filename):
    fin1 = open(filename)
    fin2 = open(filename2, "r")
    fout = open(filename3, "w")
    for word in filename:
        words = word.split()
        for line in filename2:
            concept, slot, filler, freq = line.split()
            nounsInterest = [concept, slot, filler, freq]
            if True in [concept in line for word in toRemove]:
                pass
            else:
                outstring = '\t'.join(nounsInterest)
                fout.write(outstring + '\n')
    fin1.close()
    fin2.close()
    fout.close()
I adapted this from that example, without success: in this case, no output is produced at all.
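Two things keep this version from producing any output. First, fixup is defined but never called, so none of the open calls inside it ever run. Second, even if it were called, "for word in filename" and "for line in filename2" loop over the characters of the file-name strings rather than the lines of fin1 and fin2, so unpacking four columns fails on the first character (toRemove is also never defined). A minimal illustration, assuming Python 3; fixup_stub is a hypothetical stand-in, not the code above:

```python
# fixup_stub is a hypothetical stand-in that only records whether it ran.
ran = []

def fixup_stub(name):
    ran.append(name)

# 1. A def statement only creates the function; its body runs when it is called.
print(ran)                         # []
fixup_stub("input1.txt")
print(ran)                         # ['input1.txt']

# 2. Looping over a file NAME yields characters, not the lines of the file.
filename2 = "input2.txt"
print(list(filename2)[:5])         # ['i', 'n', 'p', 'u', 't']

# So unpacking four columns from each "line" fails immediately.
caught = None
try:
    for line in filename2:
        concept, slot, filler, freq = line.split()
except ValueError as exc:
    caught = exc
print(type(caught).__name__)       # ValueError
```

Looping over fin1 and fin2 rather than the name strings, and actually calling fixup(filename), would address these two points.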
Can someone point out where I am going wrong? The sample files here are small, but I need to run this on a large file. Thank you for any assistance.
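One common approach for the large-file case, sketched here under some assumptions (Python 3, UTF-8 input, one stop word per line, and a stop list small enough to fit in memory): load the stop words into a set once, then stream the data file line by line and write out each line whose first column is not in the set. The function names load_stop_words, iter_kept and filter_file are hypothetical, not part of the original code.

```python
def load_stop_words(path):
    """Read the stop file into a set; assumes one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def iter_kept(lines, stop):
    """Yield only the lines whose first column is not a stop word.

    Columns are split on any whitespace, so this works for tabs or spaces.
    Blank lines are passed through unchanged.
    """
    for line in lines:
        fields = line.split(None, 1)
        if not fields or fields[0] not in stop:
            yield line


def filter_file(data_path, stop_path, out_path):
    """Stream data_path to out_path, dropping lines that start with a stop word.

    Only the stop list is held in memory; the data file is read line by line,
    so its size does not matter.
    """
    stop = load_stop_words(stop_path)
    with open(data_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(iter_kept(src, stop))


# Hypothetical call using the file names from the question
# (input1 = stop file, input2 = data file):
# filter_file("input2", "input1", "output")
```

Lines are written back exactly as read, so the original tab separators are preserved. A set lookup takes roughly constant time regardless of how many stop words there are, whereas "in" on a list scans it; the sort order of the two files is not needed for this approach.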