I have numerous TSV files, each containing two columns. The first column is made up of sentences and the second column holds the polarity of each sentence. The delimiter is a tab. I would like to extract the lines that have a polarity of "0".
I wrote this small piece of code, but it does not work: it returns 0 sentences.
for d in directory:
    print(" directory: ", d)
    splits = ['dev1'] #,'test1','train1']
    for s in splits:
        print(" sous-dir : ", s)
        path = os.path.join(indir, d)
        with open(os.path.join(path, s+'.tsv'), 'r', encoding='utf-8') as f_in:
            next(f_in)
            for line in f_in:
                if line.split('\t')[1] == 0:
                    doc = nlp(line.split('\t')[0])
                    line_split = [sent.text for sent in doc.sents]
                    for elt in line_split:
                        sentences_list.append(elt)
print("nombres total de phrases :", len(sentences_list))
Why is line.split('\t')[1] not equal to 0 when line is the string "Je suis levant\t0\n"?
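To make the comparison concrete, here is a minimal sketch that inspects the second field of that sample string. It uses only plain Python and assumes nothing about the rest of the pipeline. Since `split` returns strings, the field is the string '0\n' (trailing newline included), which is never equal to the integer 0.

```python
line = "Je suis levant\t0\n"

fields = line.split('\t')
polarity = fields[1]

print(repr(polarity))            # '0\n' : a string that still carries the newline
print(polarity == 0)             # False : a str is never equal to an int
print(polarity == '0')           # False : the trailing '\n' is still there
print(polarity.strip() == '0')   # True  : strip whitespace, then compare to a str
```

Comparing `int(polarity) == 0` would also work here, since `int` ignores surrounding whitespace, but it raises `ValueError` on any line whose second column is not a number.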
Example of a file:
gnfjfklfklf 0
fokgmlmlrfm 1
eoklplrmrml 0
ekemlremeùe 0
I would like to keep the lines that end with "0".
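As a rough sketch of the filtering I am after, the helper below keeps only the sentences whose polarity column is "0". The name `zero_polarity_sentences` is hypothetical, and the sample rows are the ones from the example file, assuming the two columns are really separated by a tab.

```python
def zero_polarity_sentences(lines):
    """Yield the sentence of every line whose polarity column is "0"."""
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sentence, polarity = fields[0], fields[1].strip()
        if polarity == '0':
            yield sentence


# Sample rows from the example file, assumed to be tab-separated.
sample = [
    "gnfjfklfklf\t0\n",
    "fokgmlmlrfm\t1\n",
    "eoklplrmrml\t0\n",
    "ekemlremeùe\t0\n",
]
kept = list(zero_polarity_sentences(sample))
print(kept)  # ['gnfjfklfklf', 'eoklplrmrml', 'ekemlremeùe']
```

In the loop from my code, the open file object `f_in` could be passed to this helper after the `next(f_in)` call, and each yielded sentence handed to `nlp` as before.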