I have a text file that looks like this:
abc
ade
rgh
lss
foxp3
I need to parse it with Python into a list or a dictionary, so that I get:
list[0] = abc
list[1] = ade
list[2] = rgh
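For a plain one-name-per-line file like this, Tika and NLTK may not be needed at all, because iterating over a Python file object already yields one line at a time. Below is a minimal sketch, assuming the file is UTF-8 text with one name per line. `read_names` is a hypothetical helper name. The dictionary keyed by position is only one reading of "a dictionary" here.

```python
import os
import tempfile


def read_names(path):
    """Return the non-empty lines of a text file as a list of strings."""
    with open(path, encoding="utf-8") as f:
        # strip() removes the trailing newline and surrounding whitespace;
        # the condition skips blank lines.
        return [line.strip() for line in f if line.strip()]


# Demo: write the sample from the question to a temporary file, then read it back.
sample = "abc\nade\nrgh\nlss\nfoxp3\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

names = read_names(tmp_path)
os.remove(tmp_path)

print(names[0])  # abc
print(names[1])  # ade
print(names[2])  # rgh

# Hypothetical dictionary variant: map each position to its name.
name_by_index = dict(enumerate(names))
print(name_by_index[2])  # rgh
```

With the real file, the call would be `read_names('main_gene_names_3adera.txt')`.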
I tried Tika (followed by NLTK's sent_tokenize), but the list I get back has all the items under a single index:
list[0]=
abc
ade
rgh
Here is the code:
from tika import parser
from nltk import sent_tokenize

new_file_name = 'main_gene_names_3adera.txt'
# Tika returns a dict; the extracted text is under the 'content' key
raw1 = parser.from_file(new_file_name)
data1 = raw1['content']
# Split the extracted text into sentences
b1 = sent_tokenize(data1)
print(b1)
print(b1[0])  # all the names come back in this one element
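A likely explanation, if the Tika step is kept, is that `sent_tokenize` looks for sentence-ending punctuation such as periods. A list of bare names contains none, so the whole text is treated as one sentence. Splitting the extracted string on line breaks should give one element per name instead. Below is a minimal sketch, assuming `raw1['content']` keeps the file's line breaks. The string literal stands in for that value and is not actual Tika output.

```python
# Stand-in for raw1['content'] (assumption: Tika keeps the file's line breaks
# and may add blank lines around the text).
data1 = "\n\nabc\nade\nrgh\nlss\nfoxp3\n\n"

# splitlines() breaks the string at each line boundary; strip() trims
# whitespace, and the condition drops the empty lines.
names = [line.strip() for line in data1.splitlines() if line.strip()]

print(names[0])  # abc
print(names[1])  # ade
print(names[2])  # rgh
```

In the original script, this list comprehension would replace the `sent_tokenize(data1)` call.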