I have a text file that looks like this:

abc
ade
rgh
lss
foxp3

I need to parse it with Python into a list or a dictionary, so that I get:

list[0] = abc
list[1] = ade
list[2] = rgh

I tried tika, but it gives me a list in which all the items end up under a single index:

list[0]= 
          abc
          ade
          rgh

Here is the code:

import nltk.data
from nltk import sent_tokenize
from tika import parser  # this import was missing from the snippet

new_file_name = 'main_gene_names_3adera.txt'
raw1 = parser.from_file(new_file_name)

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')  # loaded but never used
data1 = raw1['content']

# sent_tokenize splits on sentence boundaries, not on line breaks
b1 = sent_tokenize(data1)
print(b1)
# print(b1[0])
  • Hello, welcome to SO. Can you show us your code, please? – AlexisG Oct 14 '20 at 13:00
  • `new_file_name='main_gene_names_3adera.txt'; raw1= parser.from_file(new_file_name); import nltk.data; tokenizer = nltk.data.load('tokenizers/punkt/english.pickle'); data1 = raw1['content']; from nltk import sent_tokenize; print (sent_tokenize(data1)); b1=sent_tokenize(data1); print(b1[0])` – micheled Oct 14 '20 at 13:02
  • Are you familiar with the file method [`readlines()`](https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects)? Even simpler you can just do `list(file)`... – Tomerikoo Oct 14 '20 at 13:03
  • And please [edit] that into the question, don't use comments – Tomerikoo Oct 14 '20 at 13:03
  • @Tomerikoo `list(file)` will keep the trailing `\n`s. – bereal Oct 14 '20 at 13:04
  • @bereal well for that matter also `readlines()` will lol – Tomerikoo Oct 14 '20 at 13:07
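
Following up on the comments about `readlines()` and the trailing `\n`s: a minimal sketch of reading the file line by line with plain Python, assuming it is an ordinary UTF-8 text file with one gene name per line. The helper name `read_lines` and the sample file `genes_sample.txt` are made up for illustration and are not part of the original code. A sentence tokenizer looks for sentence boundaries rather than line breaks, which is probably why everything ended up under one index.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace and newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Build a throwaway sample file with the contents shown in the question.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "genes_sample.txt"  # hypothetical file name
    sample.write_text("abc\nade\nrgh\nlss\nfoxp3\n", encoding="utf-8")
    genes = read_lines(sample)

# A dictionary keyed by position, if a dict is preferred over a list.
by_index = dict(enumerate(genes))

print(genes[0], genes[1], genes[2])
```

If the text has to come from tika for some other reason, the same idea should carry over to the extracted string, for example `[s.strip() for s in data1.splitlines() if s.strip()]`, assuming `data1` is a `str` and not `None`.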
