So I have a big text file, around 900 MB. I want to read the file line by line and, for each line, do a find-and-replace based on the items in a list of phrases. Let's take a hypothetical situation:
Let's say I have a single .txt file containing all of Wikipedia in plaintext.
I have a Python list of phrases, call it P: P = ['hello world', 'twenty three', 'any bigram', 'any trigram']. All items in P are phrases (there are no single words).
Given this list P, I am trying to scan the .txt file line by line and, using P, check whether any of P's items exist in the current line; if they do, replace the spaces between their words with _. For example, if the current line says "hello world twenty three any text goes here", it should become "hello_world twenty_three any text goes here". The length of P is 14,000.
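For concreteness, a minimal sketch of the intended per-line transformation, using the hypothetical phrases above:

line = "hello world twenty three any text goes here"
for phrase in ['hello world', 'twenty three']:
    line = line.replace(phrase, phrase.replace(' ', '_'))
print(line)  # -> hello_world twenty_three any text goes here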
I have implemented this in Python, and it is very slow: it can only process about 5,000 lines per minute on average, and the .txt file is huge, with millions of lines. Is there a more efficient way of doing this? Thanks!
Update:
with open("/media/saurabh/New Volume/wikiextractor/output/Final_Txt/single_cs.txt") as infile:
for index,line in enumerate(infile):
for concept_phrase in concepts:
line = line.replace(concept_phrase, concept_phrase.replace(' ', '_'))
with open('/media/saurabh/New Volume/wikiextractor/output/Final_Txt/single_cs_final.txt', 'a') as file:
file.write(line + '\n' )
print (index)
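For reference, a minimal sketch of one possible speed-up (untested at this scale): instead of 14,000 str.replace calls per line, escape all the phrases and join them into a single alternation that is compiled once with re; each line then needs only one pattern.sub, and the output file is opened once instead of once per line. Sorting the phrases longest-first is an assumption so that longer phrases win over shorter overlapping ones; concepts is the same phrase list as above.

import re

# Compile ONE pattern matching any phrase; longest phrases come first so
# that overlapping matches resolve to the longer phrase.
pattern = re.compile('|'.join(re.escape(p) for p in sorted(concepts, key=len, reverse=True)))

def underscore(match):
    # Turn the spaces inside a matched phrase into underscores.
    return match.group(0).replace(' ', '_')

with open("/media/saurabh/New Volume/wikiextractor/output/Final_Txt/single_cs.txt") as infile, \
     open("/media/saurabh/New Volume/wikiextractor/output/Final_Txt/single_cs_final.txt", 'w') as outfile:
    for line in infile:
        outfile.write(pattern.sub(underscore, line))  # line keeps its '\n'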