I'm trying to find exact place names, which are listed in a country.txt file, inside a descriptions file (shown below).
Here is a sample of country.txt:
Pic de Font Blanca
Roc Mélé
Pic des Langounelles
Pic de les Abelletes
Estany de les Abelletes
Port Vieux de la Coume d’Ose
Port de la Cabanette
Port Dret
Costa de Xurius
Font de la Xona
And here is description.csv, the descriptions file.
The descriptions file is a list of records containing the title and description of each article. What I am trying to do is find the exact place names from country.txt inside the descriptions file.
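To make "exact words" concrete, here is a minimal sketch with made-up sample data (the `places` list and the `description` string are hypothetical, not taken from the real files). It assumes whole-word matching via `\b` is what is wanted, so "Port Dret" should be found but "Port Dretes" should not count as a hit:

```python
import re

# Hypothetical sample data, only for illustration
places = ["Port Dret", "Roc Mélé"]
description = "We hiked from Port Dret towards Port Dretes and Roc Mélé."

# One alternation of whole-word patterns, one branch per place name
pattern = "|".join(r"\b{}\b".format(re.escape(p)) for p in places)
matches = re.findall(pattern, description, re.IGNORECASE)
print(matches)
```

Here "Port Dretes" is skipped because there is no word boundary between "Dret" and "es".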
code.py
import csv
import time
import re

allCities = open('country.txt', encoding="utf8").readlines()
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")

with open('description.csv') as descriptions, open('desc_place7---' + str(timestr) + '.csv', 'w', newline='', encoding='utf-8') as output:
    descriptions_reader = csv.DictReader(descriptions)
    fieldnames = ['title', 'description', 'place']
    output_writer = csv.DictWriter(output, delimiter='|', fieldnames=fieldnames)
    output_writer.writeheader()
    line = 0
    pattern = r'|'.join(r'\b{}\b'.format(re.escape(city.strip())) for city in sorted(allCities, key=len, reverse=True))

    for eachRow in descriptions_reader:
        title = eachRow['row']
        description = eachRow['desc']
        citiesFound = set()
        found = re.findall(pattern, description, re.IGNORECASE | re.MULTILINE)
        citiesFound.update(found)

        if len(citiesFound) == 0:
            output_writer.writerow({'title': title, 'description': description, 'place': " - "})
        else:
            output_writer.writerow({'title': title, 'description': description, 'place': " , ".join(citiesFound)})
        line += 1
        print(line)
expected output: output
But because country.txt (185.94 MB) is a large file, my code can't run to completion; it makes my laptop freeze. Is there a good way to handle this? I think the pattern line is also part of the performance problem, but I still need a regex (or something equivalent) to match exact words.
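One direction that might avoid the giant regex alternation is a set lookup over word n-grams. This is only a sketch under assumptions, not a tested solution for the full file: it assumes that splitting on `\w+` is an acceptable definition of "word", that matching should ignore case, and that the set of names fits in memory. The helper names `normalize`, `build_index` and `find_places`, and the sample data, are all hypothetical.

```python
import re

TOKEN = re.compile(r"\w+")

def normalize(text):
    """Split a string into word tokens and lower-case each one."""
    return tuple(t.lower() for t in TOKEN.findall(text))

def build_index(names):
    """Return (set of token tuples, longest name length in tokens)."""
    index = set()
    max_len = 0
    for name in names:
        key = normalize(name)
        if key:  # skip blank lines
            index.add(key)
            max_len = max(max_len, len(key))
    return index, max_len

def find_places(description, index, max_len):
    """Scan the description's word n-grams, longest candidate first."""
    tokens = TOKEN.findall(description)
    lowered = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(tokens):
        step = 1
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in index:
                # Keep the text as it appears in the description
                found.append(" ".join(tokens[i:i + n]))
                step = n  # do not report overlapping matches
                break
        i += step
    return found

# Hypothetical in-memory sample; with the real data one could pass the
# open file object instead of this list, since it is iterated line by line.
sample_names = ["Port Dret\n", "Port de la Cabanette\n", "Roc Mélé\n"]
index, max_len = build_index(sample_names)
result = find_places(
    "From port dret we reached Port de la Cabanette, not Port Dretes.",
    index, max_len)
print(result)
```

Each description then costs roughly (number of words) x (longest name in words) set lookups, independent of how many place names there are. Note that this tokenization treats punctuation inside names (for example the apostrophe in "Coume d’Ose") as a separator, which differs slightly from the `\b`-based regex.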