I have a list of keywords and I would like to count the number of times each keyword appears in an article. The problem is that I have more than half a million articles (in a dataframe). I already have code that produces the desired results, but it takes around 40-50 seconds to count the instances of all keywords in each article of the dataframe. I am looking for something more efficient.
I have been using the str.count() method along with a for loop:
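import pandas as pd

count_matrix = pd.DataFrame()
for word in keywords:
    # one vectorized pass over all articles per keyword (word is treated as a regex)
    count_matrix[str(word)] = df['article'].str.count(word)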
The output is exactly what I want. The only problem is that it takes around 40-50 seconds to compute, given that df['article'] has more than half a million articles. Any suggestions to make it more efficient would be highly appreciated.
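In case it helps to reproduce this, here is a minimal self-contained sketch of the same loop on made-up data. The sample articles and keywords are placeholders, not my real data. The re.escape() call is an assumption that the keywords should be matched literally, since str.count() interprets its argument as a regular expression.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the real dataframe and keyword list.
df = pd.DataFrame({
    "article": [
        "the cat sat on the mat",
        "a dog and a cat",
        "no pets here",
    ]
})
keywords = ["cat", "dog", "the"]

# Same approach as in the question: one vectorized str.count() pass per keyword.
# Keywords are escaped so regex metacharacters are matched literally
# (an assumption about the intended behavior).
count_matrix = pd.DataFrame(
    {str(word): df["article"].str.count(re.escape(word)) for word in keywords}
)

print(count_matrix)
```

Each row of count_matrix corresponds to one article and each column to one keyword. Note that matches are substring matches, so a keyword is also counted when it occurs inside a longer word.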