I have a huge DataFrame with 3M records and a column called `description`. I also have a set of around 5k possible substrings.
I want to get the rows whose description contains any of these substrings.
I used the following loop:
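```python
import re

trans = []
for i in range(0, len(searchstring)):      # ~5k substrings
    ss = searchstring[i].lower()
    for k in range(0, len(df)):            # ~3M rows
        desc = df['description'].iloc[k].lower()
        # re.escape treats the substring literally, not as a regex
        if re.search(re.escape(ss), desc):
            trans.append(df.iloc[k])
```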
The issue is that it takes far too long, because the nested loops run 5k × 3M times (about 15 billion searches).
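One way to cut the 5k passes over the data down to one, sketched here with only the standard library: escape every substring with `re.escape`, join them with `|` into a single pattern, compile it once with `re.IGNORECASE`, and scan each description a single time. The function names and sample strings below are hypothetical. Note that Python's `re` still tries the alternatives one after another at each position (it does not build an Aho-Corasick automaton), so the gain comes mainly from doing that work inside the compiled matcher instead of in billions of Python-level iterations; it is not a guaranteed linear-time search.

```python
import re
from typing import Iterable, List


def build_pattern(substrings: Iterable[str]) -> "re.Pattern[str]":
    # re.escape makes each substring match literally ("." or "+" lose
    # their regex meaning); "|" joins them into one alternation.
    escaped = [re.escape(s) for s in substrings if s]
    if not escaped:
        # No substrings given: (?!) is a pattern that can never match.
        return re.compile(r"(?!)")
    # IGNORECASE replaces the per-row .lower() call of the original loop.
    return re.compile("|".join(escaped), re.IGNORECASE)


def matching_descriptions(descriptions: Iterable[str],
                          substrings: Iterable[str]) -> List[str]:
    pattern = build_pattern(substrings)
    # One pass over the rows: each description is scanned once against
    # the combined pattern instead of once per substring.
    return [d for d in descriptions if pattern.search(d)]


if __name__ == "__main__":
    descs = ["Paid ACME Corp invoice", "coffee shop", "Refund from acme corp", "misc"]
    subs = ["acme corp", "invoice"]
    print(matching_descriptions(descs, subs))
```

For example, `matching_descriptions(["a.b", "axb"], ["a.b"])` returns `['a.b']`, because the escaped `.` only matches a literal dot.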
Is there a better way to search for substrings?
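Applied to the DataFrame directly, the same combined pattern can be passed to pandas' `Series.str.contains`, which returns a boolean mask usable for boolean indexing. This is a minimal sketch assuming the column is named `description` and holds strings (missing values are treated as non-matches); `filter_by_substrings` is a hypothetical helper name, and its speedup over the original loop has not been measured here.

```python
import re

import pandas as pd


def filter_by_substrings(df: pd.DataFrame, substrings,
                         column: str = "description") -> pd.DataFrame:
    # Escape each substring so it is matched literally, then OR them together.
    escaped = [re.escape(s) for s in substrings if s]
    if not escaped:
        # Nothing to search for: return an empty frame with the same columns.
        return df.iloc[0:0]
    pattern = "|".join(escaped)
    # case=False replaces .lower(); na=False treats missing values as non-matches.
    mask = df[column].str.contains(pattern, case=False, regex=True, na=False)
    # Boolean indexing keeps each matching row once, even if it matches
    # several substrings (the original loop appended such rows repeatedly).
    return df[mask]


if __name__ == "__main__":
    df = pd.DataFrame({"description": ["Paid ACME Corp invoice", "coffee shop",
                                       None, "refund: acme corp"]})
    print(filter_by_substrings(df, ["acme corp", "invoice"]))
```

Unlike the original loop, this returns a DataFrame rather than a list of rows, so no separate `trans` list is needed.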