
Instead of passing individual parameters one at a time, I am now passing the whole column, but it still takes the same amount of time.

It takes approximately 1 minute, which is far too long.

Here is the code:

from fuzzywuzzy import fuzz
import json
import pandas as pd
import time

k = pd.read_csv(r"/home/hamza/Downloads/retrosynthesis1-all.csv")
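# dropna() returns a new DataFrame; reset the index so the concat below stays aligned
k = k.dropna().reset_index(drop=True)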
word = input(" Enter String : ")
t0 = time.time()

patterns = pd.DataFrame(data=k['Target'])
print(type(patterns))
result = pd.DataFrame(data=k['Reactant'])
w=[word for i in range(len(patterns))]

wd = pd.DataFrame(data=w, columns=['input'])
df = pd.concat([wd, patterns, result], axis=1, sort=False)
print(df)

def get_ratio(row):
    name = row['input']
    name1 = row['Target']
    return fuzz.partial_ratio(name, name1)

res = df[df.apply(get_ratio, axis=1) > 70]
Hamza Shaikh
  • You may try with `difflib` as in https://stackoverflow.com/a/60908516/9698684 – yatu Nov 02 '20 at 07:50
  • It still takes 2-3 minutes to process the whole dataset, which is slower than my previous approach. – Hamza Shaikh Nov 04 '20 at 07:40
  • You could try rapidfuzz (https://github.com/maxbachmann/rapidfuzz), which should be a bit faster than fuzzywuzzy. – maxbachmann Nov 06 '20 at 06:44
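
One likely reason the column-based version is no faster: `df.apply(get_ratio, axis=1)` still calls the Python scoring function once per row. Below is a minimal sketch of one way to reduce that work, by scoring each distinct target only once and building the boolean mask with a plain list comprehension. It uses only the standard library so that it runs as-is. `match_mask` and `difflib_ratio` are hypothetical names introduced here, and `difflib_ratio` is a stand-in scorer, not the same metric as `fuzz.partial_ratio`.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List


def difflib_ratio(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (assumption: not equal to partial_ratio)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def match_mask(
    query: str,
    targets: Iterable[str],
    threshold: float = 70,
    scorer: Callable[[str, str], float] = difflib_ratio,
) -> List[bool]:
    """Return one boolean per target: True where scorer(query, target) > threshold."""
    targets = list(targets)
    # Score each distinct target once; repeated values reuse the cached score.
    scores = {t: scorer(query, t) for t in set(targets)}
    return [scores[t] > threshold for t in targets]


# Hypothetical sample data standing in for the 'Target' column.
targets = ["aspirin", "asprin", "benzene", "aspirin"]
mask = match_mask("aspirin", targets)
print(mask)
```

With the DataFrame from the question, this could presumably be used as `res = df[match_mask(word, df['Target'])]`, passing `scorer=fuzz.partial_ratio` from fuzzywuzzy or rapidfuzz to keep the original metric. The deduplication only helps if the column contains repeated values; otherwise the gain would have to come from a faster scorer.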
