I have a function that splits a string into words and looks each word up in a DataFrame. When a word is found, the function searches for the matching row with a for loop, which I want to avoid because it is too slow on a large dataset. I would like to read row['value'] directly instead of looping through the whole df for every matching word.
I am new to Python and have searched a lot, but could not find what I wanted. I found index.tolist(), but I don't want to build a list; I just need the index of the first matching value.
Any help or workaround would be appreciated.
def cal_nega_mean(my_string):
    mean = 0.00
    mean_tot = 0
    mean_sum = 0.00
    for word in my_string.split():
        if word in df.values:  # at this point, if the word is found, get its index so that I don't need the for loop on the next line
            for index, row in df.iterrows():  # want to change
                if word == row.word:  # this part
                    if row['value'] < -0.40:
                        mean_tot += 1
                        mean_sum += row['value']  # accumulate into mean_sum, which is averaged below
                    break
    if mean_tot == 0:
        return 0
    mean = mean_sum / mean_tot
    return round(mean, 2)
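For the narrower goal of getting only the index of the first matching row without building a list, one possible sketch is a boolean mask combined with `idxmax()`. The sample frame and the helper name `first_match_index` are assumptions made for illustration, and the sketch assumes the words live in a column named `word`.

```python
import pandas as pd

# Hypothetical sample that mirrors the lookup frame described in this post.
df = pd.DataFrame(
    {"word": ["python", "problem", "alpha", "last"],
     "value": [-0.56, -0.78, -0.91, -0.41]},
    index=[1, 2, 3, 9000],
)

def first_match_index(frame, word):
    """Return the index label of the first row whose 'word' equals word, or None."""
    mask = frame["word"] == word
    if not mask.any():
        return None
    # On a boolean Series, idxmax() returns the label of the first True.
    return mask.idxmax()

idx = first_match_index(df, "problem")
value = df.at[idx, "value"]  # scalar access by index label and column name
```

This avoids `iterrows()` and `tolist()`, but it still scans the `word` column once per call, so it may remain slow when it is called for every word of 300k strings.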
Example string input (there are more than 300k strings):
my_string = "i have a problem with my python code"
cal_nega_mean(my_string)
# and I am using this to get the result for all records
df_tweets['intensity'] = df_tweets['tweets'].apply(lambda row: cal_nega_mean(row))
DataFrame to search in:
df
index   word      value   ...
1       python    -0.56
2       problem   -0.78
3       alpha     -0.91
.       .         .
9000    last      -0.41
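One way the per-word loop might be avoided altogether is to build a word-to-value lookup once and reuse it for every string. The following is a minimal sketch, not a definitive solution: the names `word_to_value` and `cal_nega_mean_fast` are hypothetical, the sample data is a small stand-in for the real frames, and it assumes words are matched against the `word` column only (the original `word in df.values` check looks at every column).

```python
import pandas as pd

# Hypothetical sample that mirrors the lookup frame shown above.
df = pd.DataFrame(
    {"word": ["python", "problem", "alpha", "last"],
     "value": [-0.56, -0.78, -0.91, -0.41]},
    index=[1, 2, 3, 9000],
)

# Build the lookup once. drop_duplicates keeps the first occurrence of each
# word, which mirrors the 'break' after the first match in the loop version.
word_to_value = df.drop_duplicates("word").set_index("word")["value"].to_dict()

def cal_nega_mean_fast(my_string, lookup=word_to_value, threshold=-0.40):
    """Mean of the values below threshold for the words found in lookup."""
    values = [lookup[w] for w in my_string.split() if w in lookup]
    negatives = [v for v in values if v < threshold]
    if not negatives:
        return 0
    return round(sum(negatives) / len(negatives), 2)

# Hypothetical stand-in for the 300k-row tweets frame.
df_tweets = pd.DataFrame(
    {"tweets": ["i have a problem with my python code", "nothing to match"]}
)
df_tweets["intensity"] = df_tweets["tweets"].apply(cal_nega_mean_fast)
```

Each word then costs one dictionary lookup instead of a pass over all rows of df, so the total work should grow with the number of words rather than with words times rows.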