I have this example dataframe:
import pandas as pd

vocab_list = ['running', 'sitting', 'stand', 'walk']
col_list = ['browse', 'wander', 'saunter', 'jogging', 'prancing']
df = pd.DataFrame(vocab_list, columns=['vocab'])
df.set_index('vocab', inplace=True)
df = df.reindex(col_list, axis=1)
I need to apply a user-defined function to all columns using values from the index of the dataframe.
My user-defined function computes the cosine similarity between each pair of index string and column string:
import spacy
from pandarallel import pandarallel
from tqdm import tqdm

nlp = spacy.load('en_core_web_lg')
pandarallel.initialize(progress_bar=True)

def func(col):
    # With axis=1, `col` is one row of df as a Series
    print(col.name)   # the row's index label, i.e. one string from vocab_list
    print(col.index)  # an Index object containing the column names
    doc = nlp(col.name)
    for i, ind in tqdm(enumerate(col.index), leave=False):
        user = nlp(ind)
        check_lemma = doc[0].lemma_ != user[0].lemma_
        pos_equality = doc[0].pos_ == user[0].pos_
        if check_lemma and pos_equality:
            col.iloc[i] = doc.similarity(user)
        else:
            col.iloc[i] = 0
    return col

df = df.parallel_apply(func, axis=1)
Is there a way to do this without a for loop in the user-defined function?
Because of axis=1, col in the function is a Series built from one row of the dataframe, and I can access that row's index string with col.name.
Also, col.index gives me an Index object for this Series containing the column names, but how do I go from there to the similarities without a for loop?
NOTE: My actual dataframe has ~3000 columns and ~120000 rows, so I would prefer not to have a for loop within the user-defined function.
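To make the target concrete, here is a minimal sketch of one possible loop-free formulation, not a tested solution. It assumes the vector, first-token lemma, and first-token POS tag of every word have already been collected once per unique word (for example from doc.vector, doc[0].lemma_ and doc[0].pos_), which would need about 123000 nlp calls rather than one per cell. The function name masked_cosine_matrix and the two-dimensional toy vectors are hypothetical stand-ins, not real spaCy output.

```python
import numpy as np


def masked_cosine_matrix(row_vecs, col_vecs, row_lemmas, col_lemmas, row_pos, col_pos):
    """Cosine similarity of every row word against every column word.

    A cell is kept only where the lemmas differ and the POS tags match,
    mirroring the if/else in func; every other cell is set to 0.
    """
    row_vecs = np.asarray(row_vecs, dtype=float)
    col_vecs = np.asarray(col_vecs, dtype=float)

    # Scale each vector to unit length; zero vectors are left as zeros.
    row_norm = np.linalg.norm(row_vecs, axis=1, keepdims=True)
    col_norm = np.linalg.norm(col_vecs, axis=1, keepdims=True)
    row_unit = row_vecs / np.where(row_norm == 0, 1.0, row_norm)
    col_unit = col_vecs / np.where(col_norm == 0, 1.0, col_norm)

    # One matrix product yields all pairwise cosine similarities.
    sims = row_unit @ col_unit.T

    # Broadcast the string comparisons to the same (rows, columns) shape.
    lemma_differs = np.asarray(row_lemmas)[:, None] != np.asarray(col_lemmas)[None, :]
    same_pos = np.asarray(row_pos)[:, None] == np.asarray(col_pos)[None, :]

    return np.where(lemma_differs & same_pos, sims, 0.0)


# Toy inputs (hypothetical values, chosen only to exercise the function).
result = masked_cosine_matrix(
    row_vecs=[[1.0, 0.0], [0.0, 1.0]],
    col_vecs=[[1.0, 0.0], [1.0, 1.0]],
    row_lemmas=["run", "sit"],
    col_lemmas=["run", "jog"],
    row_pos=["VERB", "VERB"],
    col_pos=["VERB", "VERB"],
)
print(result)
```

The array could then be wrapped as pd.DataFrame(result, index=vocab_list, columns=col_list). At the full size of ~120000 x ~3000 the float64 result alone would take roughly 2.9 GB, so processing the rows in chunks may be necessary.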
EDIT: I have edited the question with the user-defined function currently being used.