
I have this example dataframe:

import pandas as pd

vocab_list = ['running', 'sitting', 'stand', 'walk']
col_list = ['browse', 'wander', 'saunter', 'jogging', 'prancing']

# Vocabulary words become the row index; col_list becomes the (empty) columns.
df = pd.DataFrame(vocab_list, columns=['vocab'])
df.set_index('vocab', inplace=True)
df = df.reindex(col_list, axis=1)


I need to fill every cell of the dataframe by applying a user-defined function to the cell's index label and its column name.

My user-defined function computes the cosine similarity between each pair of strings, one taken from the index and one from the columns:

import spacy
from tqdm import tqdm
from pandarallel import pandarallel

nlp = spacy.load('en_core_web_lg')
pandarallel.initialize(progress_bar=True)

def func(col):
    # With axis=1, `col` is one row of the dataframe passed as a Series.
    print(col.name)   # the row's index label, i.e. one string from vocab_list
    print(col.index)  # an Index object containing the column names
    doc = nlp(col.name)
    for i, ind in tqdm(enumerate(col.index), leave=False):
        user = nlp(ind)
        # Only score pairs with different lemmas and the same part of speech.
        check_lemma = doc[0].lemma_ != user[0].lemma_
        pos_equality = doc[0].pos_ == user[0].pos_
        if check_lemma and pos_equality:
            col.iloc[i] = doc.similarity(user)
        else:
            col.iloc[i] = 0
    return col

df = df.parallel_apply(func, axis=1)

Is there a way to do this without a for loop inside the user-defined function?

Because of axis=1, the col passed to the function is actually a Series built from one row of the dataframe, so col.name gives me that row's index string.
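To make those roles concrete, here is a small sketch on the example frame. It uses plain `DataFrame.apply` rather than pandarallel, on the assumption that `parallel_apply` passes the same Series to the function; the helper name `describe` is made up for illustration.

```python
import pandas as pd

vocab_list = ['running', 'sitting', 'stand', 'walk']
col_list = ['browse', 'wander', 'saunter', 'jogging', 'prancing']

# Empty frame: vocab words as the row index, col_list as the columns.
df = pd.DataFrame(index=pd.Index(vocab_list, name='vocab'), columns=col_list)

def describe(row):
    # With axis=1 the function receives one ROW as a Series:
    # row.name is the index label, row.index holds the column names.
    return pd.Series({'name': row.name, 'labels': ','.join(row.index)})

# One output row per input row, recording what the function was given.
seen = df.apply(describe, axis=1)
print(seen)
```

Every call receives a different `name` but the same set of column labels, which is why the work done on the column strings is repeated for each row.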

Similarly, col.index gives me an Index object for this Series containing the column names, but how do I get from there to the similarities without a for loop?

NOTE: My actual dataframe has ~3,000 columns and ~120,000 rows, so I would prefer not to have a for loop inside the user-defined function.
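One possible direction, sketched under assumptions rather than offered as a tested solution: if each unique word is parsed only once and its vector, lemma and POS tag are stored, all pairwise cosine similarities reduce to a single matrix product, and the lemma/POS conditions become a boolean mask. The function name `similarity_table` and the toy vectors, lemmas and tags below are hypothetical; in real use they would presumably come from spaCy (`doc.vector`, `doc[0].lemma_`, `doc[0].pos_`), assuming `doc.similarity` is the cosine of those vectors.

```python
import numpy as np
import pandas as pd

def similarity_table(row_words, col_words, vectors, lemmas, pos):
    """Cosine similarity for every (row word, column word) pair, set to 0
    where the lemmas are equal or the POS tags differ."""
    R = np.array([vectors[w] for w in row_words], dtype=np.float32)
    C = np.array([vectors[w] for w in col_words], dtype=np.float32)

    # Scale to unit length so a dot product equals cosine similarity.
    # Zero vectors (e.g. out-of-vocabulary words) are left as zeros.
    r_norm = np.linalg.norm(R, axis=1, keepdims=True)
    c_norm = np.linalg.norm(C, axis=1, keepdims=True)
    R = R / np.where(r_norm == 0, 1, r_norm)
    C = C / np.where(c_norm == 0, 1, c_norm)
    sim = R @ C.T  # shape: (len(row_words), len(col_words))

    # Broadcast the two conditions over all pairs at once.
    r_lemma = np.array([lemmas[w] for w in row_words])[:, None]
    c_lemma = np.array([lemmas[w] for w in col_words])[None, :]
    r_pos = np.array([pos[w] for w in row_words])[:, None]
    c_pos = np.array([pos[w] for w in col_words])[None, :]
    keep = (r_lemma != c_lemma) & (r_pos == c_pos)

    return pd.DataFrame(np.where(keep, sim, 0.0),
                        index=row_words, columns=col_words)

# Toy, made-up inputs standing in for spaCy output (assumption).
vectors = {'running': [1, 0], 'walk': [0, 1],
           'jogging': [1, 1], 'run': [1, 0], 'quick': [1, 0]}
lemmas = {'running': 'run', 'walk': 'walk',
          'jogging': 'jog', 'run': 'run', 'quick': 'quick'}
pos = {'running': 'VERB', 'walk': 'VERB',
       'jogging': 'VERB', 'run': 'VERB', 'quick': 'ADJ'}

table = similarity_table(['running', 'walk'],
                         ['jogging', 'run', 'quick'],
                         vectors, lemmas, pos)
print(table)
```

At the real size the result has 120,000 × 3,000 = 360,000,000 cells, roughly 1.44 GB as float32, so calling such a function on blocks of rows may be more practical than building the whole table at once.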

EDIT: I have edited the question with the user-defined function currently being used.
