I have written a function that I would like to apply to each value in a DataFrame column. Is there an apply-style method that lets me create a new column from the function's results? Example code:

dat = pd.DataFrame({'title': ['cat', 'dog', 'lion','turtle']})

Manual method that works:

print(calc_similarity(chosen_article,str(df['title'][1]),model_word2vec))
print(calc_similarity(chosen_article,str(df['title'][2]),model_word2vec))

Attempt to apply over dataframe column:

dat['similarity']= calc_similarity(chosen_article, str(df['title']), model_word2vec)

The issue I keep running into is that every row of the newly created column ends up with the same value.
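A likely explanation, sketched with a stand-in function: `str(df['title'])` turns the whole Series into one string (its printed representation), so the function runs once and pandas broadcasts that single result to every row. The helper `char_count` below is hypothetical and only stands in for `calc_similarity`.

```python
import pandas as pd

dat = pd.DataFrame({'title': ['cat', 'dog', 'lion', 'turtle']})

# Hypothetical stand-in for calc_similarity: returns the length of the text.
def char_count(text):
    return len(text)

# str() on a Series yields ONE string: the printed form of the whole column.
whole_column_as_text = str(dat['title'])

# One function call, one scalar; pandas broadcasts it to every row.
dat['broadcast'] = char_count(whole_column_as_text)

# apply() calls the function once per element instead.
dat['per_row'] = dat['title'].apply(char_count)

print(dat)
```

The `broadcast` column holds one repeated number, while `per_row` varies with each title.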

I have tried apply() as follows:

dat['similarity'] = dat['title'].apply(lambda x: calc_similarity(chosen_article, str(x), model_word2vec))

and

dat['similarity'] = dat['title'].astype(str).apply(lambda x: calc_similarity(chosen_article, x, model_word2vec))

Both of these raise a ZeroDivisionError, which I do not understand, since I am not passing empty strings.

Function being used:

def calc_similarity(input1, input2, vectors):
    s1words = set(vocab_check(vectors, input1.split()))
    s2words = set(vocab_check(vectors, input2.split()))
    
    output = vectors.n_similarity(s1words, s2words)
    
    return output
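One possible cause of the ZeroDivisionError, offered as an assumption rather than a diagnosis: if `vocab_check` drops every word that is missing from the model's vocabulary, a title can become an empty set even though the string itself is not empty, and, as far as I know, gensim's `n_similarity` raises ZeroDivisionError when either word list is empty. The sketch below guards against that case. `vocab_check` and `FakeVectors` here are hypothetical stand-ins written for illustration, not the asker's helper or gensim's API.

```python
import math

# Hypothetical stand-in for the asker's vocab_check: keep only known words.
def vocab_check(vectors, words):
    return [w for w in words if w in vectors.vocab]

# Hypothetical stand-in for a word2vec model; this is NOT gensim's API.
class FakeVectors:
    def __init__(self, vocab):
        self.vocab = set(vocab)

    def n_similarity(self, ws1, ws2):
        # Imitates the assumed failure mode on empty input.
        if not (ws1 and ws2):
            raise ZeroDivisionError("At least one of the passed lists is empty.")
        # Toy score (Jaccard overlap), used only to make the example runnable.
        ws1, ws2 = set(ws1), set(ws2)
        return len(ws1 & ws2) / len(ws1 | ws2)

def calc_similarity_safe(input1, input2, vectors):
    s1words = set(vocab_check(vectors, input1.split()))
    s2words = set(vocab_check(vectors, input2.split()))
    # Guard: if no word survived the vocabulary filter, return NaN
    # instead of letting n_similarity divide by zero.
    if not s1words or not s2words:
        return math.nan
    return vectors.n_similarity(list(s1words), list(s2words))

model = FakeVectors(["cat", "dog", "lion"])
known = calc_similarity_safe("cat dog", "dog", model)
unknown = calc_similarity_safe("cat dog", "turtle", model)  # "turtle" is filtered out
print(known, unknown)
```

Returning NaN keeps the column numeric and makes the out-of-vocabulary rows easy to find afterwards with `isna()`.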
DYZ
Zachqwerty
  • `dat.title.apply(...)`? – Barmar Jul 05 '22 at 13:46
  • Does your function take a vector as input? Does it return a vector? If it returns a scalar, it is normal to get a single value. – mozway Jul 05 '22 at 13:48
  • `dat['similarity'] = dat['title'].astype(str).apply(lambda x: calc_similarity(chosen_article, x, model_word2vec))` – mozway Jul 05 '22 at 13:49
  • @mozway the function does take a vector input. I have it defined as `def calc_similarity(input1, input2, vectors): s1words = set(vocab_check(vectors, input1.split())) s2words = set(vocab_check(vectors, input2.split())) output = vectors.n_similarity(s1words, s2words) return output`. Would that affect how I should use .apply()? I am getting a KeyError with the proposed solution. – Zachqwerty Jul 05 '22 at 14:02
  • Does this answer your question? [How can I use the apply() function for a single column?](https://stackoverflow.com/questions/34962104/how-can-i-use-the-apply-function-for-a-single-column) – Yaakov Bressler Jul 05 '22 at 14:03
  • Please [edit](https://stackoverflow.com/posts/72870529/edit) the question with the details. – mozway Jul 05 '22 at 14:03
  • @YaakovBressler I believe the apply() function is the solution, but I am having trouble implementing it in my code. I don't quite understand why it has not been working. – Zachqwerty Jul 05 '22 at 14:06

1 Answer

It sounds like you are having difficulty passing additional keyword arguments to the function you apply. Here is how to do that:

# apply() passes each column value as the function's first positional argument.
# Any extra keyword arguments given to apply() are forwarded to the function.
df['similarity'] = df['title'].apply(
    calc_similarity,
    input2=chosen_article,
    vectors=model_word2vec
)
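A minimal runnable sketch of the same pattern, assuming nothing about the word2vec model: `join_with` is a hypothetical stand-in for `calc_similarity`, used only to show that `Series.apply` sends each value to the first parameter and forwards the remaining keyword arguments unchanged.

```python
import pandas as pd

# Hypothetical stand-in for calc_similarity with the same three-parameter shape.
def join_with(value, other, sep):
    return f"{value}{sep}{other}"

dat = pd.DataFrame({'title': ['cat', 'dog', 'lion', 'turtle']})

# Each title becomes `value`; `other` and `sep` are forwarded as keyword arguments.
dat['joined'] = dat['title'].apply(join_with, other='bird', sep='-')

# Equivalent lambda form, useful when the column value is not the first parameter.
dat['joined_lambda'] = dat['title'].apply(lambda t: join_with(t, 'bird', '-'))

print(dat)
```

Note that in the keyword form each title lands in the first parameter (`input1` in the question's function), so the fixed text has to be passed by name as `input2`.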
Yaakov Bressler