I want to add two columns to a pandas DataFrame using a function that returns a tuple, as follows:
```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                     'b': ['ssdfsdf', 'bbbbbb', 'cccccccccccc', 'ddd', 'eeeeee', 'ffffff']})

def givetup(string):
    result1 = string[0:3]
    # please imagine here a bunch of string functions concatenated,
    # including NLP methods with spaCy
    result2 = result1.upper()
    # the same here: imagine a bunch of steps to calculate result2 based on result1
    return (result1, result2)

# current approach: givetup is called twice for every row
data['c'] = data['b'].apply(lambda x: givetup(x)[0])
data['d'] = data['b'].apply(lambda x: givetup(x)[1])
```
This is very inefficient (I am dealing with millions of rows), since I call the same function twice per row and therefore do every calculation two times.
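To make the duplicated work visible, here is a minimal sketch that wraps the question's `givetup` with a call counter. The global `calls` variable is a hypothetical addition used only for measurement; the string logic is the simplified stand-in from the question.

```python
import pandas as pd

# hypothetical counter, added only to show how often givetup runs
calls = 0

def givetup(string):
    global calls
    calls += 1
    result1 = string[0:3]
    result2 = result1.upper()
    return (result1, result2)

data = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                     'b': ['ssdfsdf', 'bbbbbb', 'cccccccccccc', 'ddd', 'eeeeee', 'ffffff']})

# the two-pass approach from the question
data['c'] = data['b'].apply(lambda x: givetup(x)[0])
data['d'] = data['b'].apply(lambda x: givetup(x)[1])

print(calls)  # two calls for each of the 6 rows
```

With millions of rows and an expensive body, that doubling is exactly the cost the question wants to avoid.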
Since `result2` depends on `result1`, I would rather not split `givetup` into two functions.
How can I assign `result1` and `result2` to the new columns `c` and `d` in one go, with only one call to the function? What is the most efficient way to do it?
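One common pattern, sketched here under the assumption that `givetup` always returns a 2-tuple, is to call the function once per string and transpose the resulting list of tuples with `zip(*...)`. It still works on plain Python strings, so nothing has to be vectorized.

```python
import pandas as pd

def givetup(string):
    # stand-in for the expensive computation described in the question
    result1 = string[0:3]
    result2 = result1.upper()
    return (result1, result2)

data = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                     'b': ['ssdfsdf', 'bbbbbb', 'cccccccccccc', 'ddd', 'eeeeee', 'ffffff']})

# one call per string; zip(*...) turns [(r1, r2), (r1, r2), ...]
# into two sequences: all r1 values and all r2 values
c_values, d_values = zip(*map(givetup, data['b']))
data['c'] = list(c_values)
data['d'] = list(d_values)
```

Note that the unpacking fails on an empty DataFrame, because `zip(*[])` yields nothing to unpack; a guard on `len(data)` would be needed if that case can occur.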
Please bear in mind that `result1` and `result2` are heavily time-consuming string calculations.
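If (and only if) column `b` contains many repeated strings, an assumption not stated in the question, memoizing `givetup` with `functools.lru_cache` skips the expensive work for every repeat. The sample data below is hypothetical and deliberately contains duplicates.

```python
from functools import lru_cache

import pandas as pd

@lru_cache(maxsize=None)  # unbounded cache keyed on the input string
def givetup(string):
    result1 = string[0:3]
    result2 = result1.upper()
    return (result1, result2)

# hypothetical data with duplicates on purpose
data = pd.DataFrame({'b': ['ssdfsdf', 'bbbbbb', 'ssdfsdf', 'bbbbbb', 'ddd']})

c_values, d_values = zip(*map(givetup, data['b']))
data['c'] = list(c_values)
data['d'] = list(d_values)

print(givetup.cache_info())  # 3 misses (unique strings), 2 hits (repeats)
```

With mostly unique strings the cache only adds memory overhead, so this is worth trying only after checking how often values repeat.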
EDIT 1: I am aware of this question: Apply pandas function to column to create multiple new columns?, i.e. applying vectorized functions. In my particular case that is highly undesirable, or perhaps even impossible. Imagine that `result1` and `result2` are calculated with language models and that I need the plain text.
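For a pandas-native alternative that still hands `givetup` one plain string at a time, `DataFrame.apply` with `axis=1` and `result_type='expand'` turns each returned tuple into columns. This is a sketch using the question's simplified `givetup`; the column names `c` and `d` are assigned afterwards because the expanded frame is labelled `0` and `1`.

```python
import pandas as pd

def givetup(string):
    result1 = string[0:3]
    result2 = result1.upper()
    return (result1, result2)

data = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                     'b': ['ssdfsdf', 'bbbbbb', 'cccccccccccc', 'ddd', 'eeeeee', 'ffffff']})

# axis=1 passes one row at a time; result_type='expand' spreads the
# returned tuple over two columns of a new DataFrame (labelled 0 and 1)
expanded = data.apply(lambda row: givetup(row['b']), axis=1, result_type='expand')
expanded.columns = ['c', 'd']
data = data.join(expanded)  # aligns on the shared index
```

This is typically slower than iterating over `data['b']` directly, because pandas builds a Series object for every row, so it is mainly useful when `givetup` needs values from several columns.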