What is the best way to apply a row-wise function and create multiple new columns?
I have two dataframes and a working code, but it's most likely not optimal
df1 (dataframe has thousands of rows and xx number of columns)
sic | data1 | data2 | data3 | data4 | data5 |
---|---|---|---|---|---|
5 | 0.90783598 | 0.84722083 | 0.47149924 | 0.98724123 | 0.50654476 |
6 | 0.53442684 | 0.59730371 | 0.92486887 | 0.61531646 | 0.62784041 |
3 | 0.56806423 | 0.09619383 | 0.33846097 | 0.71878313 | 0.96316724 |
8 | 0.86933042 | 0.64965755 | 0.94549745 | 0.08866519 | 0.92156389 |
12 | 0.651328 | 0.37193774 | 0.9679044 | 0.36898991 | 0.15161838 |
6 | 0.24555531 | 0.50195983 | 0.79114578 | 0.9290596 | 0.10672607 |
df2 (column header maps to the sic-code in df1. There are in total 12 sic-codes and the dataframe is thousands of rows long)
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
c_bar | 0.4955329 | 0.92970292 | 0.68049726 | 0.91325006 | 0.55578465 | 0.78056519 | 0.53954711 | 0.90335326 | 0.93986402 | 0.0204794 | 0.51575764 | 0.61144255 |
a1_bar | 0.75781444 | 0.81052669 | 0.99910449 | 0.62181902 | 0.11797144 | 0.40031316 | 0.08561665 | 0.35296894 | 0.14445697 | 0.93799762 | 0.80641802 | 0.31379671 |
a2_bar | 0.41432552 | 0.36313911 | 0.13091618 | 0.39251953 | 0.66249636 | 0.31221897 | 0.15988528 | 0.1620938 | 0.55143589 | 0.66571044 | 0.68198944 | 0.23806947 |
a3_bar | 0.38918855 | 0.83689178 | 0.15838139 | 0.39943204 | 0.48615188 | 0.06299899 | 0.86343819 | 0.47975619 | 0.05300611 | 0.15080875 | 0.73088725 | 0.3500239 |
a4_bar | 0.47201384 | 0.90874121 | 0.50417142 | 0.70047698 | 0.24820601 | 0.34302454 | 0.4650635 | 0.0992668 | 0.55142391 | 0.82947194 | 0.28251699 | 0.53170308 |
I achieved the result I need with the following code:
ind_list = np.arange(1,13) # Create list of industries
def c_bar(row):
for i in ind_list:
if row['sic'] == i:
return mlev_mean.loc['const',i]
def a1_bar(row):
for i in ind_list:
if row['sic'] == i:
return mlev_mean.loc['a1bar',i]
def a2_bar(row):
for i in ind_list:
if row['sic'] == i:
return mlev_mean.loc['a2bar',i]
def a3_bar(row):
for i in ind_list:
if row['sic'] == i:
return mlev_mean.loc['a3bar',i]
def a4_bar(row):
for i in ind_list:
if row['sic'] == i:
return mlev_mean.loc['a4bar',i]
mlev_merge['c_bar'] = mlev_merge.apply(c_bar, axis=1, result_type='expand')
mlev_merge['a1_bar'] = mlev_merge.apply(a1_bar, axis=1, result_type='expand')
mlev_merge['a2_bar'] = mlev_merge.apply(a2_bar, axis=1, result_type='expand')
mlev_merge['a3_bar'] = mlev_merge.apply(a3_bar, axis=1, result_type='expand')
mlev_merge['a4_bar'] = mlev_merge.apply(a4_bar, axis=1, result_type='expand')
The output is something like this:
sic | data1 | data2 | data3 | data4 | c_bar | a1_bar | a2_bar | a3_bar | a4_bar |
---|---|---|---|---|---|---|---|---|---|
5 | 0.10316948 | 0.61408639 | 0.04042675 | 0.79255749 | 0.56357931 | 0.42920472 | 0.20701581 | 0.67639811 | 0.37778029 |
6 | 0.5730904 | 0.16753145 | 0.27835136 | 0.00178992 | 0.51793793 | 0.06772307 | 0.15084885 | 0.12451806 | 0.33114948 |
3 | 0.87710893 | 0.66834187 | 0.14286608 | 0.12609769 | 0.75873957 | 0.72586804 | 0.6081763 | 0.14598001 | 0.21557266 |
8 | 0.24565579 | 0.56195558 | 0.93316676 | 0.20988936 | 0.67404545 | 0.65221594 | 0.79758557 | 0.67093021 | 0.33400764 |
12 | 0.79703344 | 0.61066111 | 0.94602909 | 0.56218703 | 0.92384307 | 0.30836159 | 0.72521994 | 0.00795362 | 0.76348227 |
6 | 0.86604791 | 0.28454782 | 0.97229172 | 0.21853932 | 0.75650652 | 0.40788056 | 0.53233553 | 0.60326386 | 0.27399405 |
Cell values in the example are randomly generated, but the point is to map based on sic-codes and add rows from df2 as new columns into df1.