1

Suppose you have two separate pandas DataFrames with the same row and column indices (in my case, the column indices were constructed by .unstack()'ing a MultiIndex built using df.groupby([col1,col2]))

df1 = pd.DataFrame({'a':[.01,.02,.03],'b':[.04,.05,.06]})
df2 = pd.DataFrame({'a':[.04,.05,.06],'b':[.01,.02,.03]})

Now suppose I would like to create a 3rd DataFrame, df3, where each entry of df3 is a string which uses the corresponding element-wise entries of df1 and df2. For example,

df3.iloc[0,0] = '{:.0%}'.format(df1.iloc[0,0]) + '\n' + '{:.0%}'.format(df2.iloc[0,0])

I recognize this is probably easy enough to do by looping over all entries in df1 and df2 and creating a new entry in df3 based on these values (which can be slow for large DataFrames), or even by joining the two DataFrames together (which may require renaming columns), but I am wondering if there a more pythonic / pandorable way of accomplishing this, possibly using applymap or some other built-in pandas function?

The question is similar to Combine two columns of text in dataframe in pandas/python but the previous question does not consider combining multiple DataFrames into a single.

user7101631
  • 51
  • 1
  • 4

2 Answers2

0

IIUC, you just need add df1 and df2 with '\n'

df3 = df1.astype(str) + '\n' + df2.astype(str)

Out[535]:
            a           b
0  0.01\n0.04  0.04\n0.01
1  0.02\n0.05  0.05\n0.02
2  0.03\n0.06  0.06\n0.03
Andy L.
  • 24,909
  • 4
  • 17
  • 29
  • I keep getting error `TypeError: Cannot cast array data from dtype('float64') to dtype(' – user7101631 Nov 20 '19 at 15:54
  • I have no experience with `Databricks`, so I couldn't help you on this error. However, you may check if there is any `NaN` in the output of `astype(str)` as `df1.astype(str).isna().all(None)` and doing the same for `df2` – Andy L. Nov 20 '19 at 18:01
  • I don't think the platform (Databricks) makes a difference here, and none of the entries any any of the dfs are None/Null, so I am still stumped. I'll continue to try to work through it! – user7101631 Nov 20 '19 at 21:20
0

You can make use of the vectorized operations of Pandas (given that the dataframes share row and column index)

(df1 * 100).astype(str) + '%\n' + (df2 * 100).astype(str) + '%' 

You get

    a           b
0   1.0%\n4.0%  4.0%\n1.0%
1   2.0%\n5.0%  5.0%\n2.0%
2   3.0%\n6.0%  6.0%\n3.0%
Vaishali
  • 37,545
  • 5
  • 58
  • 86