0

I am trying to merge several time series dataframes into one very large dataframe with a MultiIndex.

Suppose I have these DataFrames.

In [1]: dates = pd.DatetimeIndex(["2019-1-1", "2019-1-2", "2019-1-3"], name="Date")
In [2]: df_a = pd.DataFrame(np.random.randn(3, 2), columns=['Col1', 'Col2'], index=dates)
In [3]: df_b = pd.DataFrame(np.random.randn(3, 2), columns=['Col1', 'Col2'], index=dates)
In [4]: df_c = pd.DataFrame(np.random.randn(3, 2), columns=['Col1', 'Col2'], index=dates)
In [5]: df_a
Out[5]: 

                 Col1        Col2
      Date      
2019-01-01   1.317679   -1.201769
2019-01-02  -0.991833    0.626420
2019-01-03   0.549733    1.942215

Now, I've created the scafolding for the dataframe that I want. It looks like this.

In [6]: stock_symbols = ["A", "B", "C"]
In [7]: index = pd.MultiIndex.from_product([dates, stock_symbols], names=["Date", "Script"])
In [8]: df = pd.DataFrame(columns=['Col1', 'Col2'], index=index)
In [9]: df
Out[9]:

                     Col1   Col2
      Date  Script      
2019-01-01       A    NaN    NaN
                 B    NaN    NaN
                 C    NaN    NaN
2019-01-02       A    NaN    NaN
                 B    NaN    NaN
                 C    NaN    NaN
2019-01-03       A    NaN    NaN
                 B    NaN    NaN
                 C    NaN    NaN

How do I specify to Pandas that values from df_a are to be appended to the appropriate index position? I figured I'd have to use a .join() but since the values of Script don't occur in the DataFrames I don't know what to do.

Please help.

Maaz Basar
  • 13
  • 3
  • 1
    Please see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). It's far easier to understand the problem if you create some code that sets up an example dataframe and shows your expected output. – help-ukraine-now Aug 15 '19 at 07:33

1 Answers1

0

Okay, so currently I'm working with this bit of code.

idx = pd.IndexSlice
df.loc[idx[:, "A"], :] = df.loc[idx[:, "A"], :].fillna(df_a)
df.loc[idx[:, "B"], :] = df.loc[idx[:, "B"], :].fillna(df_b)
df.loc[idx[:, "C"], :] = df.loc[idx[:, "C"], :].fillna(df_c)

If anyone has a better way of doing this... I'm all ears!

Maaz Basar
  • 13
  • 3