-2

I have 2 dataframes. One contains the values and the other one contains the date.

df1: enter image description here

df2: enter image description here

I want to create a loop where for each row a new dataframe gets created where the index is the row values (dates) of df2 and in the 1st column there are the values of the row of df1.

My ultimate goal is to plot these dataframes and create a similar plot as below: Grey lines represent the df created by the loop. Is this a good way to plot such data?

enter image description here

Ben
  • 183
  • 8
  • 1
    Welcome to stackoverflow, please read [tour] and [mre] and in this case also: [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Andreas Aug 10 '21 at 13:22
  • Please [do not post images](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) of your data or errors. You can include [code that creates a dataframe such as `df.to_dict()` or the output of `print(df)`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (include at least the few rows and columns that allow to reproduce the example) – Cimbali Aug 10 '21 at 13:23
  • Do you still need the column names to be available or just a dataframe with two columns, with all values in a seperate row? – Tom S Aug 10 '21 at 13:25
  • Hi @TomS, thanks for the quick reply. I don't need the column names anymore. – Ben Aug 10 '21 at 13:37

2 Answers2

0

If I understand your question correctly I would suggest a stack and a merge:

In my code there is still some uglyness going on with dropping and renaming of some columns, but if you take a closer look at the functions you might be able to clean it up a little.

df1 = pd.DataFrame(
{
    "A": ["A0", "A1", "A2", "A3"],
    "B": ["B0", "B1", "B2", "B3"],
    "C": ["C0", "C1", "C2", "C3"],
    "D": ["D0", "D1", "D2", "D3"],
},
index=[0, 1, 2, 3])
df2 = pd.DataFrame(
{
    "A": ["A4", "A5", "A6", "A7"],
    "B": ["B4", "B5", "B6", "B7"],
    "C": ["C4", "C5", "C6", "C7"],
    "D": ["D4", "D5", "D6", "D7"],
},
index=[0, 1, 2, 3])


df1 = df1.stack().reset_index()
df2 = df2.stack().reset_index()


df3 = df1.merge(df2, how= "outer", on = ['level_0', 'level_1']).drop(columns=['level_0', 'level_1'])
df3.columns = ['date', 'value']
df3 = df3.set_index('date')

print(df3.head())

Will return:

     value
date
A0      A4
B0      B4
C0      C4
D0      D4
A1      A5

Since I do not have your data I helped myself with very generic data input.

Tom S
  • 591
  • 1
  • 5
  • 21
0

I'm not sure which date you want so this solution keeps both and you can decide later...

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.array([[22.89, 23.68, 23.41], [22.41, 22.97, 22.53], [22.03, 22.48, 22.02]]),
                   columns=['CL1', 'CL2', 'CL3'])
df2 = pd.DataFrame(np.array([['1990-01-02','1990-01-22', '1990-01-22', '1990-01-22'], 
                             ['1990-01-03','1990-02-20', '1990-02-20', '1990-02-20'],
                             ['1990-01-04','1990-03-20', '1990-03-20', '1990-03-20']]),
                   columns=['Date', 'CL1', 'CL2', 'CL3'])
display(df1,df2)

    CL1     CL2     CL3
0   22.89   23.68   23.41
1   22.41   22.97   22.53
2   22.03   22.48   22.02

    Date        CL1         CL2         CL3
0   1990-01-02  1990-01-22  1990-01-22  1990-01-22
1   1990-01-03  1990-02-20  1990-02-20  1990-02-20
2   1990-01-04  1990-03-20  1990-03-20  1990-03-20

df1 = df1.stack().reset_index(drop=True)

df2 = df2.set_index('Date')
df2 = df2.stack().reset_index()
df2 = df2.rename(columns={0:'ValueDates', 'level_1':'Label'})

mdf = pd.concat([df2, df1], axis=1)
mdf = mdf.rename(columns={0:'Values'})
display(mdf)

    Date        Label   ValueDates  Values
0   1990-01-02  CL1     1990-01-22  22.89
1   1990-01-02  CL2     1990-01-22  23.68
2   1990-01-02  CL3     1990-01-22  23.41
3   1990-01-03  CL1     1990-02-20  22.41
4   1990-01-03  CL2     1990-02-20  22.97
5   1990-01-03  CL3     1990-02-20  22.53
6   1990-01-04  CL1     1990-03-20  22.03
7   1990-01-04  CL2     1990-03-20  22.48
8   1990-01-04  CL3     1990-03-20  22.02
  • thanks for the answer! My goal is to create a new dataframe for each row, where the dates of df2 (ex: row1) are the index and the values of df1 (ex: row1) are in the 1st column – Ben Aug 10 '21 at 14:17
  • 1
    As long as the dataframes are the same size you could just do something like ```df2 = df2.set_index('Date')``` followed by ```df1.index = df2.index``` – Caleb Sprague Aug 10 '21 at 14:47