
I have two data frames. They have the same structure but come from two different models. Basically, I would like to compare them to find the differences. The first thing I would like to do is plot two rows together: one from the first data frame and the corresponding one from the other.

This is what I do. I read the two CSV files:

PRICES   = pd.read_csv('test_model_1.csv',sep=';',index_col=0, header = 0)
PRICES_B = pd.read_csv('bench_mark.csv',sep=';',index_col=0, header = 0)

then I plot the same row (position 8) from both, as:

rowM  = PRICES.iloc[8]
rowB  = PRICES_B.iloc[8]
rowM.plot()
rowB.plot()

This does not seem to be the correct way. In particular, I am not able to choose the labels or the legend.

This is the result: a comparison between row 8 of the first data frame and row 8 of the second data frame

Could someone suggest the correct way to compare the two data frames and plot some selected columns?

Vadim Kotov
diedro
  • You'll need `matplotlib.pyplot` in order to plot multiple data series on the same graph – Mayeul sgc Oct 14 '19 at 15:38
  • Dear Mayeul sgc, are you suggesting that I pass the selected rows to a matrix and use standard matplotlib to plot them? Thanks again – diedro Oct 14 '19 at 15:55
  • You don't need to do that, check [this](https://stackoverflow.com/a/22276109/4350650) answer – Mayeul sgc Oct 14 '19 at 16:00
  • Dear all, dear Mayeul sgc, thanks, it seems to work. However, I get strange behavior with the x labels. In my opinion it is due to the data in the data frame. Can I ask here? Thanks – diedro Oct 14 '19 at 16:09
  • Use `plt.xlabel()` or `plt.ylabel()` to adjust the labels to your needs – Mayeul sgc Oct 14 '19 at 16:19
  • Dear all, dear @Mayeul sgc, the problem is that the xtick names become the id names of the dataframe columns. My feeling is that "rowM = PRICES.iloc[8]" carries two pieces of information: value and column id. When I plot "rowM" with plt.plot(rowM), matplotlib is not able to handle it and gives me strange results. What do you think? – diedro Oct 14 '19 at 16:54
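
The behavior described in the last comment can be illustrated with a minimal sketch. It assumes a small made-up frame whose column labels (`col_a`, `col_b`, `col_c`) are hypothetical stand-ins for the real column ids. A row selected with `iloc` is a `Series` that holds both the values and the column labels as its index, which is why those labels show up on the x axis. Plotting the bare values against plain positions, and then setting the tick text explicitly, is one way to keep control over the axis:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the column labels stand in for the real column ids
prices = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    columns=["col_a", "col_b", "col_c"],
)

row = prices.iloc[1]       # a Series: the values plus the column labels as index
print(list(row.index))     # the labels that would end up on the x axis
print(row.to_numpy())      # the bare values, without the labels

fig, ax = plt.subplots()
x = list(range(len(row)))  # plain positions 0, 1, 2
ax.plot(x, row.to_numpy(), label="row 1")

# Choose the tick positions and tick text explicitly
ax.set_xticks(x)
ax.set_xticklabels(list(row.index), rotation=45)
ax.legend()

# plt.show()  # uncomment when running as a script
```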

1 Answer


Let's prepare some test data:

import numpy as np
import pandas as pd

# 10 rows x 4 columns, matching the output shown below
mtx1 = np.random.rand(10,4)*1.1+2
mtx2 = np.random.rand(10,4)+2

df1 = pd.DataFrame(mtx1)
df2 = pd.DataFrame(mtx2)

Example output for df1:

Out[60]: 
              0         1         2         3
    0  2.604748  2.233979  2.575730  2.491230
    1  3.005079  2.984622  2.745642  2.082218
    2  2.577554  3.001736  2.560687  2.838092
    3  2.342114  2.435438  2.449978  2.984128
    4  2.416953  2.124780  2.476963  2.766410
    5  2.468492  2.662972  2.975939  3.026482
    6  2.738153  3.024694  2.916784  2.988288
    7  2.082538  3.030582  2.959201  2.438686
    8  2.917811  2.798586  2.648060  2.991314
    9  2.133571  2.162194  2.085843  2.927913

Now let's plot it:

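import matplotlib.pyplot as plt
%matplotlib inline
# (the line above is an IPython/Jupyter magic; drop it in a plain script)

i = range(len(df1.loc[6,:]))   # x positions: 0 .. number of columns - 1
plt.plot(i,df1.loc[6,:])       # whole row 6 of df1
plt.plot(i,df2.loc[6,:])       # whole row 6 of df2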

Result: [image: line plot of row 6 of df1 and row 6 of df2]
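
Since the question also asks about labels and legends, here is a hedged sketch that builds on the test data above. It assumes two random frames of identical shape as stand-ins for the real CSV data. The names `row`, `ax_rows`, `ax_diff` and the label strings are illustrative choices, not anything from the original post. Each pandas `plot` call is given a shared `ax` and a `label`, and the element-wise difference of the two frames gives a quick numeric comparison:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data: two frames with identical shape, index and columns
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 4)) * 1.1 + 2)
df2 = pd.DataFrame(rng.random((10, 4)) + 2)

row = 6  # row position to compare (illustrative choice)

fig, (ax_rows, ax_diff) = plt.subplots(2, 1, sharex=True)

# Draw both rows on the same axes; label= is what the legend displays
df1.iloc[row].plot(ax=ax_rows, label="model", marker="o")
df2.iloc[row].plot(ax=ax_rows, label="benchmark", marker="s")
ax_rows.set_ylabel("price")
ax_rows.legend()

# Element-wise difference between the frames, plotted for the same row
diff = df1 - df2
diff.iloc[row].plot(ax=ax_diff, color="k")
ax_diff.axhline(0.0, linewidth=0.5)  # reference line at zero difference
ax_diff.set_xlabel("column")
ax_diff.set_ylabel("model - benchmark")

# plt.show()  # uncomment when running as a script
```

To compare selected columns instead of a row, the same pattern should work with something like `df1[[0, 2]]` and `df2[[0, 2]]` in place of the `iloc[row]` selections.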

Alex