
I have multiple dataframes as below.

import pandas as pd
import numpy as np
dfA={"1":np.random.rand(3)}
dfA=pd.DataFrame(dfA)
dfB={"2":np.random.rand(5)}
dfB=pd.DataFrame(dfB)
dfC={"3":np.random.rand(6)}
dfC=pd.DataFrame(dfC)

and want to combine them like below.

dfABC=pd.concat([dfA,dfB,dfC], join="outer")
print (dfABC)

For example, dfABC looks like this. (I am not sure why "print" does not work, so I have attached a figure.)

[Figure: dfABC as printed]

The expected output looks like this:

[Figure: expected output table]
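Assuming the expected table has the three frames as side-by-side columns (which is what the `axis=1` suggestion in the comments produces), the difference from the default row-wise concatenation shows up directly in the shapes. A minimal sketch reusing the question's frames; `stacked` and `side_by_side` are names introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Same three frames as in the question: column labels "1", "2", "3",
# with 3, 5 and 6 rows respectively.
dfA = pd.DataFrame({"1": np.random.rand(3)})
dfB = pd.DataFrame({"2": np.random.rand(5)})
dfC = pd.DataFrame({"3": np.random.rand(6)})

# Default axis=0 stacks the frames vertically: 3 + 5 + 6 = 14 rows,
# and each column is NaN wherever its rows came from another frame.
stacked = pd.concat([dfA, dfB, dfC], join="outer")

# axis=1 aligns on the row index instead: one column per frame,
# 6 rows (the longest frame), with shorter columns padded with NaN.
side_by_side = pd.concat([dfA, dfB, dfC], join="outer", axis=1)

print(stacked.shape)       # (14, 3)
print(side_by_side.shape)  # (6, 3)
```

In the side-by-side version, column "1" has 3 NaN cells, column "2" has 1, and column "3" has none.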

Then I want to make a scatter plot from this table. The x-axis values are 1, 2, 3. The y-axis values are the values of dfA (when x=1), dfB (when x=2), and dfC (when x=3). Is there a function to make a scatter plot from this table? I googled it but could not find one.

Also, is it possible to add a trendline? In Excel this scatter plot is easy, but I want to use Python because the actual dataset is quite large.

Thank you very much for your help.

Tom_Hanks
  • Not clear on the expected output... do you want 3 subplots for each dataframe? – cs95 Dec 17 '18 at 04:29
  • Thanks for the comment. Please see the attached figure. – Tom_Hanks Dec 17 '18 at 04:58
  • First, concatenate: `dfABC = pd.concat([dfA, dfB, dfC], join="outer", sort=True, axis=1)`. Then, melt and plot: `dfABC.melt().astype(float).plot.scatter('variable', 'value')` you can take it from there. – cs95 Dec 17 '18 at 05:02
  • It worked. Thanks a lot! Do you happen to know how to add the trendline? – Tom_Hanks Dec 17 '18 at 05:16
  • Perhaps this might be useful to you: https://stackoverflow.com/questions/26447191/how-to-add-trendline-in-python-matplotlib-dot-scatter-graphs – cs95 Dec 17 '18 at 05:16
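A minimal sketch of the concat-then-melt approach from the comments, with each step spelled out. `long_df` is a name introduced here for illustration, and the explicit `dropna()` is an addition so the NaN padding rows are removed up front rather than left to matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dfA = pd.DataFrame({"1": np.random.rand(3)})
dfB = pd.DataFrame({"2": np.random.rand(5)})
dfC = pd.DataFrame({"3": np.random.rand(6)})

# One column per frame, padded with NaN up to the longest frame.
dfABC = pd.concat([dfA, dfB, dfC], join="outer", sort=True, axis=1)

# melt() turns the wide table into long form with two columns:
# "variable" (the old column label "1"/"2"/"3") and "value".
long_df = dfABC.melt()

# The labels are strings, so convert everything to float for a numeric
# x-axis, then drop the rows whose value is NaN padding.
long_df = long_df.astype(float).dropna()

ax = long_df.plot.scatter(x="variable", y="value")
ax.set_xticks([1, 2, 3])
plt.show()
```

The long table has 3 + 5 + 6 = 14 rows, one per actual data point, each tagged with its x position.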

1 Answer


If I understand correctly, the following could be a good solution for you:

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

dfA = pd.DataFrame({"1": np.random.rand(3)})
dfB = pd.DataFrame({"2": np.random.rand(5)})
dfC = pd.DataFrame({"3": np.random.rand(6)})

dfABC = pd.concat([dfA, dfB, dfC], join="outer")
print(dfABC)

for y in dfABC.columns:
    # Non-missing values of this column, plotted at x = the column label (1, 2, 3)
    values = dfABC[y].dropna()
    plt.scatter([float(y)] * len(values), values)
plt.show()
Wael Almadhoun
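For the trendline asked about in the question and comments, one option (a sketch, not necessarily what the linked question recommends) is an ordinary least-squares line fitted with `numpy.polyfit` over all the (x, y) points. This assumes a straight-line (degree 1) trend is what is wanted; the names `xs`, `ys`, `slope` and `intercept` are introduced here for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dfA = pd.DataFrame({"1": np.random.rand(3)})
dfB = pd.DataFrame({"2": np.random.rand(5)})
dfC = pd.DataFrame({"3": np.random.rand(6)})
dfABC = pd.concat([dfA, dfB, dfC], join="outer", axis=1)

# Collect every (x, y) point: x is the column label, y a value from that column.
xs, ys = [], []
for col in dfABC.columns:
    values = dfABC[col].dropna().to_numpy()
    xs.extend([float(col)] * len(values))
    ys.extend(values)
x = np.array(xs)
y = np.array(ys)

# Least-squares straight line y = slope * x + intercept through all points.
slope, intercept = np.polyfit(x, y, 1)

plt.scatter(x, y)
x_line = np.array([x.min(), x.max()])
plt.plot(x_line, slope * x_line + intercept, "r--", label="linear trend")
plt.legend()
plt.show()
```

Passing a larger degree to `polyfit` gives a polynomial trend instead. Because the fit includes an intercept, the residuals `y - (slope * x + intercept)` sum to (numerically) zero, which is a quick sanity check on the fit.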