Uniting two data frames

Question

I want to place these two data frames (which have no common columns) side by side. The two dataframes look like this:

df1:

10.74,5.71,5.41
11.44,6.1,5.87

df2:

10.17,6.58,5.23
9.99,5.75,5.13
11.21,6.35,5.72
10.3,5.86,5.12

I am trying with:

df_total=pd.concat([df1,df2],axis=1)

But the result looks something like this:

Access grade global,Grade_global,Regression_global,Access grade,Grade,Regression
,,,10.74,5.71,5.41
,,,11.44,6.1,5.87
10.17,6.58,5.23,,,
9.99,5.75,5.13,,,
11.21,6.35,5.72,,,
10.3,5.86,5.12,,,

And I want to have something like this:

10.17,6.58,5.23,10.74,5.71,5.41
9.99,5.75,5.13,11.44,6.1,5.87
11.21,6.35,5.72
10.3,5.86,5.12

The two things I want to know how to do are:

1- How can I merge the two data frames so that the values sit next to each other? The number of rows should therefore be the larger of the two row counts (4 in this case).

2- How can I avoid the NaN values (you can see the runs of commas at the end of the rows)? I want to avoid them because the scatter plot I use afterwards draws every NaN as 0, so I get a line of dots at y=0.

The NaN values are generating zeros. Please see the result:

The HTML snippet is:

<div style="line-height:77%;"><br></div>
    <div id="grade_access_hs"></div>
    <div style="line-height:77%;"><br></div>
    <p>The lines that best approximate the expected grades according to the access grade to University and comparing all students with {{user.hsname}}' students are:</p>
    <div style="line-height:30%;"><br></div>
    <div id="equation3"></div>
    <div style="line-height:30%;"><br></div>
    <div id="equation4"></div>
    <script type="text/javascript" src="../static/scripts/grade_access_hs.js"></script>

The full chart code:

  <script>
    'use strict';
var Grade_access_hs = c3.generate({
  bindto: '#grade_access_hs',
  data: {
    url: '../static/CSV/Chart_data/grades_access_hs.csv',
    xs: {
        Grade_global: 'Access grade global',
        Grade: 'Access grade',
        Regression_global: 'Access grade global',
        Regression: 'Access grade'
    },
    types: {
        Grade_global:'scatter',
        Grade:'scatter',
        Regression_global: 'line',
        Regression: 'line'
    },
  },
  axis: {
    y: {
      label: {
        text: "Average grade",
        position: "outer-middle"
      },
      min: 1,
      max: 9,
      tick: {outer: false}
    },
    x: {
      label: {
        text: "Access grade PAU",
        position: "outer-center"
      },
      min: 9,
      max: 14,
      tick: {
        outer: false,
        count:1,
        fit:false,
        values: [9,10,11,12,13,14]
      } 
    }
  },
  size: {
    height: 400,
    width: 800
  },
  zoom: {
    enabled: true
  },
  legend: {
    show: true,
    position: 'inset',
    inset: {
      anchor: 'top-right',
      x: 20,
      y: 20
    }
  },
})

d3.csv('../static/CSV/Chart_data/grades_access_hs.csv',function(data){
  var d1 = data[0];
  var d2 = data[1];

  var b = (1-(d2['Regression_global']/d1['Regression_global']))/((d1['Access grade global']-d2['Access grade global'])/d1['Regression_global'])
  var a = d1['Regression_global'] - (b * d1['Access grade global'])
  b = (Math.round(b*1000)/1000);
  a = (Math.round(a*1000)/1000);
  document.getElementById("equation3").innerHTML = "Global: Grade = " + b + "·x + " + a; // b is the slope, a the intercept

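  // The CSV column is 'Access grade' (there is no 'Access grade private'), and the intercept must use this line's own slope d.
  var d = (1-(d2['Regression']/d1['Regression']))/((d1['Access grade']-d2['Access grade'])/d1['Regression'])
  var c = d1['Regression'] - (d * d1['Access grade'])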
  d = (Math.round(d*1000)/1000);
  c = (Math.round(c*1000)/1000);
  document.getElementById("equation4").innerHTML = "Specific high school: Grade = " + d + "·x + " + c; // d is the slope, c the intercept
})
  </script>

With grades_access_hs.csv:

Access grade global,Grade_global,Regression_global,Access grade,Grade,Regression
,,,10.74,5.71,5.41
,,,11.44,6.1,5.87
,,,11.21,6.35,5.72
,,,10.3,5.86,5.12
10.17,6.58,5.23,,,
9.99,5.75,5.13,,,
10.96,5.84,5.71,,,
9.93,6.12,5.09,,,
9.93,6.0,5.09,,,
11.21,6.22,5.86,,,
11.28,6.1,5.9,,,
,,,10.93,6.08,5.54

Thanks in advance!

https://stackoverflow.com/questions/49620538/what-are-the-levels-keys-and-names-arguments-for-in-pandas-concat-functio/49620539#49620539 — BENY, Jun 10 '18 at 19:16
Thanks @Wen but as I explained, concat is not working for me — MTT, Jun 10 '18 at 19:32
concat isn't working, I suspect, because the dataframes are different sizes. I suggest you get the size of the larger one, resize the smaller one and pad with nan values and you should be ok. — Andrew, Jun 10 '18 at 19:46

score 1 · Answer 1 · answered Jun 10 '18 at 20:32

I think you need a join and fillna

print(df2.join(df1).fillna(''))

10.17  6.58  5.23  10.74 5.71  5.41
 9.99  5.75  5.13  11.44  6.1  5.87
11.21  6.35  5.72                  
10.30  5.86  5.12
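
A possible reason the concat in the question produced staggered rows is that the two frames do not share index labels: both concat(axis=1) and join align rows on the index, not on position. The sketch below assumes that is the case; the frames and their index values are made up to reproduce the symptom, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical frames mirroring the question. The index values are an
# assumption, chosen so that the two frames share no index labels.
df1 = pd.DataFrame(
    {"Access grade": [10.74, 11.44], "Grade": [5.71, 6.1], "Regression": [5.41, 5.87]},
    index=[7, 9],
)
df2 = pd.DataFrame(
    {
        "Access grade global": [10.17, 9.99, 11.21, 10.3],
        "Grade_global": [6.58, 5.75, 6.35, 5.86],
        "Regression_global": [5.23, 5.13, 5.72, 5.12],
    },
    index=[0, 1, 2, 3],
)

# concat(axis=1) aligns on index labels, so disjoint labels yield one row
# per label and NaN wherever a frame has no row for that label.
staggered = pd.concat([df2, df1], axis=1)

# Dropping the old labels gives each frame a fresh 0..n-1 index, so rows
# pair up by position and only the shorter frame is padded with NaN.
side_by_side = pd.concat(
    [df2.reset_index(drop=True), df1.reset_index(drop=True)], axis=1
)
print(side_by_side)
```

The shorter frame is still padded with NaN in the last rows. When such a frame is written with to_csv, NaN becomes an empty field by default (the na_rep parameter), so a single rectangular CSV will keep trailing commas on those rows.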

user96564

Don't forget to accept the answer if it solves your problem. – user96564 Jun 11 '18 at 06:07
Thanks @user3280146 but when I plot it, I get zero values... Please see the image uploaded in the edited question. thanks! – MTT Jun 11 '18 at 18:24
@AdityaK any idea on this? Maybe plotting the two CSVs separately on the same axis (plotting one first, waiting, and then the other on the same axis). Thanks! – MTT Jun 11 '18 at 18:28
@MTT, can you show me your code how you are plotting – user96564 Jun 12 '18 at 11:11

score 0 · Answer 2 · answered Jun 10 '18 at 20:31

Without giving it too much thought:

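import numpy as np
import pandas as pd

if df1.shape[0] > df2.shape[0]:
    new_rows = df1.shape[0] - df2.shape[0]
    # Pad df2 with zero rows that carry df2's own column names.
    df3 = pd.DataFrame(np.zeros((new_rows, df2.shape[1])), columns=df2.columns)
    # DataFrame.append was removed in pandas 2.0; concat with a fresh index instead.
    df2 = pd.concat([df2, df3], ignore_index=True)
    new_df = pd.concat((df1.reset_index(drop=True), df2), axis=1)
# Alternative elif goes here doing the converse.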
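
The converse branch can be avoided by padding both frames to the longer length in one step. This is a rough sketch, not the answer's exact method: pad_to_same_length is a hypothetical helper name, the toy frames are made up, and it pads with NaN instead of zeros (swap in fillna(0) if zeros are really wanted).

```python
import pandas as pd

def pad_to_same_length(left, right):
    """Hypothetical helper: place two frames side by side by position."""
    n = max(len(left), len(right))
    # reset_index drops the old labels; reindex then extends the shorter
    # frame to n rows, filling the new rows with NaN.
    left = left.reset_index(drop=True).reindex(range(n))
    right = right.reset_index(drop=True).reindex(range(n))
    return pd.concat([left, right], axis=1)

# Toy frames (assumed data) with different lengths.
df1 = pd.DataFrame({"a": [1.0, 2.0]})
df2 = pd.DataFrame({"b": [3.0, 4.0, 5.0, 6.0]})

new_df = pad_to_same_length(df1, df2)
print(new_df)
```

Because the helper takes the maximum of the two lengths, it works whichever frame is longer, so no if/elif is needed.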
