0

How To Merge/Concat Two Data Frames

I want to merge two dataframes: the first one is a dataframe with one column with datetime64 dtype and the second one is a float dtype one column dataframe. This is what I have tried:

df1 = pd.DataFrame(df, columns = ['MemStartDate'])
df4 = pd.DataFrame(df, columns = ['TotalPrice'])

df_merge = pd.merge(df1,df2,left_on='MemStartDate',right_on='TotalPrice')

Error: You are trying to merge on datetime64[ns] and float64 columns. If you wish to proceed you should use pd.concat

But how can I do that ?

Student.py
  • 225
  • 3
  • 14
Coder
  • 37
  • 1
  • 8

3 Answers3

0

you can try this.

df_merge = pd.concat([df1, df2], axis=1)
greco
  • 304
  • 4
  • 11
  • can you explain how you did this? – Coder Apr 28 '22 at 07:16
  • @VedantRawal [pd.concat()](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) 'adds' a list of dataframes together and you can concat multiple dataframes with pd.concat([df1, df2, df3, ...]). The `axis` argument defines along which axis the dataframes are concatenated together. – greco Apr 28 '22 at 07:21
0

Best option to use pd.concat but you also can try dataframe.join(dataframe). for more information try to go through this Merge, join, concatenate and compare

df_merge=df1.join(df2)
0

Let us consider the following situation:

import pandas as pd

# Create dataframe with one column of type datatime64 and one float64

dictionary = {'MemStartDate':['2007-07-13', '2006-01-13', '2010-08-13'], 
              'TotalPrice':[50.5,10.4,3.5]}
df= pd.DataFrame(dictionary)
pd.to_datetime(df['MemStartDate']) #dtype: datetime64[ns]
df1 = pd.DataFrame(df, columns = ['MemStartDate'])
df4 = pd.DataFrame(df, columns = ['TotalPrice'])
df.TotalPrice # dtype: float64

Where you have df1 and df4 that are:

df1

Out: 
  MemStartDate
0   2007-07-13
1   2006-01-13
2   2010-08-13

df4

Out: 
   TotalPrice
0        50.5
1        10.4
2         3.5

If you want to concat df1 and df4, it means that you want to concatenate pandas objects along a particular axis with optional set logic along the other axes (see pandas.concat — pandas 1.4.2 documentation). Thus in practice:

df_concatenated = pd.concat([df1, df4], axis=1)
df_concatenated

The new resulting dataframe df_concatenated is this:

Out: 
  MemStartDate  TotalPrice
0   2007-07-13        50.5
1   2006-01-13        10.4
2   2010-08-13         3.5

The axis decides where you want to concatenate along. With axis=1 you have concatenated the second dataframe along columns of the first dataframe. You can try with axis=0:

df_concatenated = pd.concat([df1, df4], axis=0)
df_concatenated

The output is:

Out: 
  MemStartDate  TotalPrice
0   2007-07-13         NaN
1   2006-01-13         NaN
2   2010-08-13         NaN
0          NaN        50.5
1          NaN        10.4
2          NaN         3.5

Now you have added the second dataframe along rows of the first dataframe.

On the other hand, merge is used to join dataframes when they share some columns. It is useful because maybe you do not want to store dataframes with same contents repeatedly. For example:

# Create two dataframes
dictionary = {'MemStartDate':['2007-07-13', '2006-01-13', '2010-08-13'], 
              'TotalPrice':[50.5,10.4,3.5]}
dictionary_1 = {'MemStartDate':['2007-07-13', '2006-01-13', '2010-08-13', '2010-08-14'], 
                'Shop':['Shop_1','Shop_2','Shop_3','Shop_4']}
df= pd.DataFrame(dictionary)
df_1 = pd.DataFrame(dictionary_1)

if you have df and df_1 that are:

df

Out: 
  MemStartDate  TotalPrice
0   2007-07-13        50.5
1   2006-01-13        10.4
2   2010-08-13         3.5

and

df_1

Out: 
  MemStartDate    Shop
0   2007-07-13  Shop_1
1   2006-01-13  Shop_2
2   2010-08-13  Shop_3
3   2010-08-14  Shop_4

You can merge them in this way:

df_merged = pd.merge(df,df_1, on='MemStartDate', how='outer')

df_merged

Out: 
  MemStartDate  TotalPrice    Shop
0   2007-07-13        50.5  Shop_1
1   2006-01-13        10.4  Shop_2
2   2010-08-13         3.5  Shop_3
3   2010-08-14         NaN  Shop_4

In the new dataframe df_merged, you keep the common column of the old dataframes df and df_1 (MemStartDate) and add the two columns that are different in the two dataframes (TotalPrice and Shop).

----> A couple of other explicative examples about merging dataframes in Pandas:

Example 1. Merging two dataframes preserving one column that is equal for both dataframes:

left = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)
left

right = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
right

result = pd.merge(left, right, on="key")

result

Out: 
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3

Example 2. Merging two dataframes in order to read all the combinations of values

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

result = pd.merge(df1,df2, left_on='lkey', right_on='rkey')

result

Out: 
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7

Also in this case you can check the pandas.DataFrame.merge — pandas 1.4.2 documentation (where I took the second example) and here you have other possible ways to manipulate your dataframes: Merge, join, concatenate and compare (where I took the first example).

In the end, to sum up, you can intuitively understand what pd.concat() and pd.merge() do by studying the meaning of their names in spoken language:

And to come back to your error:

Error: You are trying to merge on datetime64[ns] and float64 columns. If you wish to proceed you should use pd.concat

It is telling you that the common column of the two dataframes are of different data type. So he understands that you are trying to do something that is "pd.concat's job" and so he suggests you to use pd.concat.

Student.py
  • 225
  • 3
  • 14