Let us consider the following situation:
import pandas as pd
# Create dataframe with one column of type datatime64 and one float64
dictionary = {'MemStartDate':['2007-07-13', '2006-01-13', '2010-08-13'],
'TotalPrice':[50.5,10.4,3.5]}
df= pd.DataFrame(dictionary)
pd.to_datetime(df['MemStartDate']) #dtype: datetime64[ns]
df1 = pd.DataFrame(df, columns = ['MemStartDate'])
df4 = pd.DataFrame(df, columns = ['TotalPrice'])
df.TotalPrice # dtype: float64
Where you have df1 and df4 that are:
df1
Out:
MemStartDate
0 2007-07-13
1 2006-01-13
2 2010-08-13
df4
Out:
TotalPrice
0 50.5
1 10.4
2 3.5
If you want to concat df1 and df4, it means that you want to concatenate pandas objects along a particular axis with optional set logic along the other axes (see pandas.concat — pandas 1.4.2 documentation). Thus in practice:
df_concatenated = pd.concat([df1, df4], axis=1)
df_concatenated
The new resulting dataframe df_concatenated
is this:
Out:
MemStartDate TotalPrice
0 2007-07-13 50.5
1 2006-01-13 10.4
2 2010-08-13 3.5
The axis decides where you want to concatenate along. With axis=1
you have concatenated the second dataframe along columns of the first dataframe. You can try with axis=0
:
df_concatenated = pd.concat([df1, df4], axis=0)
df_concatenated
The output is:
Out:
MemStartDate TotalPrice
0 2007-07-13 NaN
1 2006-01-13 NaN
2 2010-08-13 NaN
0 NaN 50.5
1 NaN 10.4
2 NaN 3.5
Now you have added the second dataframe along rows of the first dataframe.
On the other hand, merge is used to join dataframes when they share some columns. It is useful because maybe you do not want to store dataframes with same contents repeatedly. For example:
# Create two dataframes
dictionary = {'MemStartDate':['2007-07-13', '2006-01-13', '2010-08-13'],
'TotalPrice':[50.5,10.4,3.5]}
dictionary_1 = {'MemStartDate':['2007-07-13', '2006-01-13', '2010-08-13', '2010-08-14'],
'Shop':['Shop_1','Shop_2','Shop_3','Shop_4']}
df= pd.DataFrame(dictionary)
df_1 = pd.DataFrame(dictionary_1)
if you have df and df_1 that are:
df
Out:
MemStartDate TotalPrice
0 2007-07-13 50.5
1 2006-01-13 10.4
2 2010-08-13 3.5
and
df_1
Out:
MemStartDate Shop
0 2007-07-13 Shop_1
1 2006-01-13 Shop_2
2 2010-08-13 Shop_3
3 2010-08-14 Shop_4
You can merge them in this way:
df_merged = pd.merge(df,df_1, on='MemStartDate', how='outer')
df_merged
Out:
MemStartDate TotalPrice Shop
0 2007-07-13 50.5 Shop_1
1 2006-01-13 10.4 Shop_2
2 2010-08-13 3.5 Shop_3
3 2010-08-14 NaN Shop_4
In the new dataframe df_merged
, you keep the common column of the old dataframes df
and df_1
(MemStartDate) and add the two columns that are different in the two dataframes (TotalPrice and Shop).
----> A couple of other explicative examples about merging dataframes in Pandas:
Example 1. Merging two dataframes preserving one column that is equal for both dataframes:
left = pd.DataFrame(
{
"key": ["K0", "K1", "K2", "K3"],
"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
}
)
left
right = pd.DataFrame(
{
"key": ["K0", "K1", "K2", "K3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"],
}
)
right
result = pd.merge(left, right, on="key")
result
Out:
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
Example 2. Merging two dataframes in order to read all the combinations of values
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
'value': [5, 6, 7, 8]})
result = pd.merge(df1,df2, left_on='lkey', right_on='rkey')
result
Out:
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
Also in this case you can check the pandas.DataFrame.merge — pandas 1.4.2 documentation (where I took the second example) and here you have other possible ways to manipulate your dataframes: Merge, join, concatenate and compare (where I took the first example).
In the end, to sum up, you can intuitively understand what pd.concat()
and pd.merge()
do by studying the meaning of their names in spoken language:
And to come back to your error:
Error: You are trying to merge on datetime64[ns] and float64 columns. If you wish to proceed you should use pd.concat
It is telling you that the common column of the two dataframes are of different data type. So he understands that you are trying to do something that is "pd.concat
's job" and so he suggests you to use pd.concat
.