I'm learning Python (2.7) and trying to left join two pandas DataFrames. One DataFrame has a date and the corresponding sales of a product, while the other has a date and the corresponding day of the week.

print type(weekdaytrain)
print weekdaytrain.head(5)

<class 'pandas.core.frame.DataFrame'>
         data  giorno_settimana
0  2014-09-01                 0
1  2014-09-02                 1
2  2014-09-03                 2
3  2014-09-04                 3
4  2014-09-05                 4

print type(train)
print train.head(5)

<class 'pandas.core.frame.DataFrame'>
        data     pezzi
1078 2014-09-01   1743
1086 2014-09-02   1483
1094 2014-09-03   1510
1102 2014-09-04   1276
1110 2014-09-05   1741

When I do this:

new_train = pd.merge(train,weekdaytrain, on='data',how='left')

or

new_train = pd.merge(train,weekdaytrain, left_on='data',right_on='data',how='left') 

I get:

        data  pezzi  giorno_settimana
0 2014-09-01   1743               NaN
1 2014-09-02   1483               NaN
2 2014-09-03   1510               NaN
3 2014-09-04   1276               NaN
4 2014-09-05   1741               NaN

The dates do correspond, yet every value in giorno_settimana is NaN. I searched for answers but found nothing that fits my problem. Can you help me?

Thanks!

Tommaso Guerrini

1 Answer

I think you need to convert the data column to datetime in both DataFrames, because the dtypes seem to differ: one is datetime and the other is object (evidently strings):

weekdaytrain.data = pd.to_datetime(weekdaytrain.data)
train.data = pd.to_datetime(train.data)

In the sample below, before train.data is converted, the dtypes differ:

print (weekdaytrain.dtypes)
data                datetime64[ns]
giorno_settimana             int64
dtype: object

print (train.dtypes)
data     object
pezzi     int64
dtype: object

new_train = pd.merge(train,weekdaytrain, on='data',how='left')
print (new_train)
         data  pezzi  giorno_settimana
0  2014-09-01   1743               NaN
1  2014-09-02   1483               NaN
2  2014-09-03   1510               NaN
3  2014-09-04   1276               NaN
4  2014-09-05   1741               NaN

# the data column in train is not datetime, so it needs converting
train.data = pd.to_datetime(train.data)
new_train = pd.merge(train,weekdaytrain, on='data',how='left')
print (new_train)
        data  pezzi  giorno_settimana
0 2014-09-01   1743                 0
1 2014-09-02   1483                 1
2 2014-09-03   1510                 2
3 2014-09-04   1276                 3
4 2014-09-05   1741                 4
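
The steps above can be put together as a self-contained sketch. The sample values are hypothetical, copied from the question's output, and train is assumed to hold its dates as strings, as the dtypes suggest. Note that recent pandas versions may raise a ValueError when merging an object column with a datetime column instead of returning NaN, so the sketch only performs the merge after converting.

```python
import pandas as pd

# Hypothetical sample data mirroring the frames in the question.
train = pd.DataFrame({
    "data": ["2014-09-01", "2014-09-02", "2014-09-03"],  # dates stored as strings
    "pezzi": [1743, 1483, 1510],
})
weekdaytrain = pd.DataFrame({
    "data": pd.to_datetime(["2014-09-01", "2014-09-02", "2014-09-03"]),
    "giorno_settimana": [0, 1, 2],
})

# Convert the key column to datetime in both frames before merging.
train["data"] = pd.to_datetime(train["data"])
weekdaytrain["data"] = pd.to_datetime(weekdaytrain["data"])

# Sanity check: both key columns are now datetime, so the keys can match.
assert pd.api.types.is_datetime64_any_dtype(train["data"])
assert pd.api.types.is_datetime64_any_dtype(weekdaytrain["data"])

new_train = pd.merge(train, weekdaytrain, on="data", how="left")
print(new_train)
```

Checking the dtypes of the key columns before merging is a cheap way to catch this kind of mismatch early.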
jezrael