0

This is a follow-up question from what I asked here Merging a dataset based on ID and date (thanks for the helpful advice!)

I have two datasets each that are monthly summaries of different variables for households. I want to merge the two based on the household ID and the month.

df1 one looks like this:

     hh_ids      date total
     <chr>     <chr> <dbl>
1  KELDK13  2013-8-1     1
2  KMOMB02  2013-2-1     1
3  KMOMB02  2013-5-1     2
4  KMOMB04  2013-7-1     2
5  KMOMB04  2013-9-1     1
6  KMOMB06  2013-6-1     1
7  KMOMB14  2013-8-1     1
8  KMOMB16  2013-6-1     1
9  KMOMB17 2012-10-1     1
10 KMOMB17 2012-11-1     2

and the first 10 rows of df2 looks like:

  hh_ids      date    income consumption alcohol cleaning_materials  clothing
1  KELDK01 2012-11-1  62.70588    40.52941       0           0.000000  0.000000
2  KELDK01 2012-12-1  17.64706    42.43530       0           1.058824  7.058824
3  KELDK01 2013-01-1  91.76471    48.23529       0           0.000000  0.000000
4  KELDK01 2013-02-1  91.76470   107.52940       0           0.000000  0.000000
5  KELDK01 2013-03-1 116.47060   114.47060       0           0.000000  0.000000
6  KELDK01 2013-04-1 124.41180   118.29410       0           2.705882 17.647060
7  KELDK01 2013-05-1 137.23530   105.00000       0           1.411765  1.882353
8  KELDK01 2013-06-1 131.52940   109.54120       0           4.352942  2.941176
9  KELDK01 2013-07-1 121.52940   113.47060       0           2.352941 25.882350
10 KELDK01 2013-08-1 123.32940    86.50588       0           2.588235  2.941176

I want the "total" column from to be added on as a column in df2 with the matched hh_ids and date.

I have tried to do the following:

df3<-merge(df2,df1,by=c("hh_ids","date"),all=TRUE) 

but the resultant data.drame df3 now looks like this (i've chopped some columns to make it easier to display):

      hh_ids      date    income consumption total
1    KELDK01 2012-11-1  62.70588    40.52941     0
2    KELDK01 2012-12-1  17.64706    42.43530     0
3    KELDK01 2013-01-1  91.76471    48.23529     0
4    KELDK01 2013-02-1  91.76470   107.52940     0
5    KELDK01 2013-03-1 116.47060   114.47060     0
6    KELDK01 2013-04-1 124.41180   118.29410     0
7    KELDK01 2013-05-1 137.23530   105.00000     0
8    KELDK01 2013-06-1 131.52940   109.54120     0
9    KELDK01 2013-07-1 121.52940   113.47060     0
2595 KNBOM33     15918        NA          NA     1
2596 KNBOM33     15979        NA          NA     1
2597 KNBOM33 2012-10-1  32.94118    18.11765     0
2598 KNBOM33 2012-12-1  56.47059    44.23529     0
2599 KNBOM33 2013-01-1  10.58824    29.76471     0
2600 KNBOM33 2013-02-1  54.70588    70.21176     0

It seems that something has happened to df1's date in the merge, and therefore it didn't merge with the other values in df2. Does anyone have any suggestions as to what I should do? I have double checked that both df1 and df2 have the "date" column formatted as a date before the merge.

Community
  • 1
  • 1
  • Are you sure you want an outer join? I mean, if there is not a match between the two data frames, what's your expected result? Do you want to show a row even if there is not a match? If that's not your case, I would remove the `all=TRUE` in your `merge` – Jaime Caffarel Aug 04 '16 at 16:43
  • There should be a match for all rows in df1 in df2. I would like to keep all rows in the case there are any that don't get matched then I know there's an issue – Ashley Thomas Aug 04 '16 at 16:48

2 Answers2

1

It seems the type of "date", where "2013-8-1" is different from "2013-08-1".

before merge them:

df1$date = as.Date(df1$date);
df2$date = as.Date(df2$date);
Yong
  • 1,107
  • 1
  • 12
  • 21
  • That seems to have done it! I have no idea why as the directly previous line was mutate("date"=as.Date(paste(year,month,day,sep='-'))), but I won't ask questions and just be thankful it worked! – Ashley Thomas Aug 04 '16 at 20:01
0

I'd start off by checknig the rows with hh_ids == "KNB0M33" to see if there is anything funny going on in the dates, income and consumption columns.