0

I am trying to finish a business case in Kaggle but I am having some issues when merging two data frames.

Having dataframe "dailyActivity" as:

dailyActivity <-
structure(
  list(
    Id = c(1503960366, 1503960366, 1503960366),
    ActivityDate = c("4/12/2016",
                     "4/13/2016", "4/14/2016"),
    TotalSteps = c(13162L, 10735L, 10460L),
    TotalDistance = c(8.5, 6.96999979019165, 6.73999977111816),
    TrackerDistance = c(8.5, 6.96999979019165, 6.73999977111816),
    LoggedActivitiesDistance = c(0, 0, 0),
    VeryActiveDistance = c(1.87999999523163,
                           1.57000005245209, 2.44000005722046),
    ModeratelyActiveDistance = c(0.550000011920929,
                                 0.689999997615814, 0.400000005960464),
    LightActiveDistance = c(6.05999994277954,
                            4.71000003814697, 3.91000008583069),
    SedentaryActiveDistance = c(0,
                                0, 0),
    VeryActiveMinutes = c(25L, 21L, 30L),
    FairlyActiveMinutes = c(13L,
                            19L, 11L),
    LightlyActiveMinutes = c(328L, 217L, 181L),
    SedentaryMinutes = c(728L,
                         776L, 1218L),
    Calories = c(1985L, 1797L, 1776L)
  ),
  row.names = c(NA,
                3L),
  class = "data.frame"
)

And dataframe "sleepDay" as:

sleepDay <-
structure(
  list(
    Id = c(1503960366, 1503960366, 1503960366),
    SleepDay = c("04/12/2016",
                 "04/13/2016", "04/15/2016"),
    TotalSleepRecords = c(1L, 2L, 1L),
    TotalMinutesAsleep = c(327L, 384L, 412L),
    TotalTimeInBed = c(346L,
                       407L, 442L)
  ),
  row.names = c(NA, 3L),
  class = "data.frame"
)

Before trying to merge the dataframes I renamed both columns were the date is shown + some formatting on the sleepDay$SleepDay column:

names(dailyActivity)[2] <- 'Date'
sleepDay$SleepDay=as.POSIXct(sleepDay$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleepDay$SleepDay<- format(sleepDay$SleepDay, format = "%m/%d/20%y")
names(sleepDay)[2] <- 'Date'

I try to merge these two using the first two columns (Id and Date). I use the following code:

activityNsleep <- merge(x=dailyActivity, y=sleepDay, by = c("Id","Date"), all.x = TRUE)

The result of this merging is not what I expected. The columns are there but all the values show NA (check the right 3 columns): Merge result

Does anyone know what I am doing wrong? Could it be related to the date formatting?

Inigo_eh
  • 11
  • 2
  • First off, the column names that you want to merge by should always be the same. Can you please post a head(dailyActivity) and head(sleepDay) by editing the question? – Anurag N. Sharma Apr 29 '22 at 00:35
  • Hi Inigo_eh, welcome to stack overflow. Please provide a reproducible example (see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ) providing your data as screenshoots means no one can try out your problem. Also it seems your screenshots don't line up (Date column isn't in both so your data.frames so function would have failed, output columns aren't in input). Often just by creating reproducible example you might find the bug. – Sarah Apr 29 '22 at 00:36
  • Try: `merge(x=dailyActivity, y=sleepDay, by.x = c("Id","ActivityDate"), by.y = c("Id","Date"), all.x = TRUE) ` – jlhoward Apr 29 '22 at 02:05
  • @Sarah Thanks a lot for your help, hopefully the way I edited the my question is easier to understand. – Inigo_eh Apr 29 '22 at 15:10
  • @jlhoward I forgot to mention that I changed the column names. I eliminated this operation and tried your code, I get the same result unfortunately. – Inigo_eh Apr 29 '22 at 15:11

1 Answers1

1

Ok after fighting back and forth I found out the issue is caused by the different date format in the dataframes. The ActivityDate column has no zeros before the month:

ActivityDate = c("4/12/2016",
                     "4/13/2016", "4/14/2016")

SleepDay = c("04/12/2016",
                 "04/13/2016", "04/15/2016")

Formatting both columns as date fixes the issue:

dailyActivity$ActivityDate<- as.Date(dailyActivity$ActivityDate , format = "%m/%d/%y")
sleepDay$SleepDay<- as.Date(sleepDay$SleepDay , format = "%m/%d/%y")

Thanks a lot for the help!

Inigo_eh
  • 11
  • 2