I am trying to finish a business case in Kaggle but I am having some issues when merging two data frames.
Having dataframe "dailyActivity" as:
dailyActivity <-
structure(
list(
Id = c(1503960366, 1503960366, 1503960366),
ActivityDate = c("4/12/2016",
"4/13/2016", "4/14/2016"),
TotalSteps = c(13162L, 10735L, 10460L),
TotalDistance = c(8.5, 6.96999979019165, 6.73999977111816),
TrackerDistance = c(8.5, 6.96999979019165, 6.73999977111816),
LoggedActivitiesDistance = c(0, 0, 0),
VeryActiveDistance = c(1.87999999523163,
1.57000005245209, 2.44000005722046),
ModeratelyActiveDistance = c(0.550000011920929,
0.689999997615814, 0.400000005960464),
LightActiveDistance = c(6.05999994277954,
4.71000003814697, 3.91000008583069),
SedentaryActiveDistance = c(0,
0, 0),
VeryActiveMinutes = c(25L, 21L, 30L),
FairlyActiveMinutes = c(13L,
19L, 11L),
LightlyActiveMinutes = c(328L, 217L, 181L),
SedentaryMinutes = c(728L,
776L, 1218L),
Calories = c(1985L, 1797L, 1776L)
),
row.names = c(NA,
3L),
class = "data.frame"
)
And dataframe "sleepDay" as:
sleepDay <-
structure(
list(
Id = c(1503960366, 1503960366, 1503960366),
SleepDay = c("04/12/2016",
"04/13/2016", "04/15/2016"),
TotalSleepRecords = c(1L, 2L, 1L),
TotalMinutesAsleep = c(327L, 384L, 412L),
TotalTimeInBed = c(346L,
407L, 442L)
),
row.names = c(NA, 3L),
class = "data.frame"
)
Before trying to merge the dataframes I renamed both columns were the date is shown + some formatting on the sleepDay$SleepDay column:
names(dailyActivity)[2] <- 'Date'
sleepDay$SleepDay=as.POSIXct(sleepDay$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleepDay$SleepDay<- format(sleepDay$SleepDay, format = "%m/%d/20%y")
names(sleepDay)[2] <- 'Date'
I try to merge these two using the first two columns (Id and Date). I use the following code:
activityNsleep <- merge(x=dailyActivity, y=sleepDay, by = c("Id","Date"), all.x = TRUE)
The result of this merging is not what I expected. The columns are there but all the values show NA (check the right 3 columns): Merge result
Does anyone know what I am doing wrong? Could it be related to the date formatting?