I try to implement SQL query using sqldf and dplyr.
I need to do this separately using these 2 different libraries.
Unfortunately, I cannot produce the same result using dplyr.
library(sqldf)
library(dplyr)
Id <- c(1,2,3,4)
HasPet <- c(0,0,1,1)
Age <- c(20,1,14,10)
Posts <- data.frame(Id, HasPet, Age)
# sqldf way
ref <- sqldf("
SELECT Id, HasPet, MAX(Age) AS MaxAge
FROM Posts
GROUP BY HasPet
")
# dplyr way
res <- Posts %>%
group_by(HasPet) %>%
summarize(
Id,
HasPet,
MaxAge = max(Age)
) %>%
select(Id, HasPet, MaxAge)
head(ref)
head(res)
Output for sqldf is:
> head(ref)
Id HasPet MaxAge
1 1 0 20
2 3 1 14
while the output for sqldf is different:
> head(res)
# A tibble: 4 x 3
# Groups: HasPet [2]
Id HasPet MaxAge
<dbl> <dbl> <dbl>
1 1 0 20
2 2 0 20
3 3 1 14
4 4 1 14
UPD. SQL query cannot be modified.