I am trying to count the number of distinct names in a dataset in R using the sqldf package, and wanted to check my answer with the tidyverse. The two approaches give slightly different answers (1406 vs. 1407), and I can't figure out what causes the difference. Here's my code:
```r
library(tidyverse)  # provides read_csv() and the dplyr verbs used below
library(sqldf)

mayors <- read_csv(file = "https://raw.githubusercontent.com/jmontgomery/jmontgomery.github.io/master/PDS/Datasets/Mayors.csv")
mayorsDF <- as.data.frame(mayors)

# SQL: count the distinct names directly
sqldf("select count(distinct FullName) from mayorsDF")  # gives me 1406

# SQL: list the distinct names, then count the rows
allNamesDF <- sqldf("select distinct FullName from mayorsDF")
length(allNamesDF$FullName)  # gives me 1407

# tidyverse: keep the distinct names, then count the rows
mayors %>%
  select(FullName) %>%
  unique() %>%
  count()  # gives me 1407
```
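One way to test whether a missing name could account for the extra row is a small sketch on hypothetical toy data (not the Mayors file). It assumes FullName contains an `NA`, which is not confirmed here. On the dplyr side, both `unique() %>% count()` and `n_distinct()` treat `NA` as a value unless told otherwise.

```r
library(dplyr)

# Hypothetical toy data (not the Mayors file): three rows, one of them NA
toy <- tibble(FullName = c("Ann Lee", "Bob Roe", NA))

# unique() keeps the NA row, so count() reports 3 rows
with_na <- toy %>% select(FullName) %>% unique() %>% count() %>% pull(n)
with_na                                 # 3

# n_distinct() also counts NA as a distinct value by default
n_distinct(toy$FullName)                # 3
n_distinct(toy$FullName, na.rm = TRUE)  # 2
```

On the real data, `sum(is.na(mayors$FullName))` would show whether any names are actually missing.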
What am I missing? I'm new to the sqldf package, but not new to SQL.
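The SQL side can be isolated the same way. This is a minimal sketch, again on hypothetical toy data and again assuming the cause is a missing value. sqldf passes R's `NA` to SQLite as `NULL`. In SQL, `COUNT(DISTINCT col)` skips `NULL`s, while `SELECT DISTINCT` returns `NULL` as a row of its own.

```r
library(sqldf)

# Hypothetical toy data (not the Mayors file): one of three names is NA
toy <- data.frame(FullName = c("Ann Lee", "Bob Roe", NA),
                  stringsAsFactors = FALSE)

# COUNT(DISTINCT ...) ignores NULL, so the NA row is not counted
n_count <- sqldf("select count(distinct FullName) as n from toy")$n

# SELECT DISTINCT keeps NULL as its own row (read back into R as NA)
n_rows <- nrow(sqldf("select distinct FullName from toy"))

n_count  # 2
n_rows   # 3
```

If the toy data reproduces the off-by-one pattern, the same explanation would apply to the 1406 vs. 1407 gap, provided FullName in the Mayors data contains exactly one `NA`.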