
I have a data frame, df, that is read in with readRDS(). It contains many rows with cities and states. I keep only the rows for the state of California as df_ca.

df_ca contains 100 columns, and I keep only a few categorical ones in a new data frame called df_cat. I want to loop over the categorical columns and get the frequencies with the table function. Ignoring the loop for troubleshooting, I set var to "city" and run table, storing the result in a new data frame called cat_freq. cat_freq contains every city from df rather than only those in df_ca; the filtered-out cities have a Freq of 0. Why do they show up at all if they were filtered out? I am new to R but have a Python background.

df <- as.data.frame(readRDS('some.data.5140')) 
df_ca <- df[df$car.state == "ca",]
cat_col <- (unlist(list('color', 'city', 'deliver', 'type')))
df_cat <- df_ca[, cat_col]
var <- "city"
cat_freq <-  data.frame(table(df_cat[var]))
Werner Hertzog
Thomas

2 Answers


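Incorporating droplevels fixed the problem. Subsetting a data frame keeps all of the original factor levels, so table reported the filtered-out cities with a count of 0; droplevels removes the levels that are no longer used.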

df <- as.data.frame(readRDS('some.data.5140')) 
df_ca <- df[df$car.state == "ca",]
cat_col <- (unlist(list('color', 'city', 'deliver', 'type')))
df_cat <- df_ca[, cat_col]
df_cat <- droplevels(df_cat)
var <- "city"
cat_freq <-  data.frame(table(df_cat[var]))
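
For anyone who wants to reproduce this without the original file, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration only; stringsAsFactors = TRUE is set explicitly to mimic the factor columns that the real data apparently has.

```r
# Toy data standing in for the real file (all values are made up).
df <- data.frame(
  car.state = c("ca", "ca", "ny", "tx"),
  city      = c("Fresno", "Oakland", "Albany", "Austin"),
  stringsAsFactors = TRUE  # assumption: the real columns are factors
)

df_ca <- df[df$car.state == "ca", ]

# Row subsetting keeps every original level on the factor,
# so table() still lists Albany and Austin, each with a count of 0.
before <- table(df_ca$city)

# droplevels() discards levels that no longer occur in the data.
df_ca <- droplevels(df_ca)
cat_freq <- data.frame(table(df_ca$city))
```

Here before still has four entries, while cat_freq only has rows for the two California cities.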
Thomas

That is because your columns are of type factor, and a factor keeps all of its original levels after subsetting. If you convert the columns to character, it should help.

df_ca <- df[df$car.state == "ca",]
cat_col <- c('color', 'city', 'deliver', 'type')
df_cat <- df_ca[, cat_col]
# Convert all columns in df_cat to character
df_cat[] <- lapply(df_cat, as.character)
var <- "city"
cat_freq <-  data.frame(table(df_cat[var]))
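
A small self-contained sketch of the same idea, using made-up values (the cities, colors and extra levels are assumptions, chosen so that each factor carries a level that does not occur in the rows):

```r
# Toy df_cat whose factors carry levels left over from a larger data set.
df_cat <- data.frame(
  city  = factor(c("Fresno", "Oakland"),
                 levels = c("Albany", "Fresno", "Oakland")),
  color = factor(c("red", "red"), levels = c("blue", "red"))
)

# With factors, the unused level "Albany" is counted as 0.
n_before <- length(table(df_cat$city))

# Assigning into df_cat[] keeps the data frame structure while
# replacing every column with its character version.
df_cat[] <- lapply(df_cat, as.character)

# Character vectors have no levels, so only observed values are counted.
cat_freq <- data.frame(table(df_cat$city))
```

Note that this converts every column to character, so any level ordering is lost; if the columns need to stay factors, droplevels is probably the better fit.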
Ronak Shah