I have a data frame that is read in with readRDS() as df. It contains many rows with cities and states. I keep only the rows for the state of California as df_ca. df_ca contains 100 columns, and I keep only a few categorical columns in a new data frame called df_cat. I want to loop over the categorical columns and get the frequencies with the table function. Ignoring the loop for troubleshooting, I set var to city and run the table function, creating a new data frame called cat_freq. cat_freq contains all cities from df rather than only those in df_ca; the cities outside California have a Freq of 0. Why are they showing up at all if they were filtered out? I am new to R but have a Python background.
df <- as.data.frame(readRDS('some.data.5140'))
df_ca <- df[df$car.state == "ca",]
cat_col <- c('color', 'city', 'deliver', 'type')
df_cat <- df_ca[, cat_col]
var <- "city"
cat_freq <- data.frame(table(df_cat[var]))
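
The behaviour can be reproduced with a small toy data frame, which may make it easier to discuss. This is only a sketch: the data below is made up, and it assumes that city is stored as a factor (which readRDS() would preserve if the column was a factor when saved). If that assumption holds, row subsetting keeps all of the factor's levels, and droplevels() appears to remove the ones that no longer occur.

```r
# Toy data, made up for illustration; city is explicitly a factor here.
df <- data.frame(
  city = factor(c("la", "sf", "nyc", "la")),
  car.state = c("ca", "ca", "ny", "ca")
)
df_ca <- df[df$car.state == "ca", ]

# Subsetting rows keeps every factor level, so "nyc" is still counted (as 0).
with_unused <- data.frame(table(df_ca["city"]))

# droplevels() discards factor levels that no longer occur in the data.
df_ca <- droplevels(df_ca)
without_unused <- data.frame(table(df_ca["city"]))

print(with_unused)
print(without_unused)
```

In this toy case the filtered-out city shows up with a Freq of 0 in the first table and is absent from the second.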