My question relates to the following packages:
library(haven)
library(labelled)
library(sjlabelled)
What I am trying to do is clean up some labelled data from SPSS prior to converting it to factors, so that I can run regressions that make sense. This means getting rid of those small catch-all categories, which don't really help much.
The steps are:
Step One) replace NA with 0 and label it "missing"
Step Two) find the value of "Other", find all instances of it and recode them to zero
Step Three) sort all the labels by value and drop "Other" as unused.
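To make the intended result concrete, here is a minimal sketch of the three steps in base R only. It assumes the value labels live in a named numeric vector stored in the "labels" attribute, which is how haven keeps them. The toy vector, its labels and the name other_value are made up for illustration.

```r
# Toy data (hypothetical): a numeric vector with a "labels" attribute.
x <- c(1, 2, NA, 3, 2, 1, 3)
attr(x, "labels") <- c("female 18-24" = 1, "male 18-24" = 2, "Other" = 3)

# Step one: recode NA to 0 and add a "missing" label for 0.
x[is.na(x)] <- 0
attr(x, "labels") <- c(attr(x, "labels"), "missing" = 0)

# Step two: look up the value labelled "Other" and recode it to 0.
labs <- attr(x, "labels")
other_value <- labs[["Other"]]
x[x == other_value] <- 0

# Step three: keep only labels whose value still occurs, then sort by value.
labs <- labs[labs %in% x]
attr(x, "labels") <- sort(labs)

attr(x, "labels")
```

Looking the value up by label name, rather than by its position in the label table, keeps step two correct even when the labels are not stored in value order.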
tdf2 <- as.data.frame(haven::read_sav(file.choose()))
test2 <- tdf2[, 'AgeGender']
That's how I actually get the data. For reproducibility, it should look like this:
set.seed(123)
test2 <- sample(1:15, size = 3000, replace = TRUE)
test2 <- add_labels(test2, labels = c("female 18-24" = 1, "female 25-34" = 2, etc see below up to 15))
changetoNA <- sample(seq_along(test2), 15) # pick 15 random positions to set to NA
test2[changetoNA] <- NA
# STEP ONE
test2[is.na(test2)] <- 0
val_label(test2,0) <- "missing"
# STEP TWO
z <- stack(attr(test2, "labels")) # create a df of labels (ind) and values
y <- z$values[z$ind == "Other"] # look up the value labelled "Other"
test2[test2 == y] <- 0 # recode that value to zero
attributes(test2)$class # now take a look at the class
z # and the table z
$class
haven_labelled vctrs_vctr double

1  female 18-24
2  female 25-34
3  female 35-44
4  female 45-54
5  female 55-64
6  female 65-74
7  female 75+
8  male 18-24
9  male 25-34
10 male 35-44
11 male 45-54
12 male 55-64
13 male 65-74
14 male 75+
15 Other
0  missing
So what I want to do is sort the value labels so that "missing" takes its rightful place as the first in the list, and drop "Other" altogether.
# STEP THREE
drop_unused_value_labels(test2)
sort_val_labels(test2, according_to = "values")
But this does nothing.
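One thing worth checking, offered as a guess rather than a diagnosis: both functions return a modified copy and do not change their argument in place, so the result has to be assigned back. The sketch below shows that pattern with the labelled package on a hypothetical toy vector. Only labelled is attached here, so any masking between labelled and sjlabelled in the original session is not covered.

```r
library(labelled)

# Hypothetical toy vector standing in for test2.
x <- labelled(
  c(1, 2, NA, 3, 2),
  labels = c("female 18-24" = 1, "female 25-34" = 2, "Other" = 3)
)

# Steps one and two, as in the question.
x[is.na(x)] <- 0
val_label(x, 0) <- "missing"
other_value <- val_labels(x)[["Other"]]
x[x == other_value] <- 0

# Step three: assign the results back, otherwise x is left unchanged.
x <- drop_unused_value_labels(x)
x <- sort_val_labels(x, according_to = "values")

val_labels(x)
```

Calling either function without the assignment only prints the modified copy at the console and leaves the original object as it was.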