
I have a set of categorical columns in my dataset that I want to turn into binary variables (1/0).

There are many of these. Currently, I print the column names with their index positions, copy them into a Word document, and then use the index numbers directly in the code:

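binarydata <- rawdata3
# Column positions of the categorical variables, looked up by hand
my_cols <- c(8:38, 48:52, 59:69, 96:118, 120:132, 145:148, 154:170, 223:330)
# 1 where a value is present, 0 where it is NA
binarydata[my_cols] <- as.integer(!is.na(binarydata[my_cols]))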

Is there a way to do this using the column names instead of the column index numbers?

Any help appreciated,

Pre
  • It's hard to give a useful answer without a [representative sample](https://stackoverflow.com/q/5963269/5325862) of data. I imagine there are already posts on SO that should answer your question, but it's hard to point you to them without knowing what's in your data. – camille Nov 23 '20 at 00:24
  • I'm not sure I understand your question. You already have `my_cols` as numbers, and now you want the equivalent column names to use in the code? Why? – Ronak Shah Nov 23 '20 at 03:33
  • In case the column numbers change. I don't want to go through the process of finding the column numbers again. – Pre Nov 23 '20 at 22:40

2 Answers

1

We can use `colnames` to subset. `colnames` is more general than `names` because it also works on a matrix.

nm1 <- colnames(binarydata)[my_cols]
binarydata[nm1] <- lapply(binarydata[nm1], function(x) +(!is.na(x)))
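
To make the name-based approach concrete, here is a minimal sketch on toy data. The data frame contents and the column names `smoke`, `drink`, and `age` are hypothetical stand-ins, since no sample of `rawdata3` was shared. It shows one way to turn the existing index vector into a name vector once with `dput()`, so the names can be pasted into the script and the indices dropped.

```r
# Toy data standing in for rawdata3 (hypothetical columns and values)
rawdata3 <- data.frame(
  id    = 1:4,
  smoke = c("yes", NA, "no", NA),
  drink = c(NA, "daily", NA, NA),
  age   = c(34, 51, 29, 42),
  stringsAsFactors = FALSE
)

# One-off step: print the names behind the index vector so they can be pasted
my_cols <- 2:3
dput(names(rawdata3)[my_cols])   # prints c("smoke", "drink")

# From here on, select by name only; positions no longer matter
flag_cols <- c("smoke", "drink")

binarydata <- rawdata3
# 1 where a value is present, 0 where it is NA
binarydata[flag_cols] <- lapply(binarydata[flag_cols],
                                function(x) as.integer(!is.na(x)))
```

Because `flag_cols` holds names, the same code keeps working if columns are later added, removed, or reordered, as long as the named columns still exist.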

Also, with dplyr, we can specify a range of column names using `:`

library(dplyr)
mtcars1 <- mtcars %>% 
      mutate(across(c(mpg:disp, wt:qsec), ~ +(!is.na(.))))
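
If the columns are not contiguous, the selection helpers that dplyr re-exports from tidyselect may be more convenient than ranges. The sketch below assumes dplyr 1.0.0 or later (for `across`) and uses a hypothetical data frame `survey` with made-up column names; `all_of()` takes a character vector of names, and `starts_with()` matches on a shared prefix.

```r
library(dplyr)

# Hypothetical data; the q1_ prefix is an assumption for illustration
survey <- data.frame(
  id    = 1:3,
  q1_a  = c("x", NA, "y"),
  q1_b  = c(NA, NA, "z"),
  score = c(1.5, 2.0, NA),
  stringsAsFactors = FALSE
)

# Option 1: an explicit character vector of column names
flag_cols <- c("q1_a", "q1_b")
out <- survey %>%
  mutate(across(all_of(flag_cols), ~ as.integer(!is.na(.x))))

# Option 2: match columns by a common prefix
out2 <- survey %>%
  mutate(across(starts_with("q1_"), ~ as.integer(!is.na(.x))))
```

Both options leave `id` and `score` untouched, and `all_of()` raises an error if a listed column is missing, which helps catch renamed columns early.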
akrun
  • What does this do? Does it change the value to 1 where a value is present and 0 where it is missing? – Pre Nov 23 '20 at 22:42
  • @Pre Yes, the `!` converts TRUE -> FALSE and vice versa. The `+` coerces TRUE -> 1 and FALSE -> 0 – akrun Nov 23 '20 at 22:43
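
The coercion described in the comment above can be checked step by step on a small vector. This is only an illustrative sketch; the vector `x` is made up.

```r
x <- c("a", NA, "b")

na_flag      <- is.na(x)     # TRUE where the value is missing
present_flag <- !na_flag     # flipped: TRUE where a value is present
as_plus      <- +present_flag            # unary plus coerces logical to integer
as_int       <- as.integer(present_flag) # the more explicit equivalent
```

Both `+` and `as.integer()` give the same integer vector here; `as.integer()` is arguably easier to read for someone new to the idiom.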
0

This can also work, but it was not tested as no data was shared:

# Code
binarydata <- rawdata3
my_cols <- c(8:38, 48:52, 59:69, 96:118, 120:132, 145:148, 154:170, 223:330)
# Look up the names behind the index positions, then subset by name
mynames <- names(binarydata)[my_cols]
binarydata[, mynames] <- as.integer(!is.na(binarydata[, mynames]))
Duck