I have a categorical variable (stored as character) that is dummy coded in the form xx-xxxx. The first two digits of the dummy code determine the category of the response, and I would like to bin the responses by these two digits. For example, there are 28 responses dummy coded as 11-xxxx, and I would like to combine all 28 of them into one response. I therefore thought I would convert the dummy coded categorical variable to a quantitative variable so that I can bin the responses by their first two digits more efficiently. Is there an R function for making this conversion?

[Image: frequency distribution of the first few responses for the variable]

I am a beginner coder and this is my first time posting to Stack Overflow. Thank you for your help!

[Image: sample of the `dput(data$H4LM18)` output]

  • If you're using the tidyverse, [this answer](https://stackoverflow.com/a/44424567/10898875) illustrates a neat way to make a new column from the first two digits of the dummy code. (The other answers to that question have other options you could explore.) – A. S. K. Dec 04 '19 at 20:31
  • Thanks for your input! I am looking to bin the responses so I can graph them so I'm afraid organizing them into columns won't be sufficient. – user12481858 Dec 04 '19 at 21:31
  • Do you want the final data frame to have one row per first two digits of the code (`11`, etc.)? Or are you looking for a column that encodes which "bin" the row goes in so that you can process the data frame more efficiently downstream? – A. S. K. Dec 04 '19 at 21:47
  • I would like the final data frame to have one row per first two digits of the code. – user12481858 Dec 05 '19 at 21:03
  • That's helpful. Can you post a sample of your data, using `dput`? – A. S. K. Dec 05 '19 at 23:29
  • I just added a picture of a sample of my data to the original post. – user12481858 Dec 06 '19 at 03:53
  • I doubt that blurring the distinction between categorical and numerical variables is a reliable way to group categorical variables. This sounds like an [XY problem](https://meta.stackexchange.com/q/66377/357835). – John Coleman Dec 06 '19 at 03:56

1 Answer

I was able to receive help from a Help Desk and we successfully binned the variable according to the first two digits of the dummy code.

Here is the code we used, for the dataset `data` and the variable `H4LM18`:

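# Copy the original variable so that H4LM18 itself is left unchanged.
data$jobcategory <- data$H4LM18

# Split each code at the dash and keep only the part before it (the first two digits).
data$jobbracket <- vapply(strsplit(data$jobcategory, "-"), function(x) x[1], character(1))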

By splitting the dummy code of the responses at the dash ("-") we were able to categorize the responses according to the first two digits of the dummy code alone.
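For anyone who also wants one row per two-digit prefix (as asked in the comments), here is a minimal sketch of how the new column might be counted. The sample values below are made up for illustration and are not from the real `H4LM18` variable; `sub()` is used here as an alternative to `strsplit()` and should give the same prefix for codes of the form xx-xxxx.

```r
# Hypothetical sample values in the xx-xxxx format described in the question.
data <- data.frame(
  H4LM18 = c("11-1011", "11-2021", "11-9199", "13-1071", "13-2011", "15-1132"),
  stringsAsFactors = FALSE
)

# Keep everything before the first dash, i.e. the two-digit prefix.
data$jobbracket <- sub("-.*$", "", data$H4LM18)

# Count the responses in each bin: the result has one row per prefix.
counts <- as.data.frame(
  table(jobbracket = data$jobbracket),
  responseName = "n",
  stringsAsFactors = FALSE
)

print(counts)
```

The prefix stays a character value throughout, so no conversion to a quantitative variable is needed for the grouping. The resulting `counts` data frame could then be passed to a plotting function, for example `barplot(counts$n, names.arg = counts$jobbracket)`.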