In my survey I made a mistake with a 5-point Likert scale: each answer option ended up in its own column, as follows:

dput(head(edu_data))
structure(list(Education.1. = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("", "Y"), class = "factor"), Education.2. = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"), 
Education.3. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
"Y"), class = "factor"), Education.4. = structure(c(1L, 1L, 
1L, 2L, 2L, 1L), .Label = c("", "Y"), class = "factor"), 
Education.5. = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("", 
"Y"), class = "factor")), row.names = c(NA, 6L), class = "data.frame")

I would like to collapse these columns into one column, answer_to_ls, holding a single value from 1 to 5.

The output I want is a column with a single number, which means getting rid of the letter "Y". I do, of course, have a unique respondent ID.

Please tell me if I can make my question clearer, as I want to be a valuable member of the community.

    I don't understand what the question is. Could you clarify by providing a [minimal, reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) and expected output (exactly what you want, not a description of what you want)? – csgroen Sep 08 '20 at 07:48
  • I hope my edit makes it clearer. Otherwise, please tell me so. – Daniel Ortiz Sep 08 '20 at 08:06
  • Not clear to me. Can you explain what your data look like by providing an example of what you have and what you expect to have at the end? Provide the result of `dput(head(YOUR_DATA))` – Edo Sep 08 '20 at 08:08
  • I think you may be looking for `plyr::mapvalues()`, but I'm not sure without a clearer example – csgroen Sep 08 '20 at 08:14
  • I am sorry for my lack of experience. Is my edit clearer? – Daniel Ortiz Sep 08 '20 at 08:24
  • I think you have included only 5 columns in your `dput` when there are more columns in your actual data. Also show the expected output for the data you shared, so that it is easier to understand what you want your final output to look like. To explain your problem clearly you can also include more rows in the `dput`, e.g. share `dput(head(edu_data, 20))` for 20 rows. – Ronak Shah Sep 08 '20 at 08:42

2 Answers

I think there are many possible solutions; try searching for ways to merge or collapse multiple binary (dichotomous) columns into a single column. For example:

R - Convert various dummy/logical variables into a single categorical variable/factor from their name

In your case, you could try something like:

edu_data$answer_to_ls <- apply(edu_data[1:5] == "Y", 1, function(x) {
  if (any(x)) {
    # take the digit from the ticked column's name, e.g. "Education.5." -> 5
    as.numeric(gsub(".*(\\d+).", "\\1", names(which(x))))
  } else NA  # no "Y" in this row
})

This extracts the number (1 to 5) from the name of the column that contains "Y", converts it to a numeric value, and returns NA if the row has no "Y" at all. edu_data[1:5] selects the columns to consider for conversion, in this case columns 1 through 5.

  Education.1. Education.2. Education.3. Education.4. Education.5. answer_to_ls
1                                                                Y            5
2                                                                Y            5
3                                                                Y            5
4                                                   Y                         4
5                                                   Y                         4
6                                                                            NA
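If a respondent ticked more than one box, the apply() call above returns a list instead of a numeric vector. A possible vectorized sketch avoids the row-by-row loop: it uses max.col() to find the ticked column and sets NA when a row has zero or several ticks. It assumes, as above, that the Likert columns are columns 1 to 5 and that each column name carries its scale value; the helper names (ticked, scale_values, first_tick) are just illustrative.

```r
# Data from the question (first six respondents), as in the dput() output
edu_data <- structure(list(
  Education.1. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.2. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.3. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.4. = structure(c(1L, 1L, 1L, 2L, 2L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.5. = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor")),
  row.names = c(NA, 6L), class = "data.frame")

# 0/1 matrix: 1 where the respondent ticked that scale point
# (assumes the Likert columns are columns 1 to 5)
ticked <- as.matrix(edu_data[1:5] == "Y") * 1

# Scale value of each column, taken from names like "Education.4." -> 4
scale_values <- as.numeric(gsub("\\D", "", names(edu_data)[1:5]))

# Column index of the first 1 in each row; rows with no tick also get 1,
# which is corrected below
first_tick <- max.col(ticked, ties.method = "first")
answer <- scale_values[first_tick]

# Exactly one tick is a valid answer; zero or several ticks become NA
answer[rowSums(ticked) != 1] <- NA
edu_data$answer_to_ls <- answer
edu_data
```

Unlike the apply() version, this returns NA for respondents who ticked several boxes, so you may want to look at those rows separately.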
Ben
d <- structure(list(Education.1. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"), 
               Education.2. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
               Education.3. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"), 
               Education.4. = structure(c(1L, 1L, 1L, 2L, 2L, 1L), .Label = c("", "Y"), class = "factor"), 
               Education.5. = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor")), 
               row.names = c(NA, 6L), class = "data.frame")

d$item1 <- 1 * (d$Education.1. == "Y") +
           2 * (d$Education.2. == "Y") +
           3 * (d$Education.3. == "Y") +
           4 * (d$Education.4. == "Y") +
           5 * (d$Education.5. == "Y")

print(d)

leads to

> print(d)
  Education.1. Education.2. Education.3. Education.4. Education.5. item1
1                                                                Y     5
2                                                                Y     5
3                                                                Y     5
4                                                   Y                  4
5                                                   Y                  4
6                                                                      0
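The five additions that build item1 can be written as a single matrix product, which also works for any number of scale columns. A sketch, assuming the Likert columns are the first five and appear in scale order (1 to 5); it also turns the 0 for respondents who ticked nothing into NA. The name ticked is only illustrative.

```r
# Data from the question (first six respondents), as in the dput() output
d <- structure(list(
  Education.1. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.2. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.3. = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.4. = structure(c(1L, 1L, 1L, 2L, 2L, 1L), .Label = c("", "Y"), class = "factor"),
  Education.5. = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("", "Y"), class = "factor")),
  row.names = c(NA, 6L), class = "data.frame")

# Logical matrix: TRUE where the respondent ticked that scale point
ticked <- as.matrix(d[1:5] == "Y")

# Row-wise 1*col1 + 2*col2 + ... + 5*col5, the same sum as item1 above
item1 <- as.vector(ticked %*% seq_len(ncol(ticked)))

# A sum of 0 means no box was ticked: treat it as a missing answer
item1[item1 == 0] <- NA
d$item1 <- item1
d
```

Like the original sum, this only gives a meaningful value when each respondent ticked at most one box: ticks at 2 and 3, for example, would add up to 5.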
Bernhard