1

Below are two simple data frames. I would like to re-code (collapse) the Sat1 and Sat2 columns so that all degrees of satisfied are coded simply as Satisfied, and all degrees of Dissatisfied are coded as Dissatisfied. Neutral will remain as Neutral. These factors will therefore have three levels - Satisfied, Dissatisfied, and Neutral.

I would normally accomplish this by binding the data frames and using lapply along with recode from the car package, such as:

  DF1[2:3] <- lapply(DF1[2:3], recode, c('"Somewhat Satisfied"= "Satisfied","Satisfied"="Satisfied","Extremely Dissatisfied"="Dissatisfied"........etc, etc

I would like to accomplish this using map functions from purrr, specifically map_at (to keep the data frame; I'm new to purrr, so feel free to suggest other versions of map), together with dplyr, tidyr, stringr and ggplot2, so everything can be easily pipelined.

The example linked below shows the pattern I would like to follow, but applied to recoding. I was unable to make it work.

http://www.r-bloggers.com/using-purrr-with-dplyr/

I would like to use map_at or a similar map function so that I can keep the original Sat1 and Sat2 columns, with the recoded columns added to the data frame under new names. It would be great if this step could also be wrapped in a function.

In reality, I will have many data frames, so I want to define the recoding once and then use a function from purrr to apply it across all the data frames with the least amount of code.

Names<-c("James","Chris","Jessica","Tomoki","Anna","Gerald")
Sat1<-c("Satisfied","Very Satisfied","Dissatisfied","Somewhat Satisfied","Dissatisfied","Neutral")
Sat2<-c("Very Dissatisfied","Somewhat Satisfied","Neutral","Neutral","Satisfied","Satisfied")
Program<-c("A","B","A","C","B","D")
Pets<-c("Snake","Dog","Dog","Dog","Cat","None")

DF1<-data.frame(Names,Sat1,Sat2,Program,Pets)

Names<-c("Tim","John","Amy","Alberto","Desrahi","Francesca")
Sat1<-c("Extremely Satisfied","Satisfied","Satisfied","Somewhat Dissatisfied","Dissatisfied","Satisfied")
Sat2<-c("Dissatisfied","Somewhat Dissatisfied","Neutral","Extremely Dissatisfied","Somewhat Satisfied","Somewhat Dissatisfied")
Program<-c("A","B","A","C","B","D")


DF2<-data.frame(Names,Sat1,Sat2,Program)
Mike
  • Do you want all your data frames combined in the end or stored in a list separately or ...? This seems fairly straightforward with something like `mutate_each` from dplyr_0.4.3.9000 combined with `map` or `map_df`. `map_at` looks like it replaces the current variables, so may not be the tool to use in this case. – aosmith Jun 21 '16 at 22:56
  • Thanks for your response. I suppose it's ok if the data frames remain in a list separately. My main goal is to find a quick way (within the purrr dplyr pipeline) that will allow me to recode the factors across multiple data frames in one go. I like the idea of combining mutate_each and a map function. It's ok if the current variables are replaced since I can just make copies of the data frames first. So I would be grateful if you could show the code for your example. – Mike Jun 22 '16 at 02:12

2 Answers

1

I do big recodings like this with a join. In this case, I think transforming to a long data frame makes the problem easier to think about.

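library(tidyr)
library(dplyr)

# Reshape to long form: one row per person per satisfaction item
mdf <- DF1 %>% 
  gather(var, value, starts_with("Sat"))

# Lookup table: each original response and the level it collapses to
recode_df <- tibble(
  value  = c("Extremely Satisfied", "Very Satisfied", "Somewhat Satisfied", "Satisfied",
             "Neutral",
             "Extremely Dissatisfied", "Very Dissatisfied", "Somewhat Dissatisfied", "Dissatisfied"),
  recode = c(rep("Satisfied", 4), "Neutral", rep("Dissatisfied", 4)))

mdf <- left_join(mdf, recode_df, by = "value")

# Drop the original responses so spread() returns one row per person
mdf %>% select(-value) %>% spread(var, recode)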
Shorpy
  • Thanks for your response. It gave me some ideas since I haven't thought about using this method to recode before. However I'm still hoping for an answer that uses a purrr map function. – Mike Jun 22 '16 at 02:14
  • Ah, the pattern there is probably to write a function `f` which recodes a single vector, then use `df[] <- map_at(df, c("Sat1", "Sat2"), f)` – Shorpy Jun 22 '16 at 12:52
  • Yeah, that's what I want to do. I guess now I need to figure out the function. I haven't had much success creating a function to collapse the factors like in my example above. Are you able to help me out with the code? – Mike Jun 22 '16 at 13:26
  • No time right now, but for a vector `v` do something like `v[v == "Very Satisfied"] <- "Satisfied"` for each recoding step. This is part of why I think joins are easier. – Shorpy Jun 22 '16 at 19:04
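The pattern described in these comments might look like the following minimal sketch. The helper name `collapse_sat` and the sets of response labels are assumptions for illustration, taken from the labels that appear in the question's example data; any label not listed becomes NA when the result is converted to a factor.

```r
library(purrr)

# Hypothetical helper: collapse one satisfaction vector to three levels
collapse_sat <- function(v) {
  v <- as.character(v)
  # One assignment per recoding step, as suggested in the comment above
  v[v %in% c("Extremely Satisfied", "Very Satisfied", "Somewhat Satisfied")] <- "Satisfied"
  v[v %in% c("Extremely Dissatisfied", "Very Dissatisfied", "Somewhat Dissatisfied")] <- "Dissatisfied"
  factor(v, levels = c("Satisfied", "Neutral", "Dissatisfied"))
}

# Small copy of the question's first data frame
DF1 <- data.frame(
  Names = c("James", "Chris", "Jessica", "Tomoki", "Anna", "Gerald"),
  Sat1  = c("Satisfied", "Very Satisfied", "Dissatisfied",
            "Somewhat Satisfied", "Dissatisfied", "Neutral"),
  Sat2  = c("Very Dissatisfied", "Somewhat Satisfied", "Neutral",
            "Neutral", "Satisfied", "Satisfied"),
  stringsAsFactors = FALSE
)

# map_at() applies the helper to the named columns only and returns a list;
# assigning into DF1[] keeps the data frame structure
DF1[] <- map_at(DF1, c("Sat1", "Sat2"), collapse_sat)
```

Note that this replaces Sat1 and Sat2 in place, so copy the data frame first if the original columns are needed.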
1

One way to do this is to use mutate_each to do the work combined with one of the map functions to go through a list of data.frames. Using mutate_each or equivalent from dplyr_0.4.3.9001 allows you to rename the new columns.

You could use string manipulation instead of recoding in this case. I believe you want to pull out Satisfied, Dissatisfied, or Neutral from the current strings that you have. You can achieve this with sub using regular expressions. For example,

sub(".*(Satisfied|Dissatisfied|Neutral).*$", "\\1", DF2$Sat2)
"Dissatisfied" "Dissatisfied" "Neutral"      "Dissatisfied" "Satisfied"    "Dissatisfied"

Package stringr has a nice function for extracting specific strings, str_extract.

library(stringr)
str_extract(DF2$Sat2, "Satisfied|Neutral|Dissatisfied")
 "Dissatisfied" "Dissatisfied" "Neutral"      "Dissatisfied" "Satisfied"    "Dissatisfied"
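One difference between the two approaches is worth checking before using them on real data: how they treat a value that matches none of the three words. The sketch below uses a made-up vector `x` whose last entry is deliberately misspelled (an assumption for illustration, not data from the question).

```r
library(stringr)

# Hypothetical input; the last value is intentionally misspelled
x <- c("Very Dissatisfied", "Somewhat Satisfied", "Neutral", "Satisfed")
pattern <- "Satisfied|Neutral|Dissatisfied"

# sub() leaves a string unchanged when the pattern does not match
sub_result <- sub(paste0(".*(", pattern, ").*$"), "\\1", x)

# str_extract() returns NA when the pattern does not match
extract_result <- str_extract(x, pattern)

# Non-matching values are easy to find with is.na() in the str_extract() result
unmatched <- x[is.na(extract_result)]
```

With `sub`, an unexpected label passes through silently as an extra level, while `str_extract` flags it as NA, which may be easier to spot.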

You can use this within mutate_each to apply one of these functions to multiple columns. The name you give the function within funs is what will be appended to the new column names. I used recode. For one of your datasets:

DF1 %>% 
    mutate_each( funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied") ), 
              starts_with("Sat") )

    Names               Sat1               Sat2 Program  Pets  Sat1_recode  Sat2_recode
1   James          Satisfied  Very Dissatisfied       A Snake    Satisfied Dissatisfied
2   Chris     Very Satisfied Somewhat Satisfied       B   Dog    Satisfied    Satisfied
3 Jessica       Dissatisfied            Neutral       A   Dog Dissatisfied      Neutral
4  Tomoki Somewhat Satisfied            Neutral       C   Dog    Satisfied      Neutral
5    Anna       Dissatisfied          Satisfied       B   Cat Dissatisfied    Satisfied
6  Gerald            Neutral          Satisfied       D  None      Neutral    Satisfied
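In dplyr 1.0.0 and later, `mutate_each` and `funs` are deprecated. A rough equivalent, assuming a recent dplyr, uses `mutate` with `across`, where the `.names` argument plays the role of the function name in `funs`. The data frame below is a trimmed copy of the question's DF1, and the `as.character` call is a precaution in case the columns are factors.

```r
library(dplyr)
library(stringr)

# Trimmed copy of the question's first data frame
DF1 <- data.frame(
  Names = c("James", "Chris", "Jessica"),
  Sat1  = c("Satisfied", "Very Satisfied", "Dissatisfied"),
  Sat2  = c("Very Dissatisfied", "Somewhat Satisfied", "Neutral"),
  stringsAsFactors = FALSE
)

# across() applies the extraction to every column starting with "Sat";
# .names appends "_recode" so the original columns are kept
out <- DF1 %>%
  mutate(across(starts_with("Sat"),
                ~ str_extract(as.character(.x), "Satisfied|Neutral|Dissatisfied"),
                .names = "{.col}_recode"))
```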

To go through many datasets stored in a list, you can use a map function from purrr to perform a function on every element in the list.

list(DF1, DF2) %>%
    map(~mutate_each(.x, 
                  funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied") ), 
                  starts_with("Sat")) )

[[1]]
    Names               Sat1               Sat2 Program  Pets  Sat1_recode  Sat2_recode
1   James          Satisfied  Very Dissatisfied       A Snake    Satisfied Dissatisfied
2   Chris     Very Satisfied Somewhat Satisfied       B   Dog    Satisfied    Satisfied
...
[[2]]
      Names                  Sat1                   Sat2 Program  Sat1_recode  Sat2_recode
1       Tim   Extremely Satisfied           Dissatisfied       A    Satisfied Dissatisfied
2      John             Satisfied  Somewhat Dissatisfied       B    Satisfied Dissatisfied
...

Using map_df instead will bind all of the elements in your list into a data.frame, which may or may not be what you want. Using the .id argument adds a name for each original dataset.

list(DF1, DF2) %>%
    map_df(~mutate_each(.x, 
                  funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied")), 
                  starts_with("Sat")), .id = "Group")

   Group     Names                  Sat1                   Sat2 Program  Pets  Sat1_recode
1      1     James             Satisfied      Very Dissatisfied       A Snake    Satisfied
2      1     Chris        Very Satisfied     Somewhat Satisfied       B   Dog    Satisfied
3      1   Jessica          Dissatisfied                Neutral       A   Dog Dissatisfied
4      1    Tomoki    Somewhat Satisfied                Neutral       C   Dog    Satisfied
5      1      Anna          Dissatisfied              Satisfied       B   Cat Dissatisfied
6      1    Gerald               Neutral              Satisfied       D  None      Neutral
7      2       Tim   Extremely Satisfied           Dissatisfied       A  <NA>    Satisfied
8      2      John             Satisfied  Somewhat Dissatisfied       B  <NA>    Satisfied
...
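Since the question asks for the recoding step to live in a function, the same idea can be sketched as a small helper that is mapped over a named list and then row-bound. The names `recode_sat`, `first` and `second` are hypothetical, the data frames are trimmed copies of the question's examples, and the sketch assumes dplyr 1.0.0 or later for `across`.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical helper: add a *_recode column for every column starting with "Sat"
recode_sat <- function(df) {
  mutate(df, across(starts_with("Sat"),
                    ~ str_extract(as.character(.x), "Satisfied|Neutral|Dissatisfied"),
                    .names = "{.col}_recode"))
}

# Trimmed copies of the question's data frames
DF1 <- data.frame(Names = c("James", "Chris"),
                  Sat1  = c("Satisfied", "Very Satisfied"),
                  Sat2  = c("Very Dissatisfied", "Somewhat Satisfied"),
                  Pets  = c("Snake", "Dog"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Names = c("Tim", "John"),
                  Sat1  = c("Extremely Satisfied", "Satisfied"),
                  Sat2  = c("Dissatisfied", "Somewhat Dissatisfied"),
                  stringsAsFactors = FALSE)

# map() recodes each data frame; bind_rows() stacks them and stores the
# list names in the Group column
combined <- list(first = DF1, second = DF2) %>%
  map(recode_sat) %>%
  bind_rows(.id = "Group")
```

Naming the list elements gives readable values in the Group column instead of positions; columns missing from one data frame (here Pets) are filled with NA.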
aosmith
  • Thanks, this is exactly what I was looking for! – Mike Jun 23 '16 at 01:33
  • list(DF1,DF2)%>%map(~mutate(.,SatREC=Sat1 %>% recode('"Extremely Satisfied"="Satisfied"'))) – Mike Jun 23 '16 at 01:52
  • Quick question, using your answer above, I also played around with using the car recode function with map and mutate, which works. The code is in the comment above. But, how would I write the same code with mutate_each, and include both Sat1 and Sat2? – Mike Jun 23 '16 at 01:53
  • As in the code in the answer, you can work on all variables that start with "Sat" using the `starts_with` function. If you put your `recode` code in place of the `str_extract` code it should work fine. Note the `.` in `mutate_each` represents the variable that you are mutating so you will no longer hard code the variable names like you did in your `mutate` code. – aosmith Jun 23 '16 at 14:53