1

I have a code where I see which people work in certain groups. When I ask the leader of each group to present those who work for them, in a survey, I get a row of all of the team members. What I need is to clean the data into multiple rows with their group information.

This is what my data frame looks like,

LeaderName <- c('John','Jane','Louis','Carl')

Group <- c('3','1','4','2')

Member1 <- c('Lucy','Stephanie','Chris','Leslie')

Member1ID <- c('1','2','3','4')

Member2 <- c('Earl','Carlos','Devon','Francis')

Member2ID <- c('5','6','7','8')

Member3 <- c('Luther','Peter','','Severus')

Member3ID <- c('9','10','','11')

GroupInfo <- data.frame(LeaderName, Group, Member1, Member1ID, Member2 ,Member2ID, Member3, Member3ID)

This is what I would like it to show with a certain code

LeaderName_ <- c('John','Jane','Louis','Carl','John','Jane','Louis','Carl','John','Jane','','Carl')

Group_ <- c('3','1','4','2','3','1','4','2','3','1','','2')

Member <- c('Lucy','Stephanie','Chris','Leslie','Earl','Carlos','Devon','Francis','Luther','Peter','','Severus')

MemberID <- c('1','2','3','4','5','6','7','8','9','10','','11')

ActualGroupInfor <- data.frame(LeaderName_,Group_,Member,MemberID)
M--
  • 25,431
  • 8
  • 61
  • 93
LordVoldemort
  • 107
  • 1
  • 7
  • ```reshape(data = GroupInfo, idvar = c("LeaderName", "Group"), varying = list(Member=c(3,5,7),MemberID=c(4,6,8)), direction="long", v.names = c("Member","MemberID"), sep="_")[,-3]``` – M-- May 23 '19 at 15:45
  • @akrun I was trying to use `stack` and all I came up with is: ```cbind(GroupInfo[,c("LeaderName", "Group")], stack(lapply(GroupInfo[,c(3,5,7)], as.character)), stack(lapply(GroupInfo[,c(4,6,8)], as.character)))``` any idea? – M-- May 23 '19 at 16:08
  • 1
    @M-M sorry, got busy. If you meant `do.call(cbind, lapply(list(GroupInfo[, c(3, 5, 7)], GroupInfo[,c(4, 6, 8)]), function(x) stack(lapply(x, as.character))))` – akrun May 23 '19 at 16:39

2 Answers2

1

An option would be melt from data.table and specify the column name patterns in the measure parameter

library(data.table)
melt(setDT(GroupInfo), measure = patterns("^Member\\d+$", 
    "^Member\\d+ID$"), value.name = c("Member", "MemberID"))[, variable := NULL][]
#    LeaderName Group    Member MemberID
# 1:       John     3      Lucy        1
# 2:       Jane     1 Stephanie        2
# 3:      Louis     4     Chris        3
# 4:       Carl     2    Leslie        4
# 5:       John     3      Earl        5
# 6:       Jane     1    Carlos        6
# 7:      Louis     4     Devon        7
# 8:       Carl     2   Francis        8
# 9:       John     3    Luther        9
#10:       Jane     1     Peter       10
#11:      Louis     4                   
#12:       Carl     2   Severus       11
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Here is a solution in base r:

reshape(
 data=GroupInfo, 
 idvar=c("LeaderName", "Group"),
 varying=list(
  Member=which(names(GroupInfo) %in% grep("^Member[0-9]$",names(GroupInfo),value=TRUE)),
  MemberID=which(names(GroupInfo) %in% grep("^Member[0-9]ID",names(GroupInfo),value=TRUE))), 
 direction="long", 
 v.names = c("Member","MemberID"),
 sep="_")[,-3]
#>           LeaderName Group    Member MemberID
#> John.3.1        John     3      Lucy        1
#> Jane.1.1        Jane     1 Stephanie        2
#> Louis.4.1      Louis     4     Chris        3
#> Carl.2.1        Carl     2    Leslie        4
#> John.3.2        John     3      Earl        5
#> Jane.1.2        Jane     1    Carlos        6
#> Louis.4.2      Louis     4     Devon        7
#> Carl.2.2        Carl     2   Francis        8
#> John.3.3        John     3    Luther        9
#> Jane.1.3        Jane     1     Peter       10
#> Louis.4.3      Louis     4                   
#> Carl.2.3        Carl     2   Severus       11

Created on 2019-05-23 by the reprex package (v0.2.1)

M--
  • 25,431
  • 8
  • 61
  • 93
  • Thank you! What does the [0-9] represent? – LordVoldemort May 23 '19 at 19:09
  • 1
    @LordVoldemort You have `Member1`, `Member2`, etc. Basically, `Member`+`digits`. `[0-9]` represent digits. I am using regex to find the columns that match that pattern to avoid hardcoding name of every column. Look here to get a bit familiar with regex https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf – M-- May 23 '19 at 19:32
  • What about the last line: sep="_")[,-3] What is this for? – LordVoldemort May 23 '19 at 20:25
  • `reshape` adds a timevar columns (you can still see its values in the rownames: `LeaderName.Group.time`) which is not desired here. That `[,-3]` is just getting rid of that column. Please note that this is basically a one-liner. I just broke them into couple lines to make it more legible. About `sep="_"` you can read on https://rdocumentation.org/packages/stats/versions/3.6.0/topics/reshape here. Actually here it does not do anything because we do not have a `timevar` but if we had, like in this https://stackoverflow.com/a/27247171/6461462 we would get `transaction_"type"` instead of **`.`** – M-- May 23 '19 at 20:44
  • @LordVoldemort all that said, here we don't need `sep=_` because we are going from wide data to long, but I have it there as a it is a habit for me. Just a short description of my solution: I reshape the data from wide format to long, where `LeaderName` and `Group` are my keys and I have two variables `Member` and `MemberID` which I specify what column to corresponds to which one and then I remove the `timevar` created by `reshape()` – M-- May 23 '19 at 20:50
  • what if you have more than 9 values. Let's say you have a group that has 10 or 11 members. It doesn't seem to work then. – LordVoldemort Aug 06 '19 at 15:49
  • @LordVoldemort `[0-9]` is regex; read here: https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf – M-- Aug 06 '19 at 18:35