How to lapply grep() data by id

Question

I have a data frame RawDat with two columns, ID and data. I want to grep() my data by ID, e.g. using lapply(), to generate a new data frame where the data is sorted into columns by ID. My data frame looks like this, except that I have >80000 rows and 75 ids:

ID     data
abl    564
dlh    78
vho    354
mez    15
abl    662
dlh    69
vho    333
mez    9
.
.
.

I can manually extract the data using the grep() function:

ExtRawDat = as.data.frame(RawDat[grep("abl",RawDat$ID),])

However, I do not want to do that 75 times and cbind() the results. Rather, I would like to use the lapply() function to automate it. I have tried several variations of the following code, but I can't get a script that provides the desired output.

I have a vector with the 75 ids, ProLisV, to loop over:

ExtRawDat = as.data.frame(lapply(ProLisV[1:75],function(x){     
Temp1 = RawDat[grep(x,RawDat$ID),]      # The issue is here, the pattern is not properly defined with the X input (is it detrimental that some of the names in the list having spaces etc.?)
Values = as.data.frame(Temp1$data)
list(Values$data)
}))

The desired output looks like this:

abl    dlh    vho    mez    ...
564    78     354    15
662    69     333    9
.
.
.

How do I adjust that function to provide the desired output? Thank you.

score 2 · Accepted Answer · answered Apr 25 '18 at 19:49

It looks like what you are trying to do is to convert your data from long form to wide form. One way to do this easily is to use the spread function from the tidyr package. To use it, we need a column to remove duplicate identifiers, so we'll first add a grouping variable:

n.ids <- 4 # With your full data this should be 75
df$group <- rep(1:n.ids, each = n.ids, length.out = nrow(df))
tidyr::spread(df, ID, data)

#   group abl dlh mez vho
# 1     1 564  78  15 354
# 2     2 662  69   9 333

If you don't want the group column in the final result, assign the output of spread() back to df and then do df$group <- NULL.
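The rep() call above relies on the rows arriving in complete blocks of n.ids, and its 1:n.ids sequence recycles once there are more than n.ids blocks, which repeats group numbers. As a hedged alternative, the sketch below numbers each ID's occurrences with base R's ave(), so the index does not depend on n.ids. The toy data frame copies the values from the question, and the tidyr call is guarded in case the package is not installed.

```r
# Toy data in the same long layout as the question (values copied from the example).
df <- data.frame(
  ID   = rep(c("abl", "dlh", "vho", "mez"), times = 2),
  data = c(564, 78, 354, 15, 662, 69, 333, 9),
  stringsAsFactors = FALSE
)

# Number the occurrences of each ID: the first "abl" row gets 1,
# the second "abl" row gets 2, and likewise for every other ID.
df$group <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# For this toy data, group is 1 1 1 1 2 2 2 2.
print(df$group)

# Each (group, ID) pair is now unique, which is what spread() requires.
if (requireNamespace("tidyr", quietly = TRUE)) {
  print(tidyr::spread(df, ID, data))
}
```

This pairs the k-th value of every ID in one row, so it assumes that pairing by order of appearance is what you want.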

Data

df <- read.table(text = "
  ID     data
  abl     564
  dlh     78
  vho     354
  mez     15
  abl     662
  dlh     69
  vho     333
  mez     9", header = TRUE)
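For completeness, the lapply() approach from the question can also be made to work in base R. This is a minimal sketch, assuming every ID occurs the same number of times and that ids should be matched exactly. grep() treats its pattern as a regular expression and also matches substrings, which may be why ids containing spaces or special characters misbehave; comparing with == avoids both. RawDat and ProLisV follow the question's names, but the values here are only the toy example.

```r
# Toy stand-ins for the question's objects (only the example values).
RawDat <- data.frame(
  ID   = rep(c("abl", "dlh", "vho", "mez"), times = 2),
  data = c(564, 78, 354, 15, 662, 69, 333, 9),
  stringsAsFactors = FALSE
)
ProLisV <- unique(RawDat$ID)  # stand-in for the vector of 75 ids

# Exact comparison with == instead of grep(), so regex metacharacters
# and partial matches in the ids cannot interfere.
cols <- lapply(ProLisV, function(x) RawDat$data[RawDat$ID == x])
names(cols) <- ProLisV

# data.frame() needs every list element to have the same length;
# check.names = FALSE keeps ids with spaces as column names unchanged.
ExtRawDat <- data.frame(cols, check.names = FALSE)
print(ExtRawDat)
```

split(RawDat$data, RawDat$ID) builds the same list in a single call, with the ids in sorted order instead of the order given in ProLisV.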

Thank you C. Braun! That is a very succinct solution to exactly what I was trying to do. One thing though: after 75 groups, the group numbers start from 1 again, and tidyr::spread() throws an error due to `Duplicate identifiers for rows`. How do I tell the rep() function to keep numbering up until the last row? Thank you. - Rnewbie, Apr 25 '18 at 20:50
In the rep() function, the first argument should read `1:nrow(RawDat)`. - Rnewbie, Apr 25 '18 at 20:56
