
I am searching for a function that performs a join based on groups in order to reorder my data.

I have a data set containing three columns: 1. Measurements 2. Groups (sample IDs) 3. Measurement ID (protein sequence). I would like to get a data frame containing the measurements of each group as columns, plus a column with all the measurement IDs. Note that not all measurement IDs are present in all groups; missing measurements should be filled with 0.

The data frame transformation should look as shown below:

cbind(
   c(1,2,3,4,5,6),
   c("Group 1","Group 1","Group 1","Group 2","Group 2","Group 2"), 
   c("Seq 1","Seq 2","Seq 3","Seq 2","Seq 4","Seq 5"))
-> 

cbind(
   c(1,2,3,0,0),   #Group 1
   c(0,4,0,5,6),   #Group 2
   c("Seq 1","Seq 2","Seq 3","Seq 4","Seq 5"))

I was thinking about selecting all unique measurement IDs into a separate data frame and doing a dplyr::left_join() for each group separately. The problem is that I have 21 groups, and I think there must be a better solution.

Thanks in advance! Aaron

  • Use `data.frame()`, not `cbind()`. `cbind()` will create a `matrix` and lose all class information, converting your integer/numerics to strings if any strings are present. – Gregor Thomas Jan 08 '22 at 15:18
  • This isn't a join operation - a join is a way to stick two data frames together. This is a pivot or reshaping operation. See the linked FAQ for many more details. If we call your input data `input` and use column names, `tidyr::pivot_wider(input, names_from = group_col, values_from = first_col, values_fill = 0)` will work (filling in the appropriate column names instead of `group_col` and `first_col`, of course). – Gregor Thomas Jan 08 '22 at 15:23
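
To make the two comments above concrete, here is a minimal sketch of the suggested approach applied to the example data from the question. The column names `measurement`, `group`, and `sequence` are assumptions chosen for illustration, since the question does not name its columns; substitute your own.

```r
library(tidyr)

# Build the input with data.frame() rather than cbind(), so the
# measurements stay numeric instead of being coerced to strings.
# Column names here are hypothetical; the question does not name them.
input <- data.frame(
  measurement = c(1, 2, 3, 4, 5, 6),
  group       = c("Group 1", "Group 1", "Group 1", "Group 2", "Group 2", "Group 2"),
  sequence    = c("Seq 1", "Seq 2", "Seq 3", "Seq 2", "Seq 4", "Seq 5")
)

# Reshape from long to wide: one column per group, one row per sequence.
# Sequences missing from a group are filled with 0.
wide <- pivot_wider(
  input,
  names_from  = group,
  values_from = measurement,
  values_fill = 0
)

print(wide)
```

With this input, `wide` should have one row per sequence (Seq 1 to Seq 5) and the columns `Group 1` (1, 2, 3, 0, 0) and `Group 2` (0, 4, 0, 5, 6), which matches the target layout in the question. The same call should work unchanged for 21 groups, since the new columns are derived from the distinct values in the grouping column.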
