Matching part of a row name from one dataframe (A) to another dataframe (B), then adding columns from A to B that relate to that row

Question

I am new to R and a bit confused about how to perform this kind of data wrangling. I want to match subject information to a large data set that has more rows than the subject data, because the large data set contains more than one session per subject. For example, Subject A would have rows named SubjectA-01, SubjectA-02, etc.

My goal is to match each subject ID to the large data set so that I can add new columns (sex, age, BMI, etc.) to the corresponding rows.

We can call this dataframe SubjectID:

Subject ID	Sex	Age
SubjectA	M	32
SubjectB	F	16

I want to use this information to match the leading keyword of each SampleID in this data set. Let's call it BioResults.

SampleID	Blood Result
SubjectA-01	2.34
SubjectA-02	2.55
SubjectB-12	3.56

My goal is to make a new data set that looks like this:

SampleID	Blood Result	Sex	Age
SubjectA-01	2.34	M	32
SubjectA-02	2.55	M	32
SubjectB-12	3.56	F	16

What would be the best way to achieve this? I would appreciate any help as I am still new to this coding language. Thank you!

Try `BioResults %>% tidyr::separate(SampleID, c("SubjectID", "OtherId")) %>% right_join(SubjectID)` – MrFlick, Jul 07 '21 at 07:38
Whatever that "-01" chunk is, if you want to merge data you need values that match exactly. It's easiest to remove the suffix to make the join work. – MrFlick, Jul 07 '21 at 07:45
What would be the best way to remove the suffix when the data is large, with more than 900 rows for the subjects? Sorry for the extra follow-ups. – SpiderK, Jul 07 '21 at 07:49
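
The suggestion in the comments above can be sketched as a runnable example. This is only a sketch under assumptions: the toy data frames mirror the tables in the question, the lookup table is named `subjectID` with a key column `SubjectID`, and `Session` is a hypothetical name for the suffix. `remove = FALSE` keeps the original `SampleID`, and `left_join` is used instead of `right_join` so that every sample row is retained.

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the question; object and column names are assumptions.
subjectID <- data.frame(
  SubjectID = c("SubjectA", "SubjectB"),
  Sex = c("M", "F"),
  Age = c(32, 16)
)
BioResults <- data.frame(
  SampleID = c("SubjectA-01", "SubjectA-02", "SubjectB-12"),
  Blood.Result = c(2.34, 2.55, 3.56)
)

joined <- BioResults %>%
  # Split at the first "-" only; anything after it stays together in Session.
  separate(SampleID, into = c("SubjectID", "Session"),
           sep = "-", extra = "merge", remove = FALSE) %>%
  # Keep every sample row, even if its subject is missing from the lookup.
  left_join(subjectID, by = "SubjectID") %>%
  # Drop the helper columns so only the original and the new columns remain.
  select(-SubjectID, -Session)

joined
```

The work is vectorised, so a few hundred or a few thousand rows should not need any special handling.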

score 0 · Answer 1 · answered Jul 07 '21 at 07:46

Does this work:

library(dplyr)
library(stringr)

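# Build a join key by stripping "-" plus the two characters after it,
# join the subject data on that key, then drop the helper column.
BioResults %>%
  mutate(ID = str_remove(SampleID, '-..')) %>%
  inner_join(subjectID, by = c('ID' = 'SubjectID')) %>%
  select(-ID)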
     SampleID Blood.Result Sex Age
1 SubjectA-01         2.34   M  32
2 SubjectA-02         2.55   M  32
3 SubjectB-12         3.56   F  16

Data used:

BioResults
     SampleID Blood.Result
1 SubjectA-01         2.34
2 SubjectA-02         2.55
3 SubjectB-12         3.56
subjectID
  SubjectID Sex Age
1  SubjectA   M  32
2  SubjectB   F  16
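
The data above is shown only as printed output, so here is a minimal sketch that rebuilds it with `data.frame()` and performs the same match in base R, for anyone who prefers not to load extra packages. The helper objects `keyed` and `result` are hypothetical names introduced for illustration, and `sub()` plus `merge()` is a swapped-in alternative to the dplyr pipeline, not the answer's own method.

```r
# Rebuild the printed data frames so the example can be run as-is.
BioResults <- data.frame(
  SampleID = c("SubjectA-01", "SubjectA-02", "SubjectB-12"),
  Blood.Result = c(2.34, 2.55, 3.56)
)
subjectID <- data.frame(
  SubjectID = c("SubjectA", "SubjectB"),
  Sex = c("M", "F"),
  Age = c(32, 16)
)

# Add a key column holding everything before the first "-".
keyed <- transform(BioResults, ID = sub("-.*$", "", SampleID))

# merge() defaults to an inner join: only rows with a matching key are kept.
result <- merge(keyed, subjectID, by.x = "ID", by.y = "SubjectID")

# Drop the helper key so only the original and the new columns remain.
result$ID <- NULL
result
```

Note that `merge()` sorts its output by the key, so the row order may differ from the input when the samples are not already grouped by subject.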

answered Jul 07 '21 at 07:46

Karthik S


Say some subjects had longer numerical values after the hyphen, such as subjectX-0910, or even letters, such as subject-UY9. How would I adjust the code to handle these as well? – SpiderK Jul 07 '21 at 08:03
@Bbkazu, in that case use `str_remove(SampleID, '-.*')` – Karthik S Jul 07 '21 at 08:10
I did that, and it looks like it kept the data set, but the subjectID information such as sex, age, etc. isn't added as columns to the BioResults dataframe. Am I doing something wrong? – SpiderK Jul 07 '21 at 08:26
@Bbkazu, that shouldn't happen; you may be missing something. I can't reproduce the issue on my end, so I can't fix it. – Karthik S Jul 07 '21 at 08:33
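
The two patterns discussed in these comments behave differently on longer suffixes, and the missing columns may have a simple cause. The sketch below is only a guess at what went wrong, since the issue was never reproduced: it assumes the pipeline's result was printed but not assigned back to `BioResults`, or that some keys had no match in the subject table. The IDs, the extra `SubjectC` row, and the names `keyed` and `unmatched` are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical IDs covering the cases raised in the comments.
ids <- c("SubjectA-01", "subjectX-0910", "subject-UY9")
str_remove(ids, "-..")  # "SubjectA" "subjectX10" "subject9"  (too short a match)
str_remove(ids, "-.*")  # "SubjectA" "subjectX"   "subject"

BioResults <- data.frame(
  SampleID = c("SubjectA-01", "subjectX-0910", "SubjectC-03"),
  Blood.Result = c(2.34, 2.55, 3.56)
)
subjectID <- data.frame(
  SubjectID = c("SubjectA", "subjectX"),
  Sex = c("M", "F"),
  Age = c(32, 16)
)

keyed <- BioResults %>% mutate(ID = str_remove(SampleID, "-.*"))

# Samples whose key has no row in subjectID (here: SubjectC-03).
unmatched <- anti_join(keyed, subjectID, by = c("ID" = "SubjectID"))

# dplyr verbs return a new data frame; without this assignment
# BioResults itself would stay unchanged.
BioResults <- keyed %>%
  left_join(subjectID, by = c("ID" = "SubjectID")) %>%
  select(-ID)
```

With `left_join`, unmatched samples are kept with `NA` in the new columns, whereas `inner_join` would silently drop them. Inspecting `unmatched` shows which keys need attention, for example differences in case or stray whitespace.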
