
To preface, I have read some posts that sort the rows of a dataframe by the values in another vector, but that isn't quite what I'm after. My data has one row per patient, holding the patient's ID followed by that patient's data:

           ID Group    L_HCH    R_HCH    L_HCB    R_HCB    L_HCT    R_HCT L_HC_Total R_HC_Total
121    GP_M01   PAT 0.120000 0.110000 0.040000 0.040000 0.040000 0.040000   0.200000   0.190000
122    GP_M02   PAT 0.110000 0.120000 0.060000 0.060000 0.020000 0.010000   0.190000   0.190000
123    GP_M03   PAT 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000   0.000000   0.000000
124    GP_M05   PAT 0.117340 0.125620 0.050664 0.045523 0.020289 0.012440   0.188293   0.183583
125    GP_M08   PAT 0.114000 0.119000 0.049000 0.065000 0.027000 0.011000   0.190000   0.195000
126  GSTTC_01   PAT 0.151000 0.140000 0.049000 0.058000 0.042000 0.033000   0.242000   0.231000
127  GSTTC_11   PAT 0.130000 0.130000 0.080000 0.070000 0.030000 0.040000   0.240000   0.240000

etc. Some of the rows are in the wrong place. I have a separate vector containing only the IDs, in a known good order:

> PT_IDs
[1] "CON_L01"   "CON_L03"   "CON_L04"   "CON_L05"   "CON_L07"   "CON_L10"   "CON_L14"   "CON_L16"   "CON_L17"   "CON_L18"  
 [11] "CON_L19"   "CON_L23"   "CON_L25"   "CON_L26"   "CON_L27"   "CON_L29"   "CON_L30"   "CON_L31"   "CON_L35"   "CON_L36" 

etc.

I could, I suppose, write the main dataframe out to a csv, rearrange the rows manually, and then read it back in, but I would like to know what the best way is to rearrange the rows of the dataframe by the ID column so that they are in the same order as the equivalent IDs in the PT_IDs list. All of the values that are in the PT_IDs list are also in the ID column, so there is no funny business there.

Thank you for any help!

Rowan
  • Try `df %>% arrange(factor(ID, levels = PT_IDs))`. You can look at Ronak's answer in the above link – Karthik S Jun 16 '21 at 14:05
  • Thank you both for the responses - Ronak's solution there using `arrange()` does seem to be the cleanest and most readable. I might swap out my solution using `match()` for this in future. – Rowan Jun 16 '21 at 14:09

1 Answer


After a more thorough scoot through the R documentation, I came across the match() function, which seemed to have the functionality I wanted ([match][1]) - for anybody else who runs into this, the solution boiled down to:

df_arranged <- patient_df[match(PT_IDs, patient_df$ID), ]

`match(x, table)` returns, for each element of `x`, the position of its first match in `table`, so the result can be used directly to index the rows in the desired order. Hopefully I'm not just adding to a pile of duplicate questions!

[1]: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/match
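For anyone who wants to see the indexing step in isolation, here is a minimal sketch on a toy data frame. The values and the four IDs are made up for illustration (they are not the real patient data), and `idx` and `df_ordered` are names introduced only for this example.

```r
# Toy stand-in for the patient data frame (illustrative values only).
patient_df <- data.frame(
  ID    = c("GP_M02", "CON_L01", "GP_M01", "CON_L03"),
  L_HCH = c(0.11, 0.10, 0.12, 0.09),
  stringsAsFactors = FALSE
)

# Known-good order of IDs.
PT_IDs <- c("CON_L01", "CON_L03", "GP_M01", "GP_M02")

# For each element of PT_IDs, match() gives the row of patient_df that holds it.
idx <- match(PT_IDs, patient_df$ID)
idx
# [1] 2 4 3 1

# Index the rows with those positions to get the requested order.
df_arranged <- patient_df[idx, ]

# Alternative: rank every row by where its ID sits in PT_IDs, then sort.
df_ordered <- patient_df[order(match(patient_df$ID, PT_IDs)), ]
```

The two forms agree when every ID appears exactly once in both places, as in the question. They differ otherwise: the first produces a row of `NA`s for any entry of `PT_IDs` missing from the data frame and drops rows whose ID is not in `PT_IDs`, while the second keeps every row and places unmatched IDs last.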

Rowan