
Hi all, I am trying to arrange duplicated codes so that they appear one after another. Please see the code and data below:

library(dplyr)

df1 <- structure(list(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945", "191-3000", "191-5000", "191-9600", "191-0001", "191-0002", "191-0003", "191-0004", "191-5000"),
  edta_collect = c(1,0,1,1,1,0,0,1,1,1,1,1,1,1),
  edta_code = c("EDTA45", NA, "EDTA20", "EDTA66", "EDTA12", NA,NA,"EDTA19", "EDTA03", "EDTA66", "EDTA10", "EDTA03", "EDTA30", "EDTA20"),
  # ipv listed only 12 values and epds 15; assumed fix: pad ipv with NA and
  # drop the trailing 40 from epds so every column has 14 rows
  ipv = c(1,1,4,6,3,2,5,1,3,4,5,2,NA,NA),
  epds = c(13, 12, 10, 8, 30, 33, 20, 26, 12, 10, 11, 15, 1, 13)),
  class = "data.frame", row.names = c(NA, -14L))

# keep only collected samples and the columns of interest
edta <- df1 %>%
  select(subject_id, edta_collect, edta_code) %>%
  filter(edta_collect == 1)

# count how often each code occurs
n_occur_edta <- data.frame(table(edta$edta_code))

# rows whose code occurs more than once
edta[edta$edta_code %in% n_occur_edta$Var1[n_occur_edta$Freq > 1], ]

Current output:

   subject_id edta_collect edta_code
2    191-3457            1    EDTA20
3    191-0987            1    EDTA66
6    191-9600            1    EDTA03
7    191-0001            1    EDTA66
9    191-0003            1    EDTA03
11   191-5000            1    EDTA20

Desired output:

   subject_id edta_collect edta_code
2    191-3457            1    EDTA20
11   191-5000            1    EDTA20
3    191-0987            1    EDTA66
7    191-0001            1    EDTA66
6    191-9600            1    EDTA03
9    191-0003            1    EDTA03

Ideally, I would not completely change my code and would maybe just add to it.
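Staying close to the original code, a minimal sketch in base R (so it needs no packages) keeps the `table()`-based filter unchanged and adds a single ordering step. It rebuilds only the three columns the question actually uses, and `dups` is a hypothetical name introduced here. `match(edta_code, unique(edta_code))` numbers each code by where it first appears, and `order()` keeps tied rows in their original relative order. Row names come from `df1` rather than being renumbered as in the dplyr output.

```r
# Only the columns the question uses, copied from df1 above
df1 <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245",
                 "191-1945", "191-3000", "191-5000", "191-9600", "191-0001",
                 "191-0002", "191-0003", "191-0004", "191-5000"),
  edta_collect = c(1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  edta_code = c("EDTA45", NA, "EDTA20", "EDTA66", "EDTA12", NA, NA, "EDTA19",
                "EDTA03", "EDTA66", "EDTA10", "EDTA03", "EDTA30", "EDTA20")
)

# Same steps as the question, written in base R
edta <- df1[df1$edta_collect == 1, c("subject_id", "edta_collect", "edta_code")]
n_occur_edta <- data.frame(table(edta$edta_code))
dups <- edta[edta$edta_code %in% n_occur_edta$Var1[n_occur_edta$Freq > 1], ]

# The only added step: group identical codes, groups in first-appearance order
dups <- dups[order(match(dups$edta_code, unique(dups$edta_code))), ]
dups
```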

Darren Tsai
Thandi
  • Sort the dataframe by edta_code, possible duplicate of https://stackoverflow.com/q/1296646/680068 – zx8754 Jul 18 '23 at 08:34

1 Answer


In a more dplyr-idiomatic way, I would generate your current output like this:

df2 <- edta %>%
  # keep rows whose edta_code occurs more than once (.by requires dplyr >= 1.1.0)
  filter(n() > 1, .by = edta_code)

df2
#   subject_id edta_collect edta_code
# 1   191-3457            1    EDTA20
# 2   191-0987            1    EDTA66
# 3   191-9600            1    EDTA03
# 4   191-0001            1    EDTA66
# 5   191-0003            1    EDTA03
# 6   191-5000            1    EDTA20
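`filter(n() > 1, .by = edta_code)` keeps every row whose code has a group size greater than one. For comparison only (a swapped-in base-R technique, not what this answer uses), `duplicated()` flags just the second and later copies, so OR-ing it with `duplicated(..., fromLast = TRUE)` marks every copy including the first. The `codes` vector below is assumed to be the `edta_code` column of `edta` after the `edta_collect == 1` filter.

```r
# edta_code values of `edta` (rows with edta_collect == 1)
codes <- c("EDTA45", "EDTA20", "EDTA66", "EDTA12", "EDTA19", "EDTA03",
           "EDTA66", "EDTA10", "EDTA03", "EDTA30", "EDTA20")

# TRUE for every occurrence of a code that appears more than once
is_dup <- duplicated(codes) | duplicated(codes, fromLast = TRUE)
codes[is_dup]

# Group-size version, closer in spirit to n() > 1 with .by
group_size <- ave(seq_along(codes), codes, FUN = length)
all(is_dup == (group_size > 1))
```

Both approaches keep the rows in their original order, which is why the result still needs a separate sorting step.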

If you simply want to sort the data alphabetically by edta_code, arrange() alone is enough:

df2 %>%
  arrange(edta_code)

#   subject_id edta_collect edta_code
# 1   191-9600            1    EDTA03
# 2   191-0003            1    EDTA03
# 3   191-3457            1    EDTA20
# 4   191-5000            1    EDTA20
# 5   191-0987            1    EDTA66
# 6   191-0001            1    EDTA66
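For comparison, a minimal base-R sketch of the same alphabetical sort uses `order()`, whose sort is stable, so rows with the same code keep their relative order. It rebuilds `df2` from the printed output above rather than from the pipeline; `sorted` is a hypothetical name.

```r
# df2 as printed above
df2 <- data.frame(
  subject_id = c("191-3457", "191-0987", "191-9600", "191-0001", "191-0003", "191-5000"),
  edta_collect = c(1, 1, 1, 1, 1, 1),
  edta_code = c("EDTA20", "EDTA66", "EDTA03", "EDTA66", "EDTA03", "EDTA20")
)

# Alphabetical by edta_code; tied rows keep their original relative order
sorted <- df2[order(df2$edta_code), ]
rownames(sorted) <- NULL  # renumber rows 1..n like the dplyr output
sorted
```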

If you need to order the groups by where each code first appears, convert edta_code to a factor whose levels follow the order of first appearance:

# Option 1
df2 %>%
  arrange(factor(edta_code, levels = unique(edta_code)))

# Option 2
df2 %>%
  arrange(forcats::fct_inorder(edta_code))

#   subject_id edta_collect edta_code
# 1   191-3457            1    EDTA20
# 2   191-5000            1    EDTA20
# 3   191-0987            1    EDTA66
# 4   191-0001            1    EDTA66
# 5   191-9600            1    EDTA03
# 6   191-0003            1    EDTA03
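Both options work because a factor is ordered by its levels rather than alphabetically by its labels. When the levels are `unique(edta_code)`, the underlying integer codes number each value by its first appearance, which is exactly what `match(x, unique(x))` returns. A small base-R check on the `edta_code` column of `df2`:

```r
codes <- c("EDTA20", "EDTA66", "EDTA03", "EDTA66", "EDTA03", "EDTA20")

f <- factor(codes, levels = unique(codes))
levels(f)         # "EDTA20" "EDTA66" "EDTA03"
as.integer(f)     # 1 2 3 2 3 1

# The integer codes equal match(x, unique(x)), so either can drive the sort
identical(as.integer(f), match(codes, unique(codes)))

# Row order that groups identical codes in first-appearance order
order(f)          # 1 6 2 4 3 5
```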
Darren Tsai