In an earlier post I asked how to remove every row of an ID when any row for that ID contains certain strings (e.g., A or D), using the following data frame in long (longitudinal) format. These are the R code examples I received there, from r2evans, akrun, and ThomasIsCoding, in that order:

  1. d %>% group_by(id) %>% filter(!any(dx %in% c("A", "D"))) %>% ungroup()
  2. filter(d, !id %in% id[dx %in% c("A", "D")])
  3. subset(d, !ave(dx %in% c("A", "D"), id, FUN = any))

While these all worked well, I realized that I need to remove more than 600 strings (e.g., A, D, E2, F112, G203, etc.), so I created a csv file that lists these strings in a single column without a column name. 1. Is putting the strings in a file the right approach? 2. How should I modify the R code above to use the list of strings from the file? I reviewed the other posts and Google search results, but I could not figure out what to do in my case. I would appreciate any suggestions!

Data frame:

id   time   dx
1     1     C
1     2     B
2     1     A
2     2     B
3     1     D
4     1     G203
4     2     E1

The results I want:

id    time  dx
 1     1     C
 1     2     B
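
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data and applies approach 3 in base R. The column names (id, time, dx) come from the table above; the vector name drop_codes is only a placeholder, and G203 is included in it because it is one of the strings to remove, which is why id 4 is absent from the wanted result.

```r
# Rebuild the example data frame shown above
d <- data.frame(
  id   = c(1, 1, 2, 2, 3, 4, 4),
  time = c(1, 2, 1, 2, 1, 1, 2),
  dx   = c("C", "B", "A", "B", "D", "G203", "E1"),
  stringsAsFactors = FALSE
)

# Placeholder vector of strings that disqualify a whole id
drop_codes <- c("A", "D", "G203")

# Approach 3: flag every row of an id if any of its rows matches, then drop them
result <- subset(d, !ave(dx %in% drop_codes, id, FUN = any))
print(result)
```

Only the two rows for id 1 should remain, since ids 2, 3, and 4 each have at least one matching dx value.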

UPDATE: TarJae's answer below resolved the issue. The following R code is the solution.

my_list <- read.csv("my_list.csv")

columnname
    A
    D
    E2
    F112
    G203
  1. d %>% group_by(id) %>% filter(!any(dx%in%my_list$columnname)) %>% ungroup()
  2. filter(d, !id %in% id[dx %in% my_list$columnname])
  3. subset(d, !ave(dx %in% my_list$columnname, id, FUN = any))
birch
  • I don't see why it wouldn't work. Why aren't you trying it? `list("a","b") %in% c("a", "b")` returns TRUE TRUE – IRTFM Feb 14 '22 at 21:42
  • Hi IRTFM, thank you for your advice. Since I have more than 600 strings, I created a csv file. The file contains the strings in one column without a column name. I tried my_list <- read.csv("mylist.csv") or my_list <- c(my_list) and filter(df1, !id %in% id[dx %in% my_list]). But both did not work. If I want to use a list from a csv file, what should I do? – birch Feb 14 '22 at 22:43
  • Without seeing `str(my_list)` output, I'm not able to say anything specific. Post either that result (and NOT in the comments) or post `head(my_list)`. When people say "I did such and such" but do not show files and code, there remains great uncertainty and potential for confusion and inaccuracy. Hence the need to require an MCVE. – IRTFM Feb 15 '22 at 01:22
  • Thank you for your suggestion, IRTFM! I got the answer from Tarjae. – birch Feb 15 '22 at 14:00

1 Answer

This is a good strategy:

Put your values in a vector or list (here my_list), then filter on the dx column by combining the negation operator ! with the %in% operator:

library(dplyr)

my_list <- c("A", "D")

df %>% 
  filter(!dx %in% my_list)
  id time   dx
1  1    1    C
2  1    2    B
3  2    3    B
4  4    1 G203
5  4    1   E1

Expanding the list of values to my_list <- c("A", "D", "G203", "E1") gives the following with the same code:

library(dplyr)

df %>% 
  filter(!dx %in% my_list)

  id time dx
1  1    1  C
2  1    2  B
3  2    3  B
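
If the strings live in a csv file with no header row, one option is to read it with header = FALSE. This is a minimal base R sketch, not the only way to do it: the temporary file stands in for the real csv, the names codes, bad_ids, and kept are placeholders, and V1 is the name read.csv assigns to an unnamed first column. With the default header = TRUE, the first value in the file (here "A") would be consumed as the column name and silently dropped from the list.

```r
# Write a small header-less file to a temporary path (stand-in for the real csv)
path <- tempfile(fileext = ".csv")
writeLines(c("A", "D", "E2", "F112", "G203"), path)

# header = FALSE keeps the first value as data; colClasses avoids type guessing
codes_df <- read.csv(path, header = FALSE, colClasses = "character")
codes <- codes_df$V1  # plain character vector, usable on the right of %in%

# Example data from the question
d <- data.frame(
  id   = c(1, 1, 2, 2, 3, 4, 4),
  time = c(1, 2, 1, 2, 1, 1, 2),
  dx   = c("C", "B", "A", "B", "D", "G203", "E1"),
  stringsAsFactors = FALSE
)

# Collect ids with at least one matching dx, then drop all of their rows
bad_ids <- unique(d$id[d$dx %in% codes])
kept <- d[!d$id %in% bad_ids, ]
print(kept)
```

The same codes vector can be passed to the dplyr versions, for example filter(!dx %in% codes) for row-wise filtering, or the grouped filter with any() to drop whole ids.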
TarJae
  • Hi TarJae, thank you for your advice. Since I have more than 600 strings, I created a csv file. The file contains the strings in one column without a column name. I tried my_list <- read.csv("mylist.csv") or my_list <- c(my_list) and followed your R codes: df %>% filter(!dx %in% my_list). But both did not work. If I want to use a list from a csv file, what should I do? – birch Feb 14 '22 at 22:41
  • If you have a csv file then you should import it as a data frame with: `library(readr); my_list <- read_csv("yourpath/yourfile.csv")`. Then `df %>% filter(!dx %in% my_list$columnname)` – TarJae Feb 14 '22 at 22:46
  • Thank you, TarJae! I did not think it was necessary to name the column. But after adding a column name to the file and using R code my_list$columnname, it worked perfectly! I accepted & upvoted your answer. – birch Feb 14 '22 at 23:22