
I have data in a data.frame and I want to use the pipe feature of the dplyr package to chain a few operations in R. For example, given a data frame, I first take a subset and then export the result as a CSV file. I am still learning dplyr, so I do not fully understand how this works. Any help? Here is a simple reproducible example:

library(GenomicRanges)  # provides GRanges(), and attaches Rle() and IRanges()

a <- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(3, 2, 1, 2)),
  ranges=IRanges(seq(1, by=9, len=8), seq(7, by=9, len=8)),
  rangeName=letters[1:8], score=sample(1:20, 8, replace = FALSE))

I do subsetting first:

a %>% subset(pvalue < 1e-4 & pvalue > 1e-9)

Then I want to chain several operations using the pipe:

a %>% subset(pvalue < 1e-4 & pvalue > 1e-9) %>% write.table(x, "foo.csv") %>% as.data.frame(x)

but I get an error at the second step. If I need to chain several steps so that the result of one is used in the next, how can I do this in R with dplyr? Thanks

2 Answers


Using iris to make your example reproducible, you can do:

iris %>% filter(Sepal.Length > 5.2) %>% write.table("foo.csv")

Some side remarks:

  • subset is more of a base R approach. Why not use dplyr's verbs, e.g. filter, select, etc.?
  • The pipe (now more a magrittr operator than a dplyr one) passes the left-hand side as the first argument of the function on the right-hand side, so write.table(x, ...) cannot work as intended.
  • As dplyr works with data.frames, you do not need as.data.frame.
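
To illustrate the second remark, here is a minimal sketch of writing a file in the middle of a pipeline and still passing the data on. It assumes the magrittr package is installed, since it uses magrittr's tee pipe %T>%, which is not mentioned in the question. The path out_file is a hypothetical temporary file, and write.csv is swapped in for write.table so that the output is really comma-separated.

```r
library(dplyr)     # filter(), as_tibble() and the %>% pipe
library(magrittr)  # tee pipe %T>% (assumption: magrittr is installed)

# Hypothetical output path, used for illustration only
out_file <- tempfile(fileext = ".csv")

# lhs %>% f(y) is evaluated as f(lhs, y), so the data must not be
# repeated as an explicit first argument on the right-hand side.
kept <- iris %>%
  filter(Sepal.Length > 5.2) %T>%
  # %T>% calls write.csv for its side effect and returns the filtered
  # data unchanged, so the chain can continue after the write
  write.csv(file = out_file, row.names = FALSE) %>%
  as_tibble()
```

With a plain %>% in front of write.csv, the next step would receive NULL, because write.csv returns nothing useful.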
Vincent Bonhomme
  • Thank you for the quick response. Is there an existing thread that explains the pipe feature of dplyr in more detail? I would like to read it and learn. –  Jun 15 '16 at 11:02
  • See link above/[`magrittr`'s vignette](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html). The perfect place to start ;-) – Vincent Bonhomme Jun 15 '16 at 11:12
  • Thanks, I will go through it. –  Jun 15 '16 at 12:59

If you want to extract several different subsets and write them out, you may want to use group_by and do. First create a categorical variable that splits up your data into the subsets you want. Here's an example that works:

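iris %>%
  mutate(
    # bin Sepal.Length into intervals; each interval becomes one subset
    slcat    = cut(Sepal.Length, c(0, 4, 5, 6, 8)),
    # one output file name per interval
    filename = paste0("file", slcat, ".csv")
  ) %>%
  group_by(slcat) %>%
  # write each group to its own file; `.` is the data of the current group
  do(result = write.csv(., file = .$filename[1]))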
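
In more recent dplyr versions do() is marked as superseded. A possible alternative, sketched here under the assumption that dplyr 0.8.0 or later is available, is group_walk(), which calls a function once per group for its side effect. The directory out_dir and the file naming scheme are hypothetical choices for this illustration, not part of the answer above.

```r
library(dplyr)

# Hypothetical output directory, used for illustration only
out_dir <- tempfile("subsets_")
dir.create(out_dir)

iris %>%
  mutate(slcat = cut(Sepal.Length, c(0, 4, 5, 6, 8))) %>%
  group_by(slcat) %>%
  # .x is the data of the current group (without the grouping column),
  # .y is a one-row tibble holding the group key
  group_walk(~ write.csv(
    .x,
    file = file.path(out_dir, paste0("file_", as.integer(.y$slcat), ".csv")),
    row.names = FALSE
  ))
```

Using the integer code of the interval in the file name avoids characters such as "(" and "]" in file names.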