
I've been stuck for a long time trying to filter information from individual .csv files and then merge them into a single one.

Each CSV contains the following columns, though the number of rows varies:

SNP.Name iHS.CHR iHS.POSITION iHS.iHS iHS..log10.p.value. frequency.class..mrk frequency.class.mean.log.iHHA.iHHD.. frequency.class.sd.log.iHHA.iHHD..

What I want is to filter each CSV on the `iHS..log10.p.value.` column, keeping rows with values greater than or equal to 2 (>= 2) along with all the other information from those rows. Then I want to merge the filtered rows from every CSV into a single CSV file.

I've been doing this in excel but it takes a long time and I really want to optimize my work. How can I solve this?

Pecun
    What have you tried? Do you already have your csvs in `R`? You can use `filter()` from `dplyr` to filter your data, and then take a look at this response https://stackoverflow.com/questions/16138693/rbind-multiple-data-sets – boski Feb 27 '19 at 15:29

1 Answer

library(tidyverse)

csv_files <- list.files(path = "my_folder_with_csv_files", pattern = "\\.csv$", full.names = TRUE)

# make a list of data frames of csv files in your directory
list_csv_files <- purrr::map(csv_files, ~readr::read_csv(file = .))

# assuming no errors with import...
# filter each data frame in list by the condition specified in the question
# then bind/merge the data frames in the list into one single data frame
single_csv <- list_csv_files %>% 
  purrr::map(., ~dplyr::filter(., iHS..log10.p.value. >= 2)) %>% # swap in iHS.iHS here if that is the intended threshold column
  dplyr::bind_rows(.) # similar to do.call("rbind", .)

# export your single csv 
readr::write_csv(single_csv, file = "my_path_to_write_to") # newer readr versions use `file =` instead of `path =`
EJJ
  • Just to help you tighten it up instead of posting a separate answer, you can just use `readr::read_csv` as the second argument in your first `map`, since the file paths will get passed as `read_csv`'s first argument. You can also skip the `bind_rows` by using `map_dfr` in the line above – camille Feb 27 '19 at 17:38
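Putting camille's suggestions together, the whole pipeline can be condensed into a few lines. This is a sketch under the same assumptions as the answer above (the folder path, the output path, and the `iHS..log10.p.value.` threshold column are placeholders from the question, not verified names):

```r
library(tidyverse)

# read every csv in the folder, row-bind, and filter in one pipeline;
# map_dfr() passes each path as read_csv()'s first argument and binds
# the resulting data frames into a single one, replacing the separate
# map() + bind_rows() steps
single_csv <- list.files(path = "my_folder_with_csv_files",
                         pattern = "\\.csv$", full.names = TRUE) %>%
  purrr::map_dfr(readr::read_csv) %>%
  dplyr::filter(iHS..log10.p.value. >= 2)

readr::write_csv(single_csv, file = "my_path_to_write_to")
```

Filtering after binding is equivalent to filtering each file first, since `filter()` operates row-wise; doing it once at the end just keeps the pipeline shorter.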