I have 2500 .csv files, each with the same four columns, and I would like to process them all at once. I want to import the files, remove the first and second columns, rename the remaining columns to x and y, remove duplicate rows in each data frame, and finally save each data frame as a separate .csv file (2500 files). I have used the following script:

library(dplyr)
# Get all filenames
list.files(path = "D:/R_project", full.names = TRUE) %>%
  # Import all files
  purrr::map(readr::read_csv) %>%
  purrr::map(
    ~ .x %>%
      # Select columns and rename
      select(
        x = Col3,
        y = Col4
      ) %>% 
      # Remove duplicates
      distinct()
  ) %>% 
  # Save all files (same filename, but in a different folder)
  purrr::walk2(
    list.files("D:/R_project/Treated"),
    ~ readr::write_csv(x = .x, file = paste0("output/folder/", .y))
  )

However, I received this error for all of the data frames in the end (below is an example for one of the data frames):

Rows: 1579 Columns: 4
Column specification ---------------------------------------
Delimiter: ","

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:                                                                                                          
* `` -> ...1

How can I rectify the problem? Any help would be appreciated.

camille
  • Read https://stackoverflow.com/a/24376207/3358227. Load them in using `alldat <- lapply(list_of_files, read.csv)` (including additional args as required). From there, change each frame as desired (dupes, renaming, etc). For saving back to files, re-read the first link. – r2evans Dec 28 '21 at 20:57
  • Welcome to Stack Overflow! Can you say a little bit more about what you've tried so far to find answers to your own questions? If you can figure out how to do the steps you want for a single file, then a `for` loop iterating over the names of the files should do the trick ... – Ben Bolker Dec 28 '21 at 20:59
  • Thank you @BenBolker! Actually, I know the steps for a single file but don't know how to apply them to more than 2000 .csv files. For example, I know that there are several ways to remove certain columns from a single data frame. I searched Stack Overflow to learn more about the topic but found no similar questions. – Hooman Norouzi Dec 28 '21 at 21:49
  • If you post example code for handling a single file, someone can help show you how to make it work for 2500 separate files. – Ben Bolker Dec 28 '21 at 22:30
  • Does this answer your question? [How do I make a list of data frames?](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) – Avraham Dec 28 '21 at 22:37
  • That's not an error, it's just a message to let you know how the files were read. It even says in the message how to turn it off. Is that the only issue happening here? Do the files write the way you expect them to? – camille Dec 29 '21 at 23:19
  • Unfortunately, the files were not written at all! – Hooman Norouzi Dec 30 '21 at 08:16

1 Answer

There are many ways to do this in R. Below is an example using dplyr for data manipulation, readr for importing and exporting CSVs, and purrr to handle all your files at the same time.

library(dplyr)

# Get all filenames
list.files("path/to/your/csv/files", full.names = TRUE) %>%
  # Import all files
  purrr::map(readr::read_csv) %>%
  purrr::map(
    ~ .x %>% 
      # Select columns and rename
      select(
        x = <your x column>,
        y = <your y column>
      ) %>% 
      # Remove duplicates
      distinct()
  ) %>% 
  # Save all files (same filename, but in a different folder)
  purrr::walk2(
    list.files("path/to/your/csv/files"),
    ~ readr::write_csv(x = .x, file = paste0("output/folder/", .y))
  )

Since you didn't provide any code, you will probably need to adjust this example, but I hope it's enough to get you started.
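One possible reason nothing gets written with the pattern above is that the output names come from a second `list.files()` call, which may not match the imported files in length or order, and that the target folder may not exist yet. Below is a minimal sketch of a variant that avoids both, under some assumptions: `clean_csv_folder` is a hypothetical helper name, the columns to keep are taken by position (3 and 4) because the real column names are unknown, and `show_col_types = FALSE` is used only to silence the informational message from readr 2.x.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical helper: clean every CSV in `in_dir` and write the result,
# under the same file name, to `out_dir`.
clean_csv_folder <- function(in_dir, out_dir) {
  # Create the output folder first; write_csv() will not create it.
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Keep only names ending in .csv, so ordinary sub-folders are skipped.
  in_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

  # Build the output paths from the input names, so both vectors
  # always have the same length and order.
  out_files <- file.path(out_dir, basename(in_files))

  in_files %>%
    # Import all files without printing the column specification.
    map(~ read_csv(.x, show_col_types = FALSE)) %>%
    map(
      ~ .x %>%
        # Assumption: keep the 3rd and 4th columns, renamed to x and y.
        select(x = 3, y = 4) %>%
        # Remove duplicate rows.
        distinct()
    ) %>%
    # Write each data frame to its matching output path.
    walk2(out_files, ~ write_csv(.x, .y))

  invisible(out_files)
}

# Example call with the folders from the question:
# clean_csv_folder("D:/R_project", "D:/R_project/Treated")
```

Deriving the output paths with `basename()` keeps each cleaned data frame paired with the file it came from, which a separate directory listing cannot guarantee.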

jpiversen