upload multiple csv files into one dataframe while defining variable types (using tidyverse)

Question

I have a list of csv files in a specific path. I am hoping to upload all of them into one dataframe. This is the code I use:

library(tidyverse)

d <-
    list.files(path, pattern = "\\.csv$", full.names = TRUE) %>% 
    map_dfr(read_csv)

The trouble is that some of these columns (for example, the column array_values) contain strings that get converted into numbers when the files are read. I tried all sorts of ways to convert the variables, but I can't get it to work without much more complicated code in which I read the files one by one, convert the columns, and then add each one to the larger dataframe. I would love to learn whether there is a simple way to add this to the code here.

Thanks!

Based on your particular problem, I would suggest reading the data sets in a loop and adding all of them to a list. Once you have found a pattern among the column names in R, you can merge all the data sets together. Or, if you know for sure that each data set must have X columns with Y names, you can assign the column names and types while reading and merging. — AugtPelle, Feb 08 '22 at 20:28
[See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. — camille, Feb 08 '22 at 22:13
Thanks Camille! I had a hard time thinking about how to create the data for that, but I get your point. — chagag, Feb 09 '22 at 01:54

score 1 · Answer 1 · answered Feb 08 '22 at 22:54

Camille's point is valid. Complete code should at least include the packages used by the snippet you provided.

Having said that, if your CSVs all have the same columns in the same order (and so each column should have the same type in every file), and your problem is that a column is guessed as a different type in some files (for example, double in one file and character in another, which makes the row-binding fail), you can pass the col_types argument to read_csv() so that each column is read the same way in every file. It's difficult to tell from your question exactly what is wrong.

> library(tidyverse)
> list.files(pattern="t.*csv")
[1] "test1.csv" "test2.csv"
> d <- list.files(pattern="t.*csv") %>% map_dfr(read_csv, col_types="dc")  # "d" = double, "c" = character
> d
# A tibble: 6 × 2
   col1 col2 
  <dbl> <chr>
1   3   a    
2   4   b    
3   4.5 a    
4  13   a    
5   4   goat 
6   4.5 a

You can find the column types in the "Column Specification with readr" section here.
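Positional codes such as "dc" rely on every file having its columns in the same order. A minimal sketch of the alternative, assuming two small hypothetical files (demo1.csv, demo2.csv) written to a temporary directory and a column named array_values as in the question: pass a named cols() specification so that only that column is forced to character and every other column is still guessed.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical demo files: array_values looks numeric in demo1 only
dir <- tempfile("csv_demo")
dir.create(dir)
writeLines(c("id,array_values", "1,10", "2,20"), file.path(dir, "demo1.csv"))
writeLines(c("id,array_values", "3,a;b", "4,c"), file.path(dir, "demo2.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Without a spec, readr guesses <dbl> in demo1 and <chr> in demo2,
# so row-binding the two tibbles fails
without_spec <- tryCatch(
  map_dfr(files, read_csv, show_col_types = FALSE),
  error = function(e) e
)
print(inherits(without_spec, "error"))  # TRUE: the bind failed

# Force array_values to character; keep guessing every other column
spec <- cols(array_values = col_character(), .default = col_guess())
d <- map_dfr(files, read_csv, col_types = spec)
print(d)
```

The .default = col_guess() entry keeps readr's usual guessing for the remaining columns, so the spec only has to name the columns whose type you want to pin down.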

Thanks! I appreciate it. That's exactly what I needed. — chagag, Feb 09 '22 at 01:55

score 0 · Answer 2 · answered Feb 09 '22 at 05:07

Assuming that you have multiple datasets and some columns whose types are not consistent across all of them, you can also achieve this with the code below:

library(dplyr)
library(readr)

path <- 'your path'
filename <- list.files(path, pattern = "\\.csv$", recursive = TRUE)
filename <- stringr::str_subset(filename, pattern = "test")

# Custom function applied to each file
read_csv_files <- function(x) {
    df <- read_csv(file.path(path, x))
    # Change the data type of the respective columns as needed
    df$yourcol <- as.character(df$yourcol)
    # Record the source file name as the first column
    df <- mutate(df, fName = x, .before = 1)
    return(df)
}

bind_data <- lapply(filename, read_csv_files) %>%
    bind_rows()
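With readr 2.0 or later, read_csv() can also take the whole vector of file paths directly: its id argument adds a column holding each row's source file (replacing the fName step), and col_types fixes the column type while reading instead of converting afterwards. A minimal sketch, assuming two hypothetical files in a temporary directory and the placeholder column name yourcol from the code above:

```r
library(readr)

# Hypothetical demo files standing in for 'your path'
path <- tempfile("csv_demo")
dir.create(path)
writeLines(c("yourcol,value", "001,1.5"), file.path(path, "test1.csv"))
writeLines(c("yourcol,value", "abc,2"), file.path(path, "test2.csv"))

files <- list.files(path, pattern = "test.*\\.csv$", full.names = TRUE)

# One call reads and row-binds every file (readr >= 2.0);
# id = "fName" stores the path of the source file for each row
bind_data <- read_csv(
  files,
  id = "fName",
  col_types = cols(yourcol = col_character(), .default = col_guess())
)
print(bind_data)
```

Reading yourcol as character from the start keeps values such as "001" intact; calling as.character() after readr has already guessed the column as a number would give "1" instead.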

Hope this helps. Happy to follow up if needed.
