
I have one folder containing more than 100 CSV files, each with different column names and different file names. For example:

[1] "Data/Yahoo_2014.csv"   "Data/Yahoo_2015.csv"  
[3] "Data/Yahoo_2016.csv"   "Data/Yahoo_2017.csv"  
[5] "Data/Yahoo_2018.csv"   "Data/Yahoo_2019.csv"  
[7] "Data/Yahoo_2020.csv"   "Data/Google_2014.csv"
[9] "Data/Google_2015.csv"  "Data/Google_2016.csv"

and so on.

The column names differ between files. For example, the Yahoo files have the columns

Date Yahoo

and the Google files have

Date Google

The only column they have in common is the first one, Date. I want to merge all of this data into one CSV file in R so that I can analyze it. The result should look like this:

        Date Yahoo Google
1 2014-01-05    75     50
2 2014-01-12    84      6
3 2014-01-19    81      3
4 2014-01-26    82     35
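The target layout is a join on Date rather than a row bind. Below is a minimal sketch with two toy frames built from the first rows of the table above. The frame names `yahoo` and `google` are illustrative, not from the original files. `merge()` lines up rows by Date, and `Reduce()` extends the same join to any number of series.

```r
# Toy frames in the per-file layout described above: Date plus one series
yahoo  <- data.frame(Date = c("2014-01-05", "2014-01-12"), Yahoo  = c(75, 84))
google <- data.frame(Date = c("2014-01-05", "2014-01-12"), Google = c(50, 6))

# merge() joins on the shared Date column; all = TRUE keeps dates that
# appear in only one source (the missing value becomes NA)
combined <- merge(yahoo, google, by = "Date", all = TRUE)
print(combined)

# For any number of series, fold merge() over a list of frames
series_list <- list(yahoo, google)
combined_all <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE),
                       series_list)
```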

I have already looked through other questions on Stack Overflow but found nothing similar. I came up with the solution below, but it does not work because the files have different column names.

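# Read a single file (skip the first two lines; treat "<1" as NA)
data <- read.csv(paste0("Data/", "Yahoo_2014.csv"),
                 skip = 2,
                 na.strings = "<1")

allFileNames <- list.files("Data")
All <- data.frame(matrix(, nrow = 0, ncol = 3))
names(All) <- c("Date", "Yahoo", "Google")
for (filename in allFileNames) {
  fullFilename <- paste0("Data/", filename)
  Data <- read.csv(fullFilename,
                   skip = 2,
                   na.strings = "<1")
  # Fails here: each file has only two columns (Date plus one series),
  # so three names cannot be assigned, and rbind() needs matching names
  names(Data) <- c("Date", "Yahoo", "Google")
  All <- rbind(All, Data)
}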
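One possible way to build the combined table without growing `All` inside the loop is sketched below. It assumes that files are named `<Series>_<year>.csv`, that every file has exactly two columns (Date and the series) after skipping the first two lines, and that all years of a series use the same column names. `combine_folder` is a hypothetical helper name. The sketch stacks the years of each series first, then joins the series on Date.

```r
# Sketch: combine a folder of "<Series>_<year>.csv" files into one wide table.
# Assumes every file has two columns (Date, <Series>) after skipping the
# first `skip` lines, as in the read.csv() call above.
combine_folder <- function(dir, skip = 2) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

  # Series name = part of the file name before "_<year>.csv", e.g. "Yahoo"
  series <- sub("_[0-9]{4}\\.csv$", "", basename(files))

  # One long frame per series: read every year, then stack them once
  per_series <- lapply(split(files, series), function(paths) {
    frames <- lapply(paths, read.csv, skip = skip, na.strings = "<1")
    do.call(rbind, frames)
  })

  # Join all series side by side on the shared Date column
  Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), per_series)
}

# Usage (writes the combined table to one CSV):
# All <- combine_folder("Data")
# write.csv(All, "combined.csv", row.names = FALSE)
```

`merge()` is used here rather than `dplyr::bind_cols()` because it aligns rows by Date instead of by position. The series columns come out in alphabetical order of their names.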
  • (1) Based on [R Inferno](https://www.burns-stat.com/pages/Tutor/R_inferno.pdf) chapter 2 (growing objects), don't iteratively call `rbind(All,Data)`; it scales horribly. (2) In the dupe question, look in the answer for *"Combining a list of data frames into a single data frame"*; it is relevant to you. – r2evans Mar 12 '20 at 16:55
  • I already made a list and followed your second suggestion: `my_files <- list.files(path = 'Data', pattern = "csv$", full.names = FALSE); big_data = dplyr::bind_cols(my_files)`, but I still get the error `Error: Argument 1 must have names`. – aua Mar 12 '20 at 17:32
  • You missed the step where you actually *read in the data*, perhaps with `alldat <- lapply(my_files, read.csv, skip=2, na.strings="<1")`. – r2evans Mar 12 '20 at 17:37
  • `allFileNames <- list.files(path = 'Data', pattern = "csv$", full.names = FALSE) for (filename in allFileNames) { fullFilename <- paste0("Data/",filename) allFileNames <- read.csv(fullFilename, skip=2, na.strings="<1") } big_data = dplyr::bind_cols(allFileNames) ` I used this, but the result is not what I expected. I need three columns (Date, Yahoo, and Google), but with this code I only get two: Date and Data (Yahoo and Google merged into one column). – aua Mar 12 '20 at 17:42
  • I suggest you `split` your files into `Google` and `Yahoo`, read them into separate frames, then figure out if and how you can `merge` them together based on `Date`. – r2evans Mar 12 '20 at 17:47
  • Yes, that's one way to do it, but I have 15 different file names, so I don't think that approach is practical. – aua Mar 12 '20 at 17:56
  • `split(allFileNames, grepl("Google", allFileNames))` – r2evans Mar 12 '20 at 18:18
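The `grepl("Google", ...)` split above separates only two groups. Below is a minimal sketch of how the same idea might extend to 15 or more prefixes, assuming every file name follows the `<Series>_<year>.csv` pattern shown in the question. It splits on the text before the year, so each group can then be read with `lapply()` and stacked with `do.call(rbind, ...)` as suggested in the comments.

```r
# A few of the file names from the question (without the folder prefix)
allFileNames <- c("Yahoo_2014.csv", "Yahoo_2015.csv", "Google_2014.csv",
                  "Google_2015.csv", "Google_2016.csv")

# Group by the text before "_<year>.csv" instead of a single grepl() test,
# so any number of prefixes is handled in one call
groups <- split(allFileNames, sub("_[0-9]{4}\\.csv$", "", allFileNames))
names(groups)
groups$Yahoo
```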
