Using ldply (package "plyr") to import multiple csv files from a folder: what happens to the headers, and how to do it for multiple folders?
Setup:
- Computer: MacBook Pro (Early 2011) with macOS 10.13.6
- Software version: R version 3.5.1 (2018-07-02) -- "Feather Spray"
- RStudio: Version 1.1.456
I would like to import multiple csv files from specific folders and merge them into one data frame with 5 columns: Variable1/Variable2/file_name/experiment_nb/pulse_nb. Using previous similar questions on Stack Overflow, I have managed to import all the files from a single folder into the same data.frame. However, I am not sure how to do it for several folders, nor what happens to the header of each file after the merge. As the files are too big to check manually (200 000 lines per file), I want to make sure there is no mistake that would cause all subsequent analysis to fail, such as a header line left in front of the data of each imported csv file.
The csv files are named like this: "20190409-0001_002.csv", i.e. the date, followed by the experiment number (0001 in the example) and the pulse number (002).
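To make the naming scheme concrete, here is a minimal sketch of how such a name could be split into its parts with base R. The helper name `parse_file_name` and the exact pattern (8-digit date, "-", experiment digits, "_", pulse digits) are assumptions based on the single example above, not something guaranteed for every file.

```r
# Hypothetical helper: split names such as "20190409-0001_002.csv" into parts.
parse_file_name <- function(file_name) {
  # Drop any directory part and the ".csv" extension
  stem <- sub("\\.csv$", "", basename(file_name), ignore.case = TRUE)
  # Assumed pattern: 8-digit date, "-", experiment digits, "_", pulse digits
  pattern <- "^([0-9]{8})-([0-9]+)_([0-9]+)$"
  ok <- grepl(pattern, stem)
  # Extract capture group i, returning NA for names that do not match
  part <- function(i) {
    out <- sub(pattern, paste0("\\", i), stem)
    out[!ok] <- NA_character_
    out
  }
  data.frame(
    file_name     = stem,
    date          = part(1),
    experiment_nb = as.integer(part(2)),
    pulse_nb      = as.integer(part(3)),
    stringsAsFactors = FALSE
  )
}

parts <- parse_file_name(c("20190409-0001_002.csv", "notes.csv"))
print(parts)
```

Names that do not follow the assumed pattern get NA in the date, experiment and pulse columns, which makes unexpected files easy to spot.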
#setting package and directory
library(plyr)
library(stringr)
setwd("/Users/macbook/Desktop/Project_Folder/File_folder1")
#Creating a list of all the filenames:
filenames <- list.files(path = "/Users/macbook/Desktop/Project_Folder/File_folder1")
# Function that reads a csv file and, at the same time, adds a column with the name of the file
read_csv_filename <- function(filename)
{
  ret <- read.csv(filename, header=TRUE, sep=",")
  ret$Source <- filename
  ret
}
# Import all files into one data frame
import <- ldply(filenames, read_csv_filename)
# Make a copy of import
data <- import
# Remove ".csv" from the file name and relabel the Source entry of the first row (the units row)
data$Source <- str_sub(data$Source, end=-5)
data[1,3] <- "date_expnb_pulsenb"
t <- substr(data[1,3],1,3)
head(data, n=10)
#create a column with the experiment number, extracted from the file name
data$expnb<-substr(data$Source, 10, 13)
data$expnb<-as.numeric(data$expnb)
head(data, n=10)
tail(data, n=10)
1° Now I need to import all the other folders into the same data frame. I could do this manually because the number of folders is small (9-10), but I would like to write code for it as well, for future experiments with many more folders. How can I do that? Should I first list all the folders, then list all the files in those folders, and then regroup them into one list of files? Is this doable with list.files? The folder names will look like this: "20190409-0001"
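The three steps described above (list the folders, list the files in each folder, regroup everything into one vector) could be sketched as follows. This is only an illustration: the function name `list_experiment_files` is hypothetical, and it assumes that all experiment folders sit directly under one root folder and are named like "20190409-0001".

```r
# Hypothetical helper: collect the csv files of every experiment folder under root_dir.
list_experiment_files <- function(root_dir) {
  # Step 1: list the folders directly under root_dir (not root_dir itself)
  folders <- list.dirs(root_dir, full.names = TRUE, recursive = FALSE)
  # Keep only folders named like "20190409-0001" (assumed naming scheme)
  folders <- folders[grepl("^[0-9]{8}-[0-9]+$", basename(folders))]
  # Step 2: list the csv files of each folder, with their full paths
  files_per_folder <- lapply(folders, list.files,
                             pattern = "\\.csv$", full.names = TRUE)
  # Step 3: regroup everything into one character vector
  as.character(unlist(files_per_folder, use.names = FALSE))
}

# Small demonstration in a temporary directory
demo_root <- file.path(tempdir(), "demo_project")
dir.create(file.path(demo_root, "20190409-0001"), recursive = TRUE)
dir.create(file.path(demo_root, "20190410-0002"), recursive = TRUE)
file.create(file.path(demo_root, "20190409-0001", "20190409-0001_001.csv"))
file.create(file.path(demo_root, "20190409-0001", "20190409-0001_002.csv"))
file.create(file.path(demo_root, "20190410-0002", "20190410-0002_001.csv"))
all_files <- list_experiment_files(demo_root)
print(basename(all_files))
```

Because the result contains full paths, it could be passed to `ldply(all_files, read_csv_filename)` without calling `setwd()` first; in that case the Source column would hold the full path, so `basename(filename)` would be needed inside the reading function to keep only the file name. A single call to `list.files(root_dir, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)` is a shorter alternative when no filtering on folder names is required.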
2° The result from the code above (head(data, n=10)) looks like this:
> head(data, n=10)
          Time   Channel.A            Source pulsenb expnb
1         (us)         (A)     expnb_pulsenb      NA    NA
2 -20.00200030 -0.29219970 20190409-0001_002       2     1
3 -20.00100030 -0.29219970 20190409-0001_002       2     1
and
> tail(data, n=10)
                 Time   Channel.A            Source pulsenb expnb
20800511 179.99199405 -0.81815930 20190409-0001_105     105     1
20800512 179.99299405 -0.81815930 20190409-0001_105     105     1
I would like to run extensive data analysis on this now big data frame, and I am wondering how to check that there are no lines containing file headers somewhere in the middle of it. As the headers are the same in every csv file, does the ldply function already take the headers into account? Or does each file header end up as a separate line in the "import" data frame? How can I check that? (Unfortunately, there are around 200 000 lines in each file, so I cannot really check for headers manually.)
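One way to check this without reading the rows by eye is to look for values in a column that should be numeric but cannot be converted to a number. With header=TRUE, read.csv uses the first line of each file as column names, so that line does not become a data row; the "(us) (A)" row in the head() output above, however, suggests that each file may carry a second line with units, which would be read as data. The sketch below uses a small made-up data frame, and the helper name `find_non_numeric_rows` is hypothetical.

```r
# Hypothetical helper: row indices where `column` does not hold a number.
find_non_numeric_rows <- function(df, column = "Time") {
  values <- as.character(df[[column]])
  # as.numeric() gives NA (with a warning) for text such as "(us)" or "Time"
  converted <- suppressWarnings(as.numeric(values))
  which(is.na(converted) & !is.na(values))
}

# Made-up example mimicking a merged data frame with stray header/units rows
toy <- data.frame(
  Time      = c("(us)", "-20.002", "-20.001", "Time", "0.5"),
  Channel.A = c("(A)", "-0.292", "-0.292", "Channel A", "-0.818"),
  stringsAsFactors = FALSE
)

bad <- find_non_numeric_rows(toy)
print(toy[bad, ])  # inspect the suspicious rows before dropping them

# Drop the suspicious rows, then convert the measurement columns to numeric
clean <- toy[setdiff(seq_len(nrow(toy)), bad), ]
clean$Time <- as.numeric(clean$Time)
clean$Channel.A <- as.numeric(clean$Channel.A)
```

On the real data, `length(find_non_numeric_rows(data))` could be compared with the number of imported files: one flagged row per file would be consistent with a units line in every csv, while zero flagged rows would mean no header text is left in the data.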
I hope I have added all the required details and put the questions in the right format as it is my first time posting here :)
Thank you guys in advance for your help!