-1

I have over 100 csv files, each containing one column of Date and Time (which are the exact same values for every file) and then a second column that is different for every file. I want the file name to be the column name.

So essentially I want to add a column to my data frame for each file.

What's the most efficient way to do this?

So far all I've done is make a list of all of the csv files in my folder because all of the information I have found so far only seems to be telling me how to add the data as more rows, not more columns.

BBODS
  • 23
  • 1
  • 2
    [Read multiple files onto a list](http://stackoverflow.com/questions/11433432) then use `do.call(cbind, mylist)` – zx8754 Nov 22 '16 at 23:00
  • 2
    Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Nov 22 '16 at 23:00

1 Answers1

2

Here's an option. It's more or less handcrafted as I'm not aware of any single command to do this exactly as you've specified.

# the working directory contains d1.csv, d2.csv, d3.csv with
# matching first two columns. The third column is random data
read.csv("./d1.csv")
#>     a b         c
#> 1   1 A 0.5526777
#> 2   2 B 0.2161643
#> 3   3 C 0.3311132
#> 4   4 D 0.3577971
#> 5   5 E 0.2298579
#> 6   6 F 0.4014883
#> 7   7 G 0.2789038
#> 8   8 H 0.5729675
#> 9   9 I 0.3413949
#> 10 10 J 0.5807167

## identify the .csv files in the working directory
file_list <- dir(".", pattern = "*.csv")
file_list
#> [1] "d1.csv" "d2.csv" "d3.csv"

## for each of the .csv files, extract the base filename and 
## create a new object with that name containing the data.
## Additionally, name the third column by the basename
for ( file in file_list) {
  f <- sub("(.*)\\.csv", "\\1", file)
  assign(f, read.csv(file = file))
  assign(f, setNames(get(f), c(names(get(f))[1:2], file)))
}

## at this point, the objects d1, d2, and d3 have been created, 
## containing their respective data. The third column of each of 
## these is their originating filename.
d1
#>     a b    d1.csv
#> 1   1 A 0.5526777
#> 2   2 B 0.2161643
#> 3   3 C 0.3311132
#> 4   4 D 0.3577971
#> 5   5 E 0.2298579
#> 6   6 F 0.4014883
#> 7   7 G 0.2789038
#> 8   8 H 0.5729675
#> 9   9 I 0.3413949
#> 10 10 J 0.5807167

## specify the names of the date and time columns (common between files)
date_col <- "a"
time_col <- "b"

## use Reduce to go through the list of created objects and 
## merge them together
list_of_objects <- mget(sub("(.*)\\.csv", "\\1", file_list))
combined_files <- Reduce(function(x, y) merge(x, y, by = c(date_col, time_col)), list_of_objects)

combined_files
#>     a b    d1.csv    d2.csv    d3.csv
#> 1  10 J 0.5807167 0.8181820 0.7073864
#> 2   1 A 0.5526777 0.3225574 0.3758595
#> 3   2 B 0.2161643 0.6933108 0.5654979
#> 4   3 C 0.3311132 0.9309869 0.1727413
#> 5   4 D 0.3577971 0.8810876 0.7802144
#> 6   5 E 0.2298579 0.1023579 0.9925649
#> 7   6 F 0.4014883 0.1328283 0.7610007
#> 8   7 G 0.2789038 0.2926512 0.7469455
#> 9   8 H 0.5729675 0.8727978 0.3073394
#> 10  9 I 0.3413949 0.3107775 0.4778286

If there's a particular aspect that you don't understand, let me know and I'll expand the comments.

Jonathan Carroll
  • 3,897
  • 14
  • 34
  • No that mostly makes sense and works great! Is there a particular point in the above where I could possibly filter the rows of data for a particular date? Or would it be easier/faster to select that data afterwards. – BBODS Nov 23 '16 at 18:33
  • 1
    Easiest is to whittle it down afterwards, because you need to alter 100+ objects dynamically to do it first. If this suits your needs, the upvote and accept buttons are at your disposal. – Jonathan Carroll Nov 23 '16 at 19:55