
I have read many Stack Overflow questions and answers, but I still cannot find a solution to my problem: I want to read 5 columns from approximately 80 .csv files into R without typing all the code manually, and then combine these files into one data frame. This data frame then needs to be combined with another data frame that has the same number of columns.

I figured I could do this with a for loop. That worked, but I can't do any further computations on the result. This is what I did, and I could see the files being read in:

library(data.table)
filenames <- list.files(path = getwd(), pattern = "\\.csv$")
for (i in filenames) {
  filepath <- file.path(getwd(), i)
  # assign() creates one variable per file, named after the file (e.g. "data1.csv")
  assign(i, fread(filepath, select = c(1, 2, 3, 25, 29), sep = ","))
}

I don't know how to access the files that have just been read in, i.e. which variable name to type (e.g. df2). And how do I combine them into one data frame to which I can assign the column names of the other data frame I want to combine it with?
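Because assign(i, ...) uses the file name itself as the variable name, the loop creates variables such as data1.csv rather than df1, df2 and so on. The sketch below is a hypothetical, self-contained demo: it writes two tiny CSV files to a temporary directory and uses base R's read.csv with column selection by position in place of fread, so it runs without data.table. It shows how to reach those variables with get() or backticks, and how mget() plus rbind stacks them into one data frame.

```r
# Hypothetical demo files standing in for the ~80 real CSVs
csv_dir <- tempfile("csvdemo")
dir.create(csv_dir)
write.csv(data.frame(a = 1:2, b = c("x", "y"), c = 3:4),
          file.path(csv_dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(a = 5:6, b = c("z", "w"), c = 7:8),
          file.path(csv_dir, "data2.csv"), row.names = FALSE)

filenames <- list.files(path = csv_dir, pattern = "\\.csv$")
for (i in filenames) {
  # assign() names each variable after its file, e.g. "data1.csv"
  assign(i, read.csv(file.path(csv_dir, i))[, c(1, 2, 3)])
}

# Such a non-syntactic name can be reached with get() or with backticks
first <- get("data1.csv")
same_first <- `data1.csv`

# mget() collects all of them into a named list; rbind stacks the rows
combined <- do.call(rbind, mget(filenames))
rownames(combined) <- NULL  # drop row names derived from the list names
```

In practice it is simpler to skip assign() altogether and keep the data frames in a list from the start.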

Marijn

2 Answers


Well, you can choose the CSV file interactively:

filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)

Or, hard-code the path.

# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")

For joining and merging, here are some great rules of thumb.

Inner join: merge(df1, df2) works without further arguments because, by default, R joins the frames on all common variable names, but you would usually want to specify merge(df1, df2, by = "CustomerId") to make sure you are matching only on the fields you intend. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)
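A small, hypothetical illustration of the rules above (df1, df2, df3 and their contents are invented for the demo). Keep in mind that merge matches rows by key columns; stacking files that share the same columns, as in the question, is a job for rbind instead.

```r
# Hypothetical data frames sharing a CustomerId key
df1 <- data.frame(CustomerId = c(1, 2, 3), Product = c("Toaster", "Radio", "Lamp"))
df2 <- data.frame(CustomerId = c(2, 3, 4), State = c("Alabama", "Ohio", "Utah"))

inner <- merge(df1, df2, by = "CustomerId")                        # ids 2, 3
outer <- merge(x = df1, y = df2, by = "CustomerId", all = TRUE)    # ids 1, 2, 3, 4
left  <- merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)  # ids 1, 2, 3; State NA for id 1
right <- merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)  # ids 2, 3, 4
cross <- merge(x = df1, y = df2, by = NULL)                        # 3 * 3 = 9 rows

# Key columns with different names: use by.x and by.y
df3 <- data.frame(Id = c(1, 3), Region = c("North", "South"))
by_xy <- merge(df1, df3, by.x = "CustomerId", by.y = "Id")         # ids 1, 3
```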

For more details, see this question:

How to join (merge) data frames (inner, outer, left, right)?

ASH

You can use map_df from purrr:

library(purrr)
library(data.table)

filenames <- list.files(path = getwd(), pattern = "\\.csv$", full.names = TRUE)

reader <- function(x) {
  fread(x, select = c(1, 2, 3, 25, 29), sep = ",")
}

reading_files <- map_df(filenames, reader)

map_df applies reader to every file and binds the results into a single data frame using the efficient bind_rows.
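If you would rather avoid purrr, the same idea can be sketched in base R with lapply and do.call(rbind, ...). This hypothetical demo writes two small CSV files and defines a stand-in for the other data frame (other, with invented column names). It keeps columns by position, as the question does with c(1, 2, 3, 25, 29), renames the stacked result to match other, and then appends it with rbind.

```r
# Hypothetical demo: two CSVs with an extra column, plus an existing data frame
csv_dir <- tempfile("csvdemo")
dir.create(csv_dir)
for (k in 1:2) {
  write.csv(data.frame(v1 = k, v2 = k * 10, v3 = "a", v4 = TRUE),
            file.path(csv_dir, paste0("part", k, ".csv")), row.names = FALSE)
}
other <- data.frame(id = 0, value = 0, label = "z")

filenames <- list.files(path = csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and keep columns 1-3 by position
pieces <- lapply(filenames, function(f) read.csv(f)[, c(1, 2, 3)])
combined <- do.call(rbind, pieces)

# Give the stacked result the other data frame's column names, then stack both
names(combined) <- names(other)
all_rows <- rbind(other, combined)
```

With data.table loaded, rbindlist(lapply(filenames, reader)) is a faster equivalent of the do.call(rbind, ...) step.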