
So I have a large number of CSVs that are dropped into a folder each week, and the file names and the number of CSVs change each week. The format is consistent, and the manipulation I need to do is consistent, but the inputs are dynamic. Each CSV is essentially a data table with a unique ID for a user, an email address, and a binary 1/0 for an action those users completed. Keep in mind, I'm a fairly novice R user, so mostly I've been poking around here for the answer.

Essentially, what I'm hoping to do is load all the CSVs from this folder, then dynamically reference the tables and perform a uniform set of actions on them each time. Whether there are 4 or 40 tables, I need to do the same actions.

One of the actions needed is to remove the "Id" column from each table. I use the email to join to a table later on, and the Id column is not useful, so it's easier to just drop it. All these CSVs include the "Id" column, so I just need to drop it from all of the tables.

Here's a quick preview and the manual way I've been going about this.

#sample CSV with long file name '2018_October_10_regional_users_action_x'
   Id       email address       action x
1 365367   joe.schmoe@email.com    1
2 953164   fake.guy@email.com      0

#sample CSV with long file name '2018_October_10_regional_users_action_z'
   Id       email address       action z
1 798842   Jill.fake@email.com     0
2 100321   madeup.j@email.com      1

#code I've been using
library(readr)  # read_csv()
library(dplyr)  # select()

setwd(choose.dir())  # choose.dir() is Windows-only
temp <- list.files(pattern = "\\.csv$") # picking all the csv's from my folder

#I was inserting these all as separate objects, since that's the way I know 
# how to do what I need, but super manual
list2env(
  lapply(setNames(temp, make.names(gsub("\\.csv$", "", temp))),
         read_csv), envir = .GlobalEnv)

# Manual way I was dropping my column
# (make.names() prefixes names that start with a digit with "X")
X2018_October_10_regional_users_action_x <- select(X2018_October_10_regional_users_action_x, -Id)
X2018_October_10_regional_users_action_z <- select(X2018_October_10_regional_users_action_z, -Id)

There are a few other things I'm doing, but they're all fairly simple and similar in nature to this column drop. So if I can figure out how to do this one, I can apply it through the rest of my code.

I tried using get and mget to put these into a list and then building a function or for loop, since manually grabbing the object names (which are usually really long) and building these functions one at a time is not scalable.
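For reference, the mget idea can be sketched roughly as follows. This is only an illustration: the tables are made-up stand-ins for the samples above, the scratch environment `env` takes the place of `.GlobalEnv` so the example stays self-contained, and the name pattern is an assumption about how the objects are named.

```r
# Scratch environment standing in for wherever the tables were loaded
# (an assumption; the code in the question loads them into .GlobalEnv).
env <- new.env()

# Made-up tables mimicking the two sample CSVs.
env$X2018_October_10_regional_users_action_x <- data.frame(
  Id = c(365367, 953164),
  email = c("joe.schmoe@email.com", "fake.guy@email.com"),
  action_x = c(1, 0)
)
env$X2018_October_10_regional_users_action_z <- data.frame(
  Id = c(798842, 100321),
  email = c("Jill.fake@email.com", "madeup.j@email.com"),
  action_z = c(0, 1)
)

# Find the object names by a shared pattern, then collect the objects
# into one named list with mget().
tbl_names <- ls(envir = env, pattern = "_regional_users_action_")
tables <- mget(tbl_names, envir = env)

# One loop now covers every table, whether there are 4 or 40.
for (nm in names(tables)) {
  tables[[nm]]$Id <- NULL
}

str(tables)
```

The long object names only appear once, as list names, so nothing has to be typed out by hand each week.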

Does anyone know of a way I can 1) dynamically reference the tables I've input and 2) use them in functions or formulas to accomplish some basic manipulation, similar to the column drop I reference above?

  • Don't build a bunch of variables (avoid `get()` and `assign()`). Just read all your data into a list that you can easily iterate over. See https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once. Then you can just `lapply(data, function(x) select(x, -Id))` – MrFlick Nov 01 '18 at 21:43
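A minimal sketch of the list-based approach suggested in this comment, written in base R so it runs without extra packages. The temporary folder, the sample data, and the helper name `drop_id` are made up for illustration; with dplyr loaded, the drop step could instead be `select(x, -Id)` as in the comment.

```r
# Create a throwaway folder with two small CSVs (made-up data).
folder <- file.path(tempdir(), "weekly_csvs")
dir.create(folder, showWarnings = FALSE)
write.csv(
  data.frame(Id = c(365367, 953164),
             email = c("joe.schmoe@email.com", "fake.guy@email.com"),
             action_x = c(1, 0)),
  file.path(folder, "2018_October_10_regional_users_action_x.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(Id = c(798842, 100321),
             email = c("Jill.fake@email.com", "madeup.j@email.com"),
             action_z = c(0, 1)),
  file.path(folder, "2018_October_10_regional_users_action_z.csv"),
  row.names = FALSE
)

# List every CSV in the folder, however many there are this week.
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Read them all into one list, named by file name without the extension.
tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(tables) <- tools::file_path_sans_ext(basename(files))

# Hypothetical helper: keep every column except Id.
drop_id <- function(df) df[, setdiff(names(df), "Id"), drop = FALSE]

# Apply the same step to every table in the list.
tables <- lapply(tables, drop_id)

# Individual tables stay reachable by name when needed.
names(tables[["2018_October_10_regional_users_action_x"]])
```

Any further uniform step can be added as another `lapply()` call over `tables`, or folded into the helper function.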
