0

I currently have 19 excel files with 7 columns in 11 of them and 9 columns in the other 8. Both files essentially have the exact some data, but the last 9 just has some extra that I don't need. My ultimate goal is to pluck out the 4 columns that I need to manipulate and have them all as one data.frame so I can perform cross tabulations later.

I have successfully accomplished importing a singular file using the following code to deal with my formatting issues:

import <- read.csv(file = "myfile.csv", head = TRUE, sep = "|", stringAsFactors = FALSE)

I am trying to come up with a succinct way to import this data in one shot. I found this SO Question and tried using the codes provided as there are tons of suggestions, however I only got one option to actually work for me without producing an error or a warning.

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i], head = TRUE, 
      sep = "|", stringAsFactors = FALSE))

My problem now, the files I wish to import are displaying in the global environment. They show as data.frame types with accurate lengths, size and values for their respective contents. But I can't call any of them. In the comments of that question, someone asked about this but I did not exactly see any replies to remedy it. Although there are file names displayed (being the original excel document name) these names contain spaces and if I try to call them based on the file names R doesn't like it and won't do it.

As an alternative import method, I also attempted the following which I believe began to work:

# Get the files names
files = list.files(pattern="*.csv")
# First apply read.csv, then rbind
myfiles = do.call(rbind, lapply(files, function(x) read.csv(x, head = TRUE, 
                   sep = "|", stringsAsFactors = FALSE)))

but because the first 11 files have 7 columns and the rest have 9, I was given error message

Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

Maybe separating the files into different directories could aid in this but seeing as I am providing this for others to use, it would be more professional to have a code that does it for them.

So Ultimately: I either need to know how to access this data or if there is an alternative way to do this I would be so greatly appreciative!

rstudent
  • 21
  • 2
  • If you use `data.table::rbindlist` with `fill = TRUE` or `dplyr::bind_rows` instead of `do.call(rbind, ...)` your list approach will work just fine. E.g., `myfiles = data.table::rbindlist(lapply(files, function(x) read.csv(x, head = TRUE, sep = "|", stringsAsFactors = FALSE)), fill = TRUE)` – Gregor Thomas Jul 06 '17 at 21:58
  • Though if you're going to use `data.table`, might as well use `fread`: `rbindlist(lapply(files, fread, sep = "|")), fill = TRUE)`. It will auto-detect the header and defaults to `stringsAsFactors = FALSE` in addition to being much much faster. – Gregor Thomas Jul 06 '17 at 22:01
  • I just tried the `fread` example and I love the progress it shows as it reads through all the files. Once it hits the 9 column files however, I get an Error message: – rstudent Jul 06 '17 at 22:15
  • `Error in FUN(X[[i]], ...) : Expected sep ('|') but new line, E0F (or other non printing character) ends field 3 when detecting types from point 0: {gives example data here}` – rstudent Jul 06 '17 at 22:22
  • Sounds like some of your files have irregularities or aren't rectangular. I'd try setting `fill = TRUE` for `fread` too... see `?fread` for details. `rbindlist(lapply(files, fread, sep = "|", fill = TRUE)), fill = TRUE)` – Gregor Thomas Jul 06 '17 at 22:25
  • Do the 4 columns you want to keep have the same names across the different data files? If so you can select just those columns from each file before you combine them. – Marius Jul 06 '17 at 22:48
  • two of them do, one is the same but capitalization is different (which I believe can be handled) but the fourth columns are different. – rstudent Jul 06 '17 at 23:48

0 Answers0