
I have hundreds of spreadsheets, each with thousands of rows, in a folder and need to combine them into a single worksheet. I have already managed to do this, but the header row of every file was copied into the result as well. I would like to delete those repeated rows, leaving only the first one (which is supposed to be the header).

My code to merge these files into a single data frame (df) is:

setwd("~/Desktop/R studies/base1_rawsheets") #folder with spreadsheets
library(readxl)

data.files = list.files()

df <- readxl::read_excel(data.files[1], sheet=1) #reading the first file of list

for (file in data.files[-1]){
  newFile <- readxl::read_excel(file, sheet=1)
  df <- merge(df, newFile, all=T)
}

Thanks a lot for any help!

P.S.: The code I used was adapted from the solution in "How to read multiple excel sheets in R programming?"

Falves
  • Why do you want to drop the header? Maybe if you show some data we can better understand. – Parfait Aug 10 '17 at 02:12
  • Each worksheet has a first row with the column names. When I combine the hundreds of spreadsheets, those rows appear in the result, and I do not need them; I only need the values in each column. Does that make sense? – Falves Aug 10 '17 at 03:42
  • Again, please show data to better illustrate. I am very curious about these reappearing column names. – Parfait Aug 10 '17 at 11:08

1 Answer


Simply drop the first row of every spreadsheet after the first one by indexing the result of read_excel() with [-1,].

df <- readxl::read_excel(data.files[1], sheet = 1) # read the first file in the list

for (file in data.files[-1]) {
  newFile <- readxl::read_excel(file, sheet = 1)[-1, ] # drop the first row
  df <- merge(df, newFile, all = TRUE) # all = TRUE keeps rows from both sides
}
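The row-dropping step can be tried without any Excel files. The following is only a minimal sketch: sheet1 and sheet2 are hypothetical in-memory stand-ins for what read_excel() returns, and it assumes the header text really does arrive as the first data row of each sheet. By default read_excel() uses the first sheet row as column names, so whether that assumption holds depends on how the files are laid out. The sketch also swaps the merge() inside the loop for a single rbind() at the end, which is a different technique from the one above: it simply stacks rows, and it requires every sheet to have the same columns.

```r
# Hypothetical stand-ins for two sheets whose first row repeats the header text
sheet1 <- data.frame(id = c("id", "1", "2"),
                     value = c("value", "a", "b"),
                     stringsAsFactors = FALSE)
sheet2 <- data.frame(id = c("id", "3", "4"),
                     value = c("value", "c", "d"),
                     stringsAsFactors = FALSE)
sheets <- list(sheet1, sheet2)

# Keep the first sheet as is (mirroring the loop above) and
# drop row 1 from every later sheet
trimmed <- c(sheets[1],
             lapply(sheets[-1], function(x) x[-1, , drop = FALSE]))

# Stack everything once at the end instead of merging on each iteration
combined <- do.call(rbind, trimmed)
rownames(combined) <- NULL # reset row names after stacking

print(combined)
```

Collecting the pieces in a list and binding them once avoids copying a growing data frame on every iteration, which may matter with hundreds of files.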
Nicolás Velasquez
  • The code seems to work. It has been running for about 20 minutes, I believe because the union of all the worksheets adds up to more than 600 thousand rows. Thank you in advance. Can you tell me how I can concatenate columns F and G so that their contents appear in column F? – Falves Aug 10 '17 at 03:21
  • Glad it helped. Assuming that the columns are named F and G, then: df$F <- paste(df$F, df$G, sep = ""). If you need to insert some kind of separation character, insert it within the quotes of sep = "". – Nicolás Velasquez Aug 10 '17 at 20:05
  • Great! Great! Thank you so much, Nicolas! – Falves Aug 16 '17 at 20:03
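The paste() suggestion from the comments can be checked on a small example. This is only a sketch: the data frame below is hypothetical, with columns named F and G as assumed in the comment, and joined_plain and joined_dash are illustrative names.

```r
# Hypothetical data frame with the two columns to be joined
df <- data.frame(F = c("ab", "cd"),
                 G = c("12", "34"),
                 stringsAsFactors = FALSE)

# sep = "" joins the values with nothing in between
joined_plain <- paste(df$F, df$G, sep = "")

# Any other separator goes inside the quotes of sep
joined_dash <- paste(df$F, df$G, sep = "-")

# Overwrite column F with the concatenated values, as in the comment
df$F <- joined_plain

print(df)
print(joined_dash)
```

Note that paste() converts missing values to the text "NA", so rows with an empty F or G cell may need separate handling.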