0

I have my company's marketing expense report in Excel, with five tabs - each representing the line item expenses for each of the last five years and identical columns. I'm trying to automate/script the restructure of the file into one dataframe with the additional variable "year", so I can analyze it in Tableau without using a join.

Is it possible to read an xlsx file in R and rbind() the tabs somehow? I'm currently reading in the tabs as separate .csv files, which is just as labor-intensive as restructuring manually in Excel. Here's what I have (except with many more tabs):

data1<- read.csv("tab1.csv") #This year
data1$year <- c(1:nrow(data1)) 
data1$year <- 2016

data2<- read.csv("tab2.csv") #Last year
data2$year <- c(1:nrow(data2)) 
data2$year <- 2015

data3<- read.csv("tab3.csv") #Two years ago
data3$year <- c(1:nrow(data32)) 
data3$year <- 2014

data <- rbind(data1, data2, data3)
rm(data1,data2,data3)

1 Answers1

0

If you know how many sheets there are you could use readxl:

library(readxl)
listy <-lapply(1:5, function(x) read_excel("EXCELWORKBOOK.xlsx", sheet = x)) #here you know that there are 5 sheets in the excel workbook
datr <- do.call(rbind, listy)
erasmortg
  • 3,246
  • 1
  • 17
  • 34
  • Thank you. I will try this at home tonight, as I have to submit an IT ticket to download packages. – N. Anderson Aug 17 '16 at 17:08
  • You don't need to know how many sheets there are in advance, you can discover the names/number of sheets with `readxl::excel_sheets()` – arvi1000 Aug 17 '16 at 20:12
  • Is there any way to loop binary categorical variables into each individual sheets? i.e. "Last week = 1" for all the variables in sheet 1, and 0 for all the others? Sorry not familiar with readxl. – N. Anderson Aug 19 '16 at 16:53
  • Could you share some reproducible data? I'm not sure I get your question fully – erasmortg Aug 19 '16 at 17:35
  • The code you provided was very helpful. Each tab has identical columns, and each tab represents each of the last five years of my company's Marketing Expense report. I'd like all the data to be in one data frame, with five new columns: 2016, 2015, 2014, etc. – N. Anderson Aug 22 '16 at 14:53
  • If the data is from the 2016 tab, I'd like that values to be "1", else "0"... the same for 2015, etc. When I was doing an rbind() on the separate .csv files, I used the following code: for(i in 1:nrow(data)){ if(data$Days.Since.Closed[i]<365) {data$Active.2016[i] <- 1} } for(i in 1:nrow(turn)){ if (data$Days.Since.Closed[i] > 365 & data$Days.Since.Closed[i] < 730) {data$Active.2016[i] <- 1} } } ... I guess what I'm asking is, is there a way to mark which tab the data was part of as I read it into R? – N. Anderson Aug 22 '16 at 15:01
  • Maybe with `ifelse`? Not seeing the shape right now makes it difficult to visualize: `ifelse(data$Days.Since.Closed < 365,1,0)` – erasmortg Aug 23 '16 at 08:49