I have an Excel file with a sheet for each week in my data set. Each sheet has the same number of rows, and each row is identical across the sheets (with the exception of the time period… sheet 1 represents week 1, sheet 2 week 2, etc.). I'm trying to import all the Excel sheets as one data frame in R.
For example, my data is essentially structured like this (with several more columns and sheets):
Week 1 sheet
ID Gender DOB Absences Lates Absences_excused
1 M 1997 5 14 5
2 F 1998 4 3 2
Week 2 sheet
ID Gender DOB Absences Lates Absences_excused
1 M 1997 2 10 3
2 F 1998 8 2 9
I'm trying to build a script that will take x numbers of sheets and combine them into one data frame like this:
Combined (ideal)
ID Gender DOB Absences.1 Lates.1 Absences.2 Lates.2
1 M 1997 5 14 2 10
2 F 1998 4 3 8 2
I'm using gdata to import the Excel files.
I've tried creating a loop (normally bad for R, I know...) that will go through all the sheets in the Excel file and add each to a list as a data frame:
library(gdata)
number_sheets <- 3
all.sheets <- vector(mode="list", length=number_sheets)
for (i in 1:number_sheets) {
all.sheets[[i]] <- read.xls("/path/to/file.xlsx", sheet=i)
}
This gives me a nice list, all.sheets
, that I can access, but I'm unsure about the best way to create a new data frame from specific columns in the list of data frames.
I've tried the code below, which creates a brand new data frame by looping through the list of data frames. On the first data frame, it saves the columns that are consistent in all the sheets, and then adds the week-specific columns.
Cleaned <- data.frame()
number_sheets <- 3
for (i in 1:number_sheets) {
if (i == 1) {
Cleaned <- all.sheets[[i]][,c("ID", "Gender", "DOB")]
}
Cleaned$Absences.i <- all.sheets[[i]][,c("Absences")] # wrong... obviously doesn't work... but essentially what I want
# Other week-specific columns go here... somehow...
}
This code doesn't work though, since Cleaned$Absences.i
is obviously not how you create dynamic columns in a data frame.
What's the best way to combine a set of data frames and create new columns for each of the variables I'm trying to track?
Extra hurdle: I'm also trying to combine two columns, "Absences" and "Absences_excused" into a single "Absences" column in the final data frame, so I'm trying to make my solution let me perform transformations to the new columns, like so (again, this isn't right):
Cleaned$Absences.i <- all.sheets[[i]][,c("Absences")] + all.sheets[[i]][,c("Absences_excused")]