Investigating my problem with `make` always rebuilding Makefile targets (see the question "make always rebuilds Makefile targets") uncovered another issue, which is the subject of this question: repeated execution of the following R code results in a loss of objects' attributes during data transformation operations.
For the record, I've already written on this subject (Approaches to preserving object's attributes during extract/replace operations), but that question and its answer were more general. Also, I was wrong that simply saving attributes works: it worked for me at the time only because I wasn't yet performing operations that are potentially dangerous for objects' attributes.
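To make the underlying behaviour concrete, here is a minimal sketch (the `label` attribute name and the values are hypothetical, chosen only for illustration) showing that both extraction with `[` and coercion with `as.numeric()` silently drop user-defined attributes, which is why saving attributes once is not enough:

```r
# A numeric vector carrying a user-defined attribute; the attribute
# name "label" is hypothetical, used only for illustration
x <- structure(c(10, 20, 30), label = "Development Team Size")

# Extraction with `[` keeps only names, dim and dimnames,
# so the "label" attribute does not survive subsetting
y <- x[2:3]
print(attr(y, "label"))   # NULL

# Coercion also strips attributes: as.numeric() (i.e. as.double())
# returns a bare vector even when x is already of type double
z <- as.numeric(x)
print(attributes(z))      # NULL
```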
The following are excerpts from my R code in which I'm experiencing the loss of attributes.
```r
##### GENERIC TRANSFORMATION FUNCTION #####

# CACHE_DIR, RDS_EXT, DEBUG2 and base64() are defined elsewhere (not shown)
transformResult <- function (dataSource, indicator, handler) {
  fileDigest <- base64(indicator)
  rdataFile <- paste0(CACHE_DIR, "/", dataSource, "/",
                      fileDigest, RDS_EXT)
  if (file.exists(rdataFile)) {
    data <- readRDS(rdataFile)

    # Preserve user-defined attributes for data frame's columns
    # via defining new class 'avector' (see code below). Also,
    # preserve attributes (comments) for the data frame itself.
    data2 <- data.frame(lapply(data, function(x)
      { structure(x, class = c("avector", class(x))) } ))
    #mostattributes(data2) <- attributes(data)
    attributes(data2) <- attributes(data)

    result <- do.call(handler, list(indicator, data2))
    saveRDS(result, rdataFile)
    rm(result)
  }
  else {
    # base R has no error(); stop() signals the error
    stop("RDS file for '", indicator, "' not found! Run 'make' first.")
  }
}

## Preserve object's special attributes:
## use a class with a "as.data.frame" and "[" method
as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function (x, ...) {
  #attr <- attributes(x)
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  #attributes(r) <- attr
  return (r)
}

##### HANDLER FUNCTION DEFINITIONS #####

projectAge <- function (indicator, data) {
  # do not process, if target column already exists
  if ("Project Age" %in% names(data)) {
    message("Project Age: ", appendLF = FALSE)
    message("Not processing - Transformation already performed!\n")
    return (invisible())
  }

  transformColumn <- as.numeric(unlist(data["Registration Time"]))
  regTime <- as.POSIXct(transformColumn, origin="1970-01-01")
  prjAge <- difftime(Sys.Date(), as.Date(regTime), units = "weeks")
  data[["Project Age"]] <- as.numeric(round(prjAge)) / 4 # in months

  # now we can delete the source column
  if ("Registration Time" %in% names(data))
    data <- data[setdiff(names(data), "Registration Time")]

  if (DEBUG2) {print(summary(data)); print("")}

  return (data)
}

projectLicense <- function (indicator, data) {
  # do not process, if target column (type) already exists
  if (is.factor(data[["Project License"]])) {
    message("Project License: ", appendLF = FALSE)
    message("Not processing - Transformation already performed!\n")
    return (invisible())
  }

  data[["Project License"]] <-
    factor(data[["Project License"]],
           levels = c('gpl', 'lgpl', 'bsd', 'other',
                      'artistic', 'public', '(Other)'),
           labels = c('GPL', 'LGPL', 'BSD', 'Other',
                      'Artistic', 'Public', 'Unknown'))

  if (DEBUG2) {print(summary(data)); print("")}

  return (data)
}

devTeamSize <- function (indicator, data) {
  var <- data[["Development Team Size"]]
  # convert data type from 'character' to 'numeric'
  if (!is.numeric(var)) {
    data[["Development Team Size"]] <- as.numeric(var)
  }

  if (DEBUG2) {print(summary(data)); print("")}

  return (data)
}

##### MAIN #####

# construct list of indicators & corresponding transform. functions
indicators <- c("prjAge", "prjLicense", "devTeamSize")
transforms <- list(projectAge, projectLicense, devTeamSize)

# sequentially call all previously defined transformation functions
lapply(seq_along(indicators),
       function(i) {
         transformResult("SourceForge",
                         indicators[[i]], transforms[[i]])
       })
```
After the second run of this code, the column names "Project Age" and "Project License", as well as other user-defined attributes of the data frame `data2`, are lost.
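One statement in `transformResult()` that may be worth ruling out: when a handler takes its "already performed" branch it returns `invisible()`, i.e. `NULL`, and `saveRDS(result, rdataFile)` then overwrites the cached data frame with `NULL`. The sketch below reproduces that pattern with a temporary file; `skipIfDone()`, `runOnce()` and the `guard` flag are hypothetical names standing in for `projectAge()` and a simplified `transformResult()`, and the guarded branch is one possible fix, not a confirmed diagnosis:

```r
# Hypothetical handler that mimics the early exit in projectAge():
# it returns invisible() (i.e. NULL) once the target column exists
skipIfDone <- function(indicator, data) {
  if ("Project Age" %in% names(data)) return(invisible())
  data[["Project Age"]] <- c(12, 24)
  data
}

# One pass of a simplified transformResult(); 'guard' toggles whether
# a NULL result is allowed to overwrite the cache file
runOnce <- function(file, handler, guard) {
  data <- readRDS(file)
  result <- handler("prjAge", data)
  if (!guard || !is.null(result)) saveRDS(result, file)
  invisible(result)
}

cache <- tempfile(fileext = ".rds")

# Unguarded: the second run writes NULL over the transformed data
saveRDS(data.frame(id = 1:2), cache)
for (run in 1:2) runOnce(cache, skipIfDone, guard = FALSE)
unguarded <- readRDS(cache)   # NULL

# Guarded: the second run leaves the cached data frame untouched
saveRDS(data.frame(id = 1:2), cache)
for (run in 1:2) runOnce(cache, skipIfDone, guard = TRUE)
guarded <- readRDS(cache)     # data frame that still has "Project Age"
unlink(cache)
```

In the code above, only `projectAge()` and `projectLicense()` have such an early exit, which matches the two columns reported as lost.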
My question here is multifaceted:

1) Which statements in my code could lead to the loss of attributes, AND WHY?
2) What is the correct line of code (`mostattributes(...) <- attributes(...)` or `attributes(...) <- attributes(...)`/`attr`) in `transformResult()` and in the `avector` class definition, AND WHY?
3) Is the statement `as.data.frame.avector <- as.data.frame.vector` really needed if I add the class attribute `avector` to a data frame object and, in general, prefer a generic solution (applicable not only to data frames)? WHY OR WHY NOT?
4) Saving via `attr` in the class definition doesn't work; it fails with the following error:

```
Error in attributes(r) <- attr :
  'names' attribute [5] must be the same length as the vector [3]
Calls: lapply ... summary.data.frame -> lapply -> FUN -> summary.default -> [ -> [.avector
```

So I had to go back to using `mostattributes()`. Is that OK?
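Points 2 and 4 can be illustrated with a small sketch (the `label` attribute and the values are hypothetical). `attributes<-` replaces every attribute verbatim, so copying the 5-element `names` of the original onto a 3-element subset raises the error quoted in point 4. `mostattributes<-` skips `names` whose length does not match, but because it first replaces all attributes, the subset also loses its own names. The variant of `[.avector` at the end is only one possible approach, an assumption rather than a canonical fix: it keeps the `names`/`dim`/`dimnames` produced by the default `[` and copies everything else.

```r
x <- structure(c(a = 1, b = 2, c = 3, d = 4, e = 5), label = "demo")
r <- x[1:3]   # plain subset: names a, b, c kept, "label" dropped

# attributes<- copies all attributes, including the 5-element names
err <- tryCatch({
  r1 <- r
  attributes(r1) <- attributes(x)
  NULL
}, error = function(e) conditionMessage(e))
print(err)    # the "'names' attribute [5] ..." error message

# mostattributes<- restores "label" but drops names entirely,
# because the old names cannot be matched to the shorter vector
r2 <- r
mostattributes(r2) <- attributes(x)
print(attr(r2, "label"))  # "demo"
print(names(r2))          # NULL

# Hypothetical variant: keep what the default `[` produced and copy
# only the remaining (user-defined, class, ...) attributes
`[.avector` <- function(x, ...) {
  r <- NextMethod("[")
  keep <- setdiff(names(attributes(x)), c("names", "dim", "dimnames"))
  for (a in keep) attr(r, a) <- attr(x, a, exact = TRUE)
  r
}

v <- structure(c(a = 1, b = 2, c = 3, d = 4, e = 5),
               label = "demo", class = c("avector", "numeric"))
s <- v[2:3]
print(names(s))          # "b" "c"
print(attr(s, "label"))  # "demo"
```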
==========
I have read the following on the subject:
- SO question: How to delete a row from a data.frame without losing the attributes (I like the solution by Ben Barns, but it differs a bit from the one suggested by Gabor Grothendieck and Marc Schwartz; see below);
- SO question: indexing operation removes attributes (while the solution is legible, I prefer one based on a class definition, i.e. sub-classing?);
- A generic solution suggested by Heinz Tuechler (https://stat.ethz.ch/pipermail/r-help/2006-July/109148.html): do I need this?;
- An explanation by Brian Ripley (http://r.789695.n4.nabble.com/Losing-attributes-in-data-frame-PR-10873-tp919265p919266.html): I found it somewhat confusing;
- A solution suggested by Gabor Grothendieck (https://stat.ethz.ch/pipermail/r-help/2006-May/106308.html);
- An explanation of Gabor Grothendieck's solution by Marc Schwartz (https://stat.ethz.ch/pipermail/r-help/2006-May/106351.html): a very nice explanation;
- Sections 8.1.28 and 8.1.29 of "The R Inferno" (www.burns-stat.com/pages/Tutor/R_inferno.pdf): I've tried its suggestion of using `storage.mode()`, but it doesn't really solve the problem, as coercing via `storage.mode` doesn't affect the `class` of an object (not to mention that it doesn't cover attribute-clearing operations other than coercion, such as subsetting and indexing);
- http://stat.ethz.ch/R-manual/R-devel/library/base/html/attributes.html;
- http://cran.r-project.org/doc/manuals/r-devel/R-lang.html#Copying-of-attributes.
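The "R Inferno" point about `storage.mode()` can be seen in a short sketch (the `avector` class vector mirrors the code above; the `label` attribute and values are hypothetical). `storage.mode<-` keeps all attributes, including `class`, so the class vector still claims `"character"` after the data have become double, whereas `as.numeric()` yields the right type but drops every attribute, as presumably happens to a character column in `devTeamSize()`:

```r
# A character column tagged the way transformResult() tags columns
x <- structure(c("3", "5", "8"), class = c("avector", "character"),
               label = "Development Team Size")

# storage.mode<- changes the underlying type but keeps attributes,
# so the class vector no longer describes the data
y <- x
storage.mode(y) <- "double"
print(typeof(y))          # "double"
print(class(y))           # "avector" "character"

# as.numeric() gives the right type but strips all attributes,
# including the "avector" class and the "label"
z <- as.numeric(x)
print(attributes(z))      # NULL
```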
P.S. I believe that this question is of a general nature, so I haven't provided a reproducible example at this time. I hope it's possible to answer it without one, but if not, please let me know.