
Here's the situation. My R code is supposed to check whether existing RData files in the application's cache are up to date. I save the files under names consisting of the base64-encoded names of specific data elements. However, the data for each element is retrieved by submitting a particular SQL query per element, all specified in the data collection's configuration file. So if the data for an element has already been retrieved and I later change that element's SQL query, the cached data is not updated.

To handle this situation, I decided to use R object attributes. I plan to save each data object's corresponding SQL query (request), base64-encoded, as an attribute of that object:

# save the base64-encoded SQL query of the request as the data object's attribute,
# so that we can detect when the configuration contains a modified query
attr(data, "SQL") <- base64(request)

Then, when I need to verify whether the SQL query has changed, I'd like to simply retrieve the object's corresponding attribute and compare it with the encoded form of the current SQL query. If they match, the query hasn't changed and I skip processing this data request; if they don't match, the query has changed and I go ahead and process the request:

# check if the archive file has already been processed
if (DEBUG) {message("Processing request \"", request, "\" ...")}
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- load(rdataFile)
  if (identical(base64(request), attr(data, "SQL"))) {
    skipped <<- skipped + 1
    if (DEBUG) {message("Processing skipped: .Rdata file found.\n")}
    return (invisible())
  }
  rm(data)
}

My question is whether it's possible to read an object's attributes without fully loading the object from the file. In other words, can I avoid the load() and rm() calls in the code above?

Your advice is much appreciated!

UPDATE: An additional question: what's wrong with my code? It performs processing even when it shouldn't, i.e., when all information is up to date (no changes in the cache or in the configuration file).

UPDATE 2 (additional code per @MrFlick's answer):

# construct name from data source prefix and data ID (see config. file),
# so that corresponding data object (usually, data frame) will be saved
# later under that name via save()
dataName <- paste(dsPrefix, "data", indicator, sep = ".")

assign(dataName, srdaGetData())
data <- as.name(dataName)

# save the base64-encoded SQL query of the request as the data object's attribute,
# so that we can detect when the configuration contains a modified query
attr(data, "SQL") <- base64(request)

# save current data frame to RData file
save(list = dataName, file = rdataFile)
# alternatively, use do.call() as in "getFLOSSmoleDataXML.R"

# clean up
rm(data)
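One likely reason the check never matches: in the code above, `data <- as.name(dataName)` makes `data` a symbol, not the data frame bound to `dataName`, so `attr(data, "SQL") <- ...` never touches the object that save() writes out. A minimal sketch of one way to attach the attribute to the data frame itself before saving it under the dynamic name. `srdaGetDataStub()` and the values of `dsPrefix`, `indicator`, `request` and `rdataFile` are hypothetical stand-ins, and the plain query text is stored instead of the question's `base64(request)` so the example needs no extra package:

```r
# Hypothetical stand-ins for the question's config values and helpers
dsPrefix  <- "src"
indicator <- "42"
request   <- "SELECT id, name FROM projects"
rdataFile <- tempfile(fileext = ".RData")
srdaGetDataStub <- function() data.frame(id = 1:3, name = c("a", "b", "c"))

dataName <- paste(dsPrefix, "data", indicator, sep = ".")   # "src.data.42"

# Put the attribute on the data frame itself, then bind it to the dynamic name
df <- srdaGetDataStub()
attr(df, "SQL") <- request   # the question stores base64(request); plain text here
assign(dataName, df)

# save() looks dataName up in the calling environment and writes that object,
# attributes included
save(list = dataName, file = rdataFile)
rm(df)
```

Swapping `request` for the encoded value on the `attr()` line is the only change needed to match the question's scheme.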
Aleksandr Blekh

2 Answers


You can't "really" do it, but you could modify the code in my cgwtools::lsdata function.

function (fnam = ".Rdata") 
{
    x <- load(fnam, envir = environment())
    return(x)
}

This loads the file, which takes time and briefly uses memory, and then the local environment disappears when the function returns. So, add an argument for the items whose attributes you want to check, and add a line inside the function that does attributes(your_items) -> y; return(list(x = x, y = y)).
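A minimal sketch of the modification described above. `lsdataAttr` is a hypothetical name, not part of cgwtools; it also swaps the function's own environment for a fresh `new.env()`, so objects in the file whose names happen to be `fnam`, `items` or `x` cannot collide with the function's local variables. It still loads the whole file, as the answer says; it only keeps the loaded objects out of the caller's workspace:

```r
# Variation on the lsdata() idea: load into a private environment and also
# return the attributes of selected objects
lsdataAttr <- function(fnam = ".Rdata", items = NULL) {
  e <- new.env()
  x <- load(fnam, envir = e)        # character vector of restored object names
  if (is.null(items)) items <- x
  items <- intersect(items, x)      # silently ignore names not in the file
  y <- lapply(items, function(nm) attributes(get(nm, envir = e)))
  names(y) <- items
  list(x = x, y = y)                # e is discarded when the function returns
}

# Usage: save an object with an "SQL" attribute, then read the attribute back
f <- tempfile(fileext = ".RData")
dat <- data.frame(a = 1:2)
attr(dat, "SQL") <- "SELECT 1"
save(dat, file = f)
res <- lsdataAttr(f, "dat")
res$y$dat$SQL                       # "SELECT 1"
```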

Carl Witthoft
  • Thank you very much for your answer! But I really don't see an essential difference from my approach (the one using `load`). If I understood correctly, your approach's "trick" is the auto-disappearing environment, but isn't it easier just to call `rm` and be done with it? The object is loaded in either case. Strangely enough, my code with `load` and `rm` doesn't work as expected. Does anything in it look incorrect to you? – Aleksandr Blekh May 16 '14 at 18:22
    One of the risks w/ using `load` is that you'll bomb any existing objects in your current environment with the same names -- not to mention that the slightest typo in `rm` will kill things you wanted to keep :-( -- which is why I use this function w/ its own distinct environment. – Carl Witthoft May 16 '14 at 18:28
  • I'm pretty confident that for this application the names I assign will be unique, since they come from a configuration file I maintain. However, I understand your point and appreciate the clarification. Learning something new every day... By the way, while researching the topic on SO before posting my question, I ran across this relevant and interesting post, which I hope you'll enjoy reading: http://www.cybaea.net/Blogs/A-warning-on-the-R-save-format.html. – Aleksandr Blekh May 16 '14 at 18:56
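A small illustration of the risk described in the comments: a plain `load()` silently replaces any same-named object in the current environment, while loading into a dedicated environment does not. The file and object contents are made up for the demonstration:

```r
f <- tempfile(fileext = ".RData")
data <- data.frame(v = 1:3)
save(data, file = f)                   # the file now holds an object named "data"

data <- "something I wanted to keep"   # unrelated object with the same name

# Loading into a dedicated environment leaves the caller's 'data' untouched
cache <- new.env()
load(f, envir = cache)
kept <- data                           # still the character string

# Loading into the current environment silently replaces it
load(f)
class(data)                            # "data.frame"
```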

And there is a problem with the way you are using load(). With save/load you can "freeze-dry" multiple objects into an .RData file, and they "re-inflate" into the current environment. As a result, load() does not return the object(s); it returns a character vector with the names of all the objects it restored. Since you didn't supply your save() code, I'm not sure what's actually in your file, but if it was a variable called data, then just call

load(rdataFile)

not

data <- load(rdataFile)
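A short sketch of what `data <- load(rdataFile)` actually does, which would also explain the symptom in the UPDATE: `attr(data, "SQL")` ends up `NULL`, so the `identical()` comparison is always FALSE and the request is never skipped. The second half shows one alternative to a bare `load()`: using the returned name with `get()`, which also works when the object is saved under a dynamically built name as in UPDATE 2. It assumes one object per file; the file and query are made up:

```r
f <- tempfile(fileext = ".RData")
data <- data.frame(x = 1:2)
attr(data, "SQL") <- "SELECT x FROM t"
save(data, file = f)

# Pattern from the question: load() restores the data frame into 'data',
# then the assignment overwrites 'data' with load()'s return value
data <- load(f)
print(data)                   # [1] "data"
attr(data, "SQL")             # NULL, so the identical() check never succeeds

# Using the returned name instead (assumes exactly one object per file)
e <- new.env()
nm <- load(f, envir = e)
obj <- get(nm, envir = e)
attr(obj, "SQL")              # "SELECT x FROM t"
```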
MrFlick
  • Thank you for your answer! I updated my question per your answer to clarify things (please see UPDATE 2). As you see, I load only one object per data file, so I don't expect any conflicts. But, please do take a look and comment, as I don't have much experience with R objects manipulation (and R, in general, for that matter). – Aleksandr Blekh May 16 '14 at 19:05