
For starters: I have been searching for hours on this problem, so if the answer turns out to be trivial, please forgive me...

What I want to do is delete a row (no. 101) from a data.frame. It contains test data and should not appear in my analyses. My problem is: Whenever I subset from the data.frame, the attributes (esp. comments) are lost.

str(x)
# x has comments for each variable
x <- x[1:100,]
str(x)
# now x has lost all comments

It is well documented that subsetting drops most attributes, so that much is perfectly clear. The manual (e.g. http://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.data.frame.html) even suggests a way to preserve the attributes:

## keeping special attributes: use a class with a
## "as.data.frame" and "[" method:


as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

d <- data.frame(i= 0:7, f= gl(2,4),
                u= structure(11:18, unit = "kg", class="avector"))
str(d[2:4, -1]) # 'u' keeps its "unit"
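
To make the effect of the manual's method easier to see, here is a minimal sketch on a plain vector, outside any data.frame. The objects u and v are made up for illustration; the method itself is copied from the manual's example above.

```r
# Subsetting method from the manual: subset as usual via NextMethod(),
# then copy the attributes of the original object onto the result.
`[.avector` <- function(x, i, ...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Illustrative vectors (not from the original post)
u <- structure(11:18, unit = "kg", class = "avector")  # has the class
v <- structure(11:18, unit = "kg")                     # plain vector

attr(u[2:4], "unit")  # "kg": the avector method restored it
attr(v[2:4], "unit")  # NULL: default subsetting dropped it
```

The method is only used for objects that carry the class "avector", which is why defining it alone does not change how other vectors or columns are subset.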

I am not yet far enough into R to understand exactly what happens here. However, simply running these lines (except the last three) does not change the behavior of my subsetting. Using subset() with an appropriate logical vector (100 times TRUE, then one FALSE) gives me the same result. And simply storing the attributes in a variable and restoring them after subsetting does not work either.

# Does not work...
tmp <- attributes(x)
x <- x[1:100,]
attributes(x) <- tmp
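
A small sketch of one likely reason this attempt fails, using made-up stand-in data (the real x is not shown in the post): attributes(x) holds only the attributes of the data.frame itself, while the comments are attached to the individual columns.

```r
# Hypothetical stand-in for the real data: 101 rows, one commented column
x <- data.frame(a = 1:101)
comment(x$a) <- "first variable"

tmp <- attributes(x)
names(tmp)                   # attributes of the data.frame object itself
"comment" %in% names(tmp)    # FALSE: the column comment is not in here
length(tmp$row.names)        # row names of all rows, including row 101
comment(x$a)                 # the comment lives on the column
```

So restoring tmp after subsetting cannot bring the column comments back, and tmp still carries the row names of the full, unsubsetted data.frame.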

Of course, I could write all comments to a vector (var=>comment), subset and write them back using a loop - but that does not seem a well-founded solution. And I am quite sure I will encounter datasets with other relevant attributes in future analyses.
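
For reference, the save-and-restore fallback described above could look roughly like this. It is only a sketch on invented example data, and it handles comments only, not other attributes.

```r
# Hypothetical example data: 101 rows, two commented columns
x <- data.frame(a = 1:101, b = rep(c(0.5, 1.5, 2.5), length.out = 101))
comment(x$a) <- "first variable"
comment(x$b) <- "second variable"

# Save the comment of every column (NULL where a column has none)
saved <- lapply(x, comment)

# Drop row 101; this loses the column comments
x <- x[1:100, ]

# Write the saved comments back, column by column
for (nm in names(saved)) {
  comment(x[[nm]]) <- saved[[nm]]
}

comment(x$a)
comment(x$b)
```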

So this is where my efforts with Stack Overflow, Google, and brain power got stuck. I would very much appreciate it if anyone could help me out with a hint. Thanks!

Brian Diggs
BurninLeo
  • One could also set the row to NA: x[101,] <- NA. But this is just another pseudo-solution that does not solve the problem. – BurninLeo May 01 '12 at 21:10

4 Answers


If I understand you correctly, you have some data in a data.frame, and the columns of the data.frame have comments associated with them. Perhaps something like the following?

set.seed(1)

mydf<-data.frame(aa=rpois(100,4),bb=sample(LETTERS[1:5],
  100,replace=TRUE))

comment(mydf$aa)<-"Don't drop me!"
comment(mydf$bb)<-"Me either!"

So this would give you something like

> str(mydf)
'data.frame':   100 obs. of  2 variables:
 $ aa: atomic  3 3 4 7 2 7 7 5 5 1 ...
  ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2 2 5 4 2 1 3 5 3 ...
  ..- attr(*, "comment")= chr "Me either!"

And when you subset this, the comments are dropped:

> str(mydf[1:2,]) # comment dropped.
'data.frame':   2 obs. of  2 variables:
 $ aa: num  3 3
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2

To preserve the comments, define the function [.avector as you did above (from the documentation), then add the appropriate class attribute to each of the columns in your data.frame (EDIT: to keep the factor levels of bb, add "factor" to the class of bb):

mydf$aa<-structure(mydf$aa, class="avector")
mydf$bb<-structure(mydf$bb, class=c("avector","factor"))

So that the comments are preserved:

> str(mydf[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ aa:Class 'avector'  atomic [1:2] 3 3
  .. ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2
  ..- attr(*, "comment")= chr "Me either!"

EDIT:

If there are many columns in your data.frame that have attributes you want to preserve, you could use lapply (EDITED to include original column class):

mydf2 <- data.frame( lapply( mydf, function(x) {
  structure( x, class = c("avector", class(x) ) )
} ) )

However, this drops comments associated with the data.frame itself (such as comment(mydf)<-"I'm a data.frame"), so if you have any, assign them to the new data.frame:

comment(mydf2)<-comment(mydf)

And then you have

> str(mydf2[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ aa:Classes 'avector', 'numeric'  atomic [1:2] 3 3
  .. ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2
  ..- attr(*, "comment")= chr "Me either!"
 - attr(*, "comment")= chr "I'm a data.frame"
BenBarnes
  • Hi BenBarnes! Thank you for this answer. Given your explanations and the code example, the function from the manual finally makes sense to me! Seems I have to learn a bit about classes in R. – BurninLeo May 02 '12 at 18:41
  • I'm trying to use this approach. However, this operation `transformColumn <- as.numeric(unlist(data["Registration Time"]))` results in the following error message: `"Error in \`[.data.frame\`(data, "Registration Time") :\n undefined columns selected \n Calls: lapply ... do.call -> -> unlist -> [ -> [.data.frame`" (I added '\n' chars for readability). What am I doing wrong? – Aleksandr Blekh Jun 01 '14 at 11:08
  • Sorry for confusion - I think that it does work. Well..., up to a point, where I probably break things in my code. Will report, when I figure this out. – Aleksandr Blekh Jun 01 '14 at 11:19
  • Actually, the error is still there, but a different one: `Error in storage.mode(unlist(data["Registration Time"])) <- "numeric" : could not find function "unlist<-"` (where `data` is a data frame). Does it mean I can't use `unlist()` in a LHS expression? – Aleksandr Blekh Jun 01 '14 at 13:48
  • @AleksandrBlekh, your comments include code not mentioned in either the OP or the answers. As such, more information, including a minimal reproducible example, would get you the best help. Please consider posting a new question. – BenBarnes Jun 01 '14 at 16:21
  • You're right! I will post a separate question later today and let you know the URL in my comment here. Thank you! – Aleksandr Blekh Jun 01 '14 at 17:30
  • Finally, I was able to put together the question (for now without a full reproducible example, it's partial). Please take a look and let me know, if you still need a *full* reproducible example in order to answer my question: http://stackoverflow.com/questions/23991060/loss-of-attributes-despite-attempts-to-preserve-them. – Aleksandr Blekh Jun 02 '14 at 09:20

For those looking for the "all-in" solution based on BenBarnes's explanation: here it is.

(Give your upvote to BenBarnes's post if this works for you.)

# Define the avector-subselection method (from the manual)
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Assign each column in the data.frame the (additional) class avector
# Note that this will "lose" the data.frame's attributes, therefore write to a copy
df2 <- data.frame(
  lapply(df, function(x) {
    structure( x, class = c("avector", class(x) ) )
  } )
)

# Finally copy the attribute for the original data.frame if necessary
mostattributes(df2) <- attributes(df)

# Now subselects work without losing attributes :)
df2 <- df2[1:100,]
str(df2)

The good thing: once the class has been attached to all of the data.frame's columns, subsetting never drops the attributes again.

Okay, sometimes I am stunned by how complicated it is to do the simplest operations in R. But I surely would not have learned about the "classes" feature if I had just marked and deleted the case in SPSS ;)

BurninLeo
  • I've tried this solution (http://stackoverflow.com/questions/23991060/loss-of-attributes-despite-attempts-to-preserve-them), but one of the issues that I've had is that each subsequent run of the code adds the `avector` class to the object again. So, I end up with multiple redundant `avector` class attributes. Also, the `i` parameter in the selector function definition is unused and thus, IMHO, can be removed. – Aleksandr Blekh Jun 03 '14 at 10:16
  • I use this code in my read/import script, and save the dataset then. So the code is only run once, per dataframe. – BurninLeo Jun 04 '14 at 13:48
  • I see. I've already figured out most issues with the above-mentioned question. But, regardless, thank you for the reply. – Aleksandr Blekh Jun 04 '14 at 17:18
  • A similar/alternative implementation is provided by the `sticky` package. See my answer elsewhere under this question for an example. – ctbrown Oct 19 '16 at 13:48
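
Regarding the repeated-class issue raised in the comments: one possible way to make the class assignment safe to run more than once is to check for the class first. This is only a sketch; add_avector is a hypothetical helper name and d is made-up example data, while the two method definitions are the ones from the manual.

```r
# Methods from the manual
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x, i, ...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Hypothetical helper: add "avector" to each column's class only if it
# is not there yet. Assigning via d[] keeps the data.frame object (and
# its own attributes) instead of building a new one.
add_avector <- function(d) {
  d[] <- lapply(d, function(col) {
    if (!inherits(col, "avector")) {
      class(col) <- c("avector", class(col))
    }
    col
  })
  d
}

# Made-up example data
d <- data.frame(a = 1:5)
comment(d$a) <- "keep me"

d <- add_avector(d)
d <- add_avector(d)      # second run leaves the class vector unchanged
class(d$a)
comment(d[1:2, ]$a)      # comment survives row subsetting
```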

This is solved by the sticky package. (Full disclosure: I am the package author.) Apply sticky() to your vectors and the attributes are preserved through subset operations. For example:

> df <- data.frame( 
+   sticky   = sticky( structure(1:5, comment="sticky attribute") ),
+   nonstick = structure( letters[1:5], comment="non-sticky attribute" )
+ )
> 
> comment(df[1:3, "nonstick"])
NULL
> comment(df[1:3, "sticky"])
[1] "sticky attribute"

This works for any attribute and not only comment.

See the sticky package for details:

ctbrown
  • Good to know that there's such a package. Do you really have to run `sticky()` on every variable to make its attributes sticky? No offense intended, but the solution from @BenBarnes, which also preserves attributes, takes care of the whole data.frame in one step (which is what I usually need). – BurninLeo Oct 20 '16 at 12:57
  • I am happy to make that addition to the sticky package. See: https://github.com/decisionpatterns/sticky/issues/1 – ctbrown Oct 20 '16 at 13:11
  • @ctbrown, it looks like you resolved that issue. Can you update the solution above to reflect that? – pdb Apr 06 '17 at 05:50

I spent hours trying to figure out how to retain attribute data (specifically variable labels) when subsetting a dataframe (removing columns). The answer was so simple, I couldn't believe it. Just use the function spss.get from the Hmisc package, and then no matter how you subset, the variable labels are retained.

Ty Beal