
After loading the network package, summary.data.frame misbehaves: if the data frame has a column of class "character", summary prints the value from every row of that column, each prefixed with "NULL:", instead of the usual output. Here's a toy example:

test <- data.frame(a=c("some", "char", "vector", "with", 
                       "many", "many", "words"),
                   b=1:7, stringsAsFactors = FALSE)

# Expected behaviour

summary(test$a)

##    Length     Class      Mode 
##         7 character character

summary(test)

##       a                   b      
##  Length:7           Min.   :1.0  
##  Class :character   1st Qu.:2.5  
##  Mode  :character   Median :4.0  
##                     Mean   :4.0  
##                     3rd Qu.:5.5  
##                     Max.   :7.0

library("network")

## network: Classes for Relational Data
## Version 1.13.0 created on 2015-08-31.
## ...

# Behavior after loading network:

summary(test$a)

##   char   many   some vector   with  words 
##      1      2      1      1      1      1

summary(test)

##     a                b      
##  NULL:some     Min.   :1.0  
##  NULL:char     1st Qu.:2.5  
##  NULL:vector   Median :4.0  
##  NULL:with     Mean   :4.0  
##  NULL:many     3rd Qu.:5.5  
##  NULL:many     Max.   :7.0  
##  NULL:words

Note that the output includes every element of the character vector, repetitions included, so 1000 rows produce 1000 lines of summary, which makes the summary function unusable. The behavior persists after detaching the network package and only goes away when R is restarted in a new session.
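One way to check where the method that summary dispatches to comes from is sketched below. The helper name method_origin is hypothetical; it relies only on utils::getS3method and environmentName. The last line tests an assumption, not something verified here: that detach() removes the package from the search path but leaves its namespace loaded, which would explain why the method stays in effect.

```r
# Hypothetical helper: report which namespace or environment supplies
# the S3 method for a given generic and class, or NA if there is none.
method_origin <- function(generic, class) {
    m <- utils::getS3method(generic, class, optional = TRUE)
    if (is.null(m)) return(NA_character_)
    environmentName(environment(m))
}

method_origin("summary", "data.frame")  # the method shipped with base R
method_origin("summary", "character")   # NA unless something defines one

# Assumption to check: is the namespace still loaded after detach()?
"network" %in% loadedNamespaces()
```

Running the second call before and after library("network"), and again after detaching, shows whether the method is still registered.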

What goes wrong: normally, UseMethod("summary") dispatches character vectors to summary.default, which returns the usual three-element result, with names.

summary.default(test$a)

##    Length     Class      Mode 
##         7 character character

names(summary.default(test$a))

## [1] "Length" "Class"  "Mode"

The network package defines a summary.character function that simply prepends "summary.character" to the class of the character object, so that printing it calls network::print.summary.character, which shows a table of up to 10 of the most frequent values. The object itself is otherwise unchanged, so its names attribute is NULL.

summary.character

## function (object, ...) 
## {
##     class(object) <- c("summary.character", class(object))
##     object
## }
## <environment: namespace:network>

summary.character(test$a)

##   char   many   some vector   with  words 
##      1      2      1      1      1      1

names(summary.character(test$a))

## NULL

class(summary.character(test$a))

## [1] "summary.character" "character"

length(summary.character(test$a))

## [1] 7

as.character(summary.character(test$a))

## [1] "some"   "char"   "vector" "with"   "many"   "many"   "words"
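The same mechanism can be reproduced without installing network. The sketch below defines a stand-in summary.character in the global environment, copied from the source printed above. It is only an illustration: no print method is defined, so it does not reproduce network's frequency table.

```r
# Stand-in for network's method, copied from the printed source above.
summary.character <- function(object, ...) {
    class(object) <- c("summary.character", class(object))
    object
}

x <- c("some", "char", "vector", "with", "many", "many", "words")
s <- summary(x)           # dispatches to the stand-in defined above

is.null(names(s))         # TRUE: there are no summary labels
length(s) == length(x)    # TRUE: one element per input value
identical(unclass(s), x)  # TRUE: the "summary" is the data itself

rm(summary.character)     # remove the stand-in again
```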

The trouble comes from these three lines in summary.data.frame:

        sms <- format(sms, digits = digits)
        lbs <- format(names(sms))
        sms <- paste0(lbs, ":", sms, "  ")

These lines sit inside a for loop over the columns, where sms is the result of summary for the current column. When that result comes from summary.character, sms is actually the whole column and names(sms) is NULL, hence the issue.

The root cause is that summary.character returns the original object rather than a summary of it; building the summary representation is left to print.summary.character. summary.data.frame never calls that print method: it just pastes sms together with the other column summaries, dumping the whole column.
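To illustrate the difference, here is a minimal sketch of a summary function that returns a short, named summary object instead of the input. It is hypothetical and is not network's code; the name summary_character_sketch and the maxsum argument are assumptions made for the example.

```r
# Hypothetical sketch, NOT network's code: return a named frequency
# table of at most `maxsum` values instead of the original vector.
summary_character_sketch <- function(object, maxsum = 10L, ...) {
    counts <- sort(table(object), decreasing = TRUE)
    counts[seq_len(min(maxsum, length(counts)))]
}

s <- summary_character_sketch(c("some", "char", "vector", "with",
                                "many", "many", "words"))
names(s)   # the distinct values, most frequent first
length(s)  # 6: one entry per distinct value, capped at maxsum
```

A result of this shape has non-NULL names and a length that does not grow with the number of rows, which is what the code in summary.data.frame expects from each column summary.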

Any ideas on how to fix this without diving into the sources of network would be much appreciated.

ggll
  • I am not getting the same output with `summary(test$a)` using `R 3.4.0` – akrun Jun 09 '17 at 20:36
  • Have you tried restarting R and cleaning out your environment as in `rm(list=ls())`, then cut/paste from your SO post ? I am just wondering if there is something else going on. I don't get your result and I am using 3.3.2 on OS X. – steveb Jun 09 '17 at 20:46
  • @steveb indeed, if I restart with `R --no-restore-data` the issue disappears – ggll Jun 09 '17 at 20:55
  • @ggll You have at least two options, start with a clean environment or restart R with the "restored" environment and debug. If you choose the latter option, you may want to use `str(test)` to see what types are really used and compare between both the "no-restored" and "restored" environments. The things that come to mind are any of the following: (*) the `test` data frames are not really the same between restored/non-restored (*) the `summary` function may have been overridden, or obviously (*) something else that needs more info. – steveb Jun 09 '17 at 21:02
  • If you figure out what caused this, please provide the solution on this post. You might be surprised how many people run into something like this. – steveb Jun 09 '17 at 21:03
  • @ggll I was able to reproduce your error when using the `network` library. Even when I use `base::summary` or `base::summary.data.frame`. It sounds like the `summary` package is doing something it shouldn't. – steveb Jun 09 '17 at 21:16
  • @ggll One thought I had is to not "pollute" your space in R, you could call `network` functions without loading the library, as in `network::(...)`. This is NOT ideal, and I am not sure it will work but if it does work, it may be a temporary workaround. – steveb Jun 09 '17 at 21:41

1 Answer


I found a workaround. Unfortunately it involves "polluting" the R namespace a bit more (to quote @steveb's comment): it defines a function format.summary.character that restores the expected behavior of the code inside summary.data.frame. The function is inspired by format.factor:

format.summary.character <- function(x, ...) {
    s <- summary.default(as.character(x), ...)
    format(structure(as.character(s), names = names(s), dim = dim(s), 
                     dimnames = dimnames(s)), ...)
}

After defining this function, the output of summary for a character vector is still controlled by summary.character, but the output of summary.data.frame goes back to normal.

summary(test$a) # still calling summary.character

##   char   many   some vector   with  words 
##      1      2      1      1      1      1

summary(test)   # back to normal

##       a                   b      
##  Length:7           Min.   :1.0  
##  Class :character   1st Qu.:2.5  
##  Mode  :character   Median :4.0  
##                     Mean   :4.0  
##                     3rd Qu.:5.5  
##                     Max.   :7.0  
## 
ggll