I reported this to R-core, but they said (without explaining) that it is not a bug in R:

While automatically processing some data, I came across an empty data set (or something similar). The hist() call then threw an error that looks like a syntax error to me (I'm an R beginner):

> df <- data.frame(n=c(0))
> str(df)
'data.frame':    1 obs. of  1 variable:
$ n: num 0
> hist(df$n) ### this one works!
> hist(df$n, nclass=nclass.scott)  ### this does not!
Error in if (h > 0) ceiling(diff(range(x))/h) else 1L :
 missing value where TRUE/FALSE needed
> df <- data.frame(n=c(0,1))
> hist(df$n, nclass=nclass.scott) ### this one works

Versions tested: 3.3.1 (linux) and 3.3.3 (Windows)

Without nclass=nclass.scott I don't get an error. I failed to find documentation for this parameter, however; I just found that histograms with this parameter look more appealing to me. With Google I found: "nclass.scott uses Scott's choice for a normal distribution based on the estimate of the standard error, unless that is zero where it returns 1"

I'm also expecting some robustness: In automatic processing you never know how much data a particular set will have, and I would prefer a histogram with a single bar in that case. Also compare these:

> hist(numeric(0))
Error in hist.default(numeric(0)) : invalid number of 'breaks'
> hist(numeric(1))
> hist(numeric(1), nclass=nclass.scott)
Error in if (h > 0) ceiling(diff(range(x))/h) else 1L : missing value where TRUE/FALSE needed
> hist(numeric(0), nclass=nclass.scott)
Error in if (h > 0) ceiling(diff(range(x))/h) else 1L : missing value where TRUE/FALSE needed
U. Windl
  • The function nclass.scott() should return something different when length(x) == 1, but I don't see much point in making a histogram for such small sample sizes. – Edgar Santos May 18 '17 at 08:10
  • Yes, this is not a bug. `help("nclass.scott")` does not claim that it works if the standard error is not defined. You should also be using the `breaks` parameter of `hist`. If this corner case is important to you, you can do `hist(df$n, breaks= if (length(df$n) == 1L) 1L else nclass.scott)`. – Roland May 18 '17 at 08:40
  • @ed_sans: The mistake is to guess how a function will be used: I was visualizing the results of some automatic tests, where I had two subsets: One with at least partially successful tests, and the other with completely failed tests. As it turned out, the second subset was empty. – U. Windl May 18 '17 at 09:24
  • @roland: Shouldn't `== 1L` be `<= 1L` for completeness? Also, what's the meaning of `L` in `1L`? – U. Windl May 18 '17 at 09:31
  • The `L` forces the number to be an integer - http://stackoverflow.com/questions/24350733/why-would-r-use-the-l-suffix-to-denote-an-integer – Richard Telford May 18 '17 at 10:03
  • @U.Windl If your code passes `NULL` or `numeric(0)` to `hist` you have bigger problems. I would want an error in such a case and `hist` will give you one anyway. – Roland May 18 '17 at 10:32
  • Also, keep in mind, that histograms are a tool for data exploration which is interactive by definition. – Roland May 18 '17 at 10:33
  • @Roland: If you automatically generate several dozen different plots from one data file, you want some robustness. Think of generating automated reports with a lot of graphics, and not of a single mathematician exploring data. – U. Windl May 18 '17 at 11:49
  • I understand that. But it's unreasonable to expect R functions developed primarily for interactive use to have that robustness. The average R user wants an error for such corner cases. If you need robustness you are expected to implement it via `if` conditions and error handling (see `tryCatch`). PS: A histogram for one observation is pretty useless. – Roland May 18 '17 at 12:11
  • @Roland: A histogram with one value says: "100% of the samples have that value"; what's wrong with that? – U. Windl Mar 11 '19 at 20:55
  • @U.Windl I didn't say "wrong". I said "useless". If you have one value, show that value. A histogram only adds obfuscation, because instead of showing the exact value, you show a range. – Roland Mar 12 '19 at 06:58
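
The `tryCatch` route mentioned in the comments could look roughly like the sketch below. The wrapper name `hist_or_fallback` is hypothetical, and falling back to the default breaks is just one possible policy, not something the commenters prescribed.

```r
# Hypothetical wrapper: try Scott's rule first and fall back to the
# default breaks of hist() if that call raises an error.
hist_or_fallback <- function(x, ...) {
  tryCatch(
    hist(x, breaks = nclass.scott, ...),
    error = function(e) {
      message("Scott's rule failed: ", conditionMessage(e))
      # hist() cannot draw an empty vector either, so return nothing then.
      if (length(x) > 0L) hist(x, ...) else invisible(NULL)
    }
  )
}

# plot = FALSE returns the histogram object without drawing it.
one_value <- hist_or_fallback(0, plot = FALSE)
no_values <- hist_or_fallback(numeric(0), plot = FALSE)
```

This keeps the error visible as a message instead of silently hiding it, which is helpful when many plots are generated in one batch.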

2 Answers

A standard error cannot be estimated from only one observation, so `sd()` returns NA in this case. That NA ends up in the `if (h > 0)` test, which explains the error message about the missing value.

> sd(0)
[1] NA

> sd(c(1,1))
[1] 0
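
As a minimal sketch of why the check fails, the snippet below recomputes a Scott-style bin width by hand. The helper `scott_width` and its formula (3.5 times the standard deviation times n^(-1/3)) are assumptions for illustration, not a copy of R's source; only the `if (h > 0)` test is taken from the error message in the question.

```r
# Hypothetical helper: bin width in the style of Scott's rule (assumed formula).
scott_width <- function(x) 3.5 * sd(x) * length(x)^(-1/3)

h_one <- scott_width(0)        # sd() of a single value is NA, so h is NA
h_two <- scott_width(c(0, 1))  # two distinct values give a positive width

# `if` cannot branch on NA, so the condition itself raises an error.
msg <- tryCatch(
  if (h_one > 0) "positive" else "zero",
  error = function(e) conditionMessage(e)
)

print(is.na(h_one))
print(h_two > 0)
print(msg)
```

With two or more observations the width is a real number (possibly zero), so the `if` test has something to branch on.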
theSZ
  • I see, but can't the functions be more robust? In automatic processing you never know how much data a particular set will have, and I would prefer an empty histogram in that case. – U. Windl May 18 '17 at 09:43
  • See Roland's comment for a robust solution. – theSZ May 18 '17 at 11:23
  • Roland's solution does not handle the case when the data set is empty. – U. Windl May 18 '17 at 11:46
  • In your solution, `if (length(df$n) > 1L)` handles this for you, but you might want to consider `if (length(df$n) > 0L)` so that `breaks=if (length(df$n) == 1L) 1L else nclass.scott` is still meaningful? – theSZ May 18 '17 at 12:39

It seems the best solution (as things are now) is (combining Roland's with what I had):

if (length(df$n) > 0L) {
    # a single value gets one bar; otherwise use Scott's rule
    hist(df$n, breaks=if (length(df$n) == 1L) 1L else nclass.scott)
} # else (empty data) produce nothing
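
For repeated use in automated reports, the same logic could be wrapped in a function. This is only a sketch: `safe_hist` is a made-up name, and dropping non-finite values before counting is an added assumption (so that `NA` entries do not reach `nclass.scott`).

```r
# Hypothetical wrapper around hist(); not part of base R.
safe_hist <- function(x, ...) {
  x <- x[is.finite(x)]           # assumption: ignore NA/NaN/Inf up front
  if (length(x) == 0L) {
    return(invisible(NULL))      # empty data: produce nothing
  }
  # One observation gets a single bar; otherwise use Scott's rule.
  breaks <- if (length(x) == 1L) 1L else nclass.scott
  hist(x, breaks = breaks, ...)
}

# plot = FALSE returns the histogram object without drawing it.
empty  <- safe_hist(numeric(0), plot = FALSE)
single <- safe_hist(0, plot = FALSE)
pair   <- safe_hist(c(0, 1), plot = FALSE)
```

Extra arguments such as `main` or `xlab` are passed through to `hist()` unchanged.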
U. Windl