
I have a bunch of large dataframes, so every time I want to display them, I have to use head:

head( blahblah(somedata) )

Typing head all the time gets old after the first few hundred times, so I'd like an easier way to do this, if possible. One of the cool things about R compared to Java is that things like this are often really easy, if you know the secret incantation.

I searched in options, and found max.print, which almost works, except there is now a time delay.

head( blahblah(somedata) )

.... is instantaneous (to within the limits of my perception)

options(max.print=100)
blahblah(somedata)

.... takes about 3 seconds, so longer than typing head

Is there some way of making head be applied automatically when printing large data structures?

A piece of code that reproduces this behavior:

long_dataset = data.frame(a = runif(10e5), 
                          b = runif(10e5), 
                          c = runif(10e5))
system.time(head(long_dataset))
options(max.print = 6)
system.time(print(long_dataset))
Paul Hiemstra
Hugh Perkins
  • Not sure what is best practice, but I reckon the `max.print` option sounds like a good bet. As an alternative, could you just edit the `print.data.frame` function to make it just print out the first 10 rows? – thelatemail Oct 23 '12 at 05:56
  • Use data.tables instead of data.frames – mnel Oct 23 '12 at 05:58
  • @mnel, interesting. That sounds like a good method. And it seems to work nicely. It seems like data.tables solve lots of problems in fact? – Hugh Perkins Oct 23 '12 at 06:05
  • @Paul, your addition of an example is very useful. Thanks! – Hugh Perkins Oct 23 '12 at 06:07
  • `data.table` will do this automatically, so I agree with mnel on that one. But those are sometimes overkill for mundane R tasks. No need to bring out the ditch witch when a beach shovel will do the same job just fine. – Maiasaura Oct 23 '12 at 06:21
  • In my `.Rprofile` I keep `h <- utils::head`. So I save typing by just doing `h(df)`. Does that help? – Maiasaura Oct 23 '12 at 06:22

2 Answers


Putting my comment into an answer: if you use the data.table package (with data.table objects rather than data.frame objects), printing automatically shows only the first 5 and last 5 rows once the data.table has more than 100 rows.

library(data.table)
DT <- data.table(long_dataset)
DT

      1: 0.19613138 0.88714284 0.25715067
      2: 0.25405787 0.76544909 0.75632468
      3: 0.24841384 0.22095875 0.52588596
      4: 0.72766161 0.79696771 0.88802759
      5: 0.02448372 0.77885568 0.38199993
     ---                                 
 999996: 0.28230967 0.09410921 0.84420162
 999997: 0.73598931 0.86043537 0.30147089
 999998: 0.86314546 0.90334347 0.08545391
 999999: 0.85507851 0.46621131 0.23892566
1000000: 0.33172155 0.43060483 0.44173400

The data.table FAQ 2.11 deals with this explicitly.
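The thresholds mentioned above (100 rows before truncation, 5 rows at each end) can be adjusted. A minimal sketch, assuming a reasonably recent data.table where the `datatable.print.nrows` and `datatable.print.topn` options and the matching `nrows` and `topn` arguments of `print()` are available; the values 50, 3 and 2 are arbitrary choices for illustration:

```r
library(data.table)

# Same shape as the example data in the question
DT <- data.table(a = runif(1e6), b = runif(1e6), c = runif(1e6))

# Truncate any table with more than 50 rows, showing 3 rows at each end
options(datatable.print.nrows = 50, datatable.print.topn = 3)
DT

# Or override the number of rows shown for a single call,
# leaving the global options untouched
print(DT, topn = 2)
```

Setting the options in `.Rprofile` would apply them to every session.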


EDIT to deal with existing data.frame objects you don't want to convert.

If you are hesitant to convert existing data.frame objects to data.table objects, you could simply define print.data.frame as data.table:::print.data.table:

print.data.frame <- data.table:::print.data.table

long_dataset

      1: 0.19613138 0.88714284 0.25715067
      2: 0.25405787 0.76544909 0.75632468
      3: 0.24841384 0.22095875 0.52588596
      4: 0.72766161 0.79696771 0.88802759
      5: 0.02448372 0.77885568 0.38199993
     ---                                 
 999996: 0.28230967 0.09410921 0.84420162
 999997: 0.73598931 0.86043537 0.30147089
 999998: 0.86314546 0.90334347 0.08545391
 999999: 0.85507851 0.46621131 0.23892566
1000000: 0.33172155 0.43060483 0.44173400
mnel
  • Sounds good. I haven't quite plucked up courage to put my toe into the water with using data.table, but it seems they solve a lot of problems. How widely used is data.table compared to data.frame? (I guess I'm asking so I get a feel for how likely it is to be bug-free, feature-complete, and work effectively with different libraries) – Hugh Perkins Oct 23 '12 at 12:43
  • @HughPerkins [The developer (Matthew Dowle)](http://stackoverflow.com/users/403310/matthew-dowle) is very active in maintaining the package and squashing bugs. The `data.table` package I believe is in general use. It even has [its own tag](http://stackoverflow.com/questions/tagged/data.table). – Blue Magister Oct 24 '12 at 04:34

I'd go along with @thelatemail's suggestion, i.e. redefine print.data.frame:

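print.data.frame <- function(x, ...) {
   # Match the print(x, ...) generic so extra arguments are passed through
   if (nrow(x) > 10) {
      # Large data frame: show only the first and last 5 rows
      base::print.data.frame(head(x, 5), ...)
      cat("----\n")
      base::print.data.frame(tail(x, 5), ...)
   } else {
      base::print.data.frame(x, ...)
   }
   invisible(x)  # print methods conventionally return their input invisibly
}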

data.frame(x=1:100, y=1:100)
#   x y
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# ----
#       x   y
# 96   96  96
# 97   97  97
# 98   98  98
# 99   99  99
# 100 100 100

A more elaborate version could line everything up together and avoid the repeated header, but you get the idea.
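One way such a version might look is sketched below. It formats the first and last rows in a single call so the column widths agree and the header appears once, then splices the separator into the captured text. The `n` argument is a hypothetical addition, and the sketch assumes the data frame is narrow enough that R does not wrap its columns across several blocks of output.

```r
print.data.frame <- function(x, ..., n = 5L) {
  if (nrow(x) > 2L * n) {
    # Pick the first and last n rows; row indexing keeps the original row names
    keep <- c(seq_len(n), seq.int(nrow(x) - n + 1L, nrow(x)))
    shown <- x[keep, , drop = FALSE]
    # Print both ends in one go so they share a header and column widths
    txt <- capture.output(base::print.data.frame(shown, ...))
    # txt[1] is the header, followed by n head rows and n tail rows
    top <- seq_len(n + 1L)
    writeLines(c(txt[top], "----", txt[-top]))
  } else {
    base::print.data.frame(x, ...)
  }
  invisible(x)
}

data.frame(x = 1:100, y = 1:100)
```

Because `n` comes after `...`, it has to be passed by name, e.g. `print(df, n = 3)`.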

You could put such a function in your .Rprofile or Rprofile.site file (see ?Startup) so it is defined every time you start an R session.

flodel
  • This is a great solution! Especially when combined with .Rprofile. I know I should use data.table sooner or later, but for now all my code is still using data.frames! – Hugh Perkins Oct 24 '12 at 06:41