
For a research project I have a relatively large block of code that takes quite a while to run. I need to shorten the run time, so I ran profr to see which functions take up the most time. The thing is, I don't understand the notation. Can someone explain to me, or point me to a resource that explains, what these mean:

[<-.data.frame
[[.data.frame
[<-
[
[.factor
[.data.frame
[<-.factor

I realize they must be some sort of R internals for creating and subsetting data frames, I just don't know which is which.

Thanks.

Arun
Mike Flynn
  • `[<-` is assignment to an indexed value, with the target being something simple such as an item in a vector. `[<-.data.frame` is assignment to a location inside a data frame. – IRTFM Jul 10 '12 at 19:26

1 Answer


Quoting from the "R for Dummies" cheat sheet:

Subsetting R Objects

Vectors, lists, and data frames play an important role in representing data in R, so being able to succinctly and correctly specify a subset of your data is important.

You can use three operators to subset your data:

  • `[[`: Extracts a single element by name or position from a list or data frame. For example, `iris[["Sepal.Length"]]` extracts the column Sepal.Length from the data frame iris; `iris[[2]]` extracts the second element from iris.

  • `[`: Extracts multiple elements from a vector, array, list, or data frame. For example, `iris[, c("Sepal.Length", "Species")]` extracts the columns Sepal.Length and Species from iris; `iris[1:10, ]` extracts the first ten rows from iris; and `iris[1:10, "Species"]` extracts the first ten elements of the column Species from iris.
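To make the quoted examples concrete, here is a small sketch you can paste into a console. It uses only the built-in iris data set; the variable names are just placeholders I picked for illustration.

```r
# iris ships with R: 150 rows, 5 columns
sepal_len  <- iris[["Sepal.Length"]]                 # one column, returned as a numeric vector
second_col <- iris[[2]]                              # second column (Sepal.Width), by position
two_cols   <- iris[, c("Sepal.Length", "Species")]   # still a data frame, now with 2 columns
first_ten  <- iris[1:10, ]                           # data frame with the first 10 rows
species_10 <- iris[1:10, "Species"]                  # first 10 values of Species, as a factor

str(sepal_len)
dim(two_cols)
dim(first_ten)
class(species_10)
```

Note that `[[` always gives you the element itself, while `[` on a data frame usually gives you back a data frame unless you select a single column with the row/column form.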


You can find the same information in `?Extract`, although not as nicely summarised ;-)
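As for the names in your profiler output: as far as I understand it, `[` and `[<-` are generic functions, and a name like `[<-.data.frame` is the method that gets called when the object is a data frame (likewise `[.factor` for factors). A rough sketch with a made-up data frame `df`:

```r
# Hypothetical toy data frame, only for illustration
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# The usual assignment syntax ...
df[2, "a"] <- 10L

# ... is shorthand for calling the function `[<-` and storing the result.
# Because df is a data frame, this ends up in `[<-.data.frame`.
df <- `[<-`(df, 3, "a", value = 20L)

# The methods are ordinary functions that live in base R
is.function(`[<-.data.frame`)
is.function(`[.factor`)

# List the classes that have their own `[<-` method
methods("[<-")
```

So every line in the profile with a `.data.frame` or `.factor` suffix is time spent extracting from, or assigning into, an object of that class.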


My guess is that the bottleneck in your profile is `[<-`, since assignment into a data frame is known to be a slow operation. You possibly have a loop with multiple `[<-` column assignments into a data frame. You can make this substantially faster by:

  • Making a single assignment of multiple columns
  • Using the package data.table
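I haven't seen your code, so the following is only a sketch of the first suggestion under my own assumptions: a hypothetical loop that fills a column one cell at a time, and the same result built with whole-column assignments. The names `df_loop`, `df_vec`, `x`, `y` and `z` are invented for the example, and data.table is not shown here.

```r
n <- 1000

# Slow pattern: every iteration goes through `[<-.data.frame`
df_loop <- data.frame(id = seq_len(n), x = numeric(n))
for (i in seq_len(n)) {
  df_loop[i, "x"] <- i^2
}

# Faster pattern: compute the whole column as a vector, assign it once
df_vec <- data.frame(id = seq_len(n), x = numeric(n))
df_vec$x <- seq_len(n)^2

# Several columns can also be created in a single assignment
df_vec[c("y", "z")] <- list(sqrt(df_vec$x), df_vec$x / 2)

# Both approaches give the same column
identical(df_loop$x, df_vec$x)
```

If you wrap each variant in `system.time()` you should be able to see the difference yourself; how large it is depends on the size of your data.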
Andrie