
I am trying to load a dataset into R using the data() function. It works fine when I use the dataset name (e.g. data(Titanic) or data("Titanic")). What doesn't work for me is loading a dataset using a variable instead of its name. For example:

# This works fine:
> data(Titanic)

# This works fine as well:
> data("Titanic")

# This doesn't work:
> myvar <- Titanic
> data(myvar)
Warning message:
In data(myvar) : data set ‘myvar’ not found

Why is R looking for a dataset named "myvar" even though it is not quoted? And if this is the default behavior, is there a way to load a dataset whose name is stored in a variable?

For the record, what I am trying to do is to create a function that uses the "arules" package and mines association rules using Apriori. Thus, I need to pass the dataset as a parameter to that function.

myfun <- function(mydataset) {
    data(mydataset)    # doesn't work (data set 'mydataset' not found)
    rules <- apriori(mydataset)
}

Edit: output of sessionInfo():

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] arules_1.0-14   Matrix_1.0-12   lattice_0.20-15 RPostgreSQL_0.4 DBI_0.2-7      

loaded via a namespace (and not attached):
[1] grid_3.0.0  tools_3.0.0

And the actual errors I am getting (using, for example, a sample dataset "xyz"):

xyz <- data.frame(c(1,2,3))
data(list=xyz)
Warning messages:
1: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
3: In if (name %in% names(rds)) { :
  the condition has length > 1 and only the first element will be used
4: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
5: In if (name %in% names(rds)) { :
  the condition has length > 1 and only the first element will be used
6: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used

...

...

32: In data(list = xyz) :
  c("data set ‘1’ not found", "data set ‘2’ not found", "data set ‘3’ not found")
pazof
  • Note that since you already recognized that either `data("Titanic")` OR `data(Titanic)` work, it shouldn't have been too surprising that `data(myvar)` tries to load a dataset with the name of 'myvar'. – Dason Nov 17 '13 at 04:39
  • Can you add the output of `sessionInfo()`? The other solutions work, so I'm wondering why you're getting errors. The workaround that you have as 'accepted' is far from ideal... – Dason Nov 17 '13 at 04:42
  • `myvar <- "Titanic"; y <- get(myvar)` works (R v3.4.4); your dataset gets stored in variable "y". See 42-'s answer below. – ytoamn Jun 28 '19 at 06:12

5 Answers


Use the list argument. See ?data.

data(list=myvar)

You'll also need myvar to be a character string.

myvar <- "Titanic"

Note that myvar <- Titanic only worked (I think) because of the lazy loading of the Titanic data set. Most datasets in packages are loaded this way, but for other kinds of data sets, you'd still need the data command.
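A minimal sketch of the difference, assuming only base R and the datasets package that ships with it. The helper `capture_names()` is hypothetical: it repeats the expression that data() applies to its `...` arguments, `as.character(substitute(list(...))[-1L])`, which turns whatever you typed into a name without evaluating it. That is why `data(myvar)` looks for a dataset literally called "myvar", while the `list` argument is evaluated like any ordinary argument.

```r
# Hypothetical helper: mirrors how data() converts its `...` arguments
# into dataset names, without ever evaluating them.
capture_names <- function(...) as.character(substitute(list(...))[-1L])

myvar <- "Titanic"

capture_names(myvar)      # "myvar": the symbol itself, not its value
capture_names("Titanic")  # "Titanic"

# `list` is an ordinary, evaluated character vector, so this loads the
# data set whose name is stored in myvar.
data(list = myvar)
stopifnot(exists(myvar))
str(get(myvar))
```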

Aaron left Stack Overflow
  • Tried data(list=myvar), but it produces 32 warnings of the sort "In data(list = myvar) : data set ‘0’ not found". Tried the same with storing another arules dataset ("Groceries") into myvar, and this didn't load at all. ("Error in as.character.default(pattern) : no method for coercing this S4 class to a vector"). Maybe I need to specify some more parameters in data(), apart from list=myvar? – pazof Nov 11 '13 at 18:27
  • @DWin saw your issue; you need `myvar` to be a character string. – Aaron left Stack Overflow Nov 11 '13 at 19:34
  • Sorry for my late answer. Doesn't work either :( it keeps producing 32 warnings. What is weird is that it produces 32 warnings for every dataset I try with - even small ones with 10 transactions or so. Does the data() function really need to be executed before running apriori? I mean, if I directly run the apriori() function, without running data() first, will the results be wrong or something? – pazof Nov 17 '13 at 03:37

Use the variable as character. Otherwise you will be processing the contents of "Titanic" rather than its name. You may also need to use get in order to convert the character value to an object name.

myvar <- 'Titanic'

myfun <- function(mydataset) {
    data(list=mydataset)   
    str(get(mydataset))
}

myfun(myvar)
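If the function should not leave a copy of the data set in the global environment, data() also accepts an `envir` argument. A minimal sketch, assuming the name refers to a data set in an installed package and that the loaded object has the same name as the data set (not true for every package data set); `load_dataset()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: load a packaged data set by name into a private
# environment and return the object itself.
load_dataset <- function(name, package = NULL) {
  # A data frame or other object here reproduces the question's warnings,
  # so insist on a single character string.
  stopifnot(is.character(name), length(name) == 1L)
  env <- new.env()
  # data() only warns when a name is not found; turn that into an error.
  withCallingHandlers(
    data(list = name, package = package, envir = env),
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
  # Assumes the object is named after the data set.
  get(name, envir = env, inherits = FALSE)
}

myfun <- function(mydataset) {
  obj <- load_dataset(mydataset)
  str(obj)
  invisible(obj)
}

res <- myfun("mtcars")
```

The design choice is to return the object instead of relying on a side effect in `.GlobalEnv`, so the caller decides where it ends up.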
IRTFM
  • Sorry for my late answer. Doesn't work either :( it keeps producing 32 warnings. What is weird is that it produces 32 warnings for every dataset I try with - even small ones with 10 transactions or so. Does the data() function really need to be executed before running apriori? I mean, if I directly run the apriori() function, without running data() first, will the results be wrong or something? – pazof Nov 17 '13 at 03:36

If the package has been loaded, you can use the get() function to assign the data set to a local variable:

data_object = get(myvar, asNamespace('<package_name>'))

or simply:

data_object = get(myvar)
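A minimal sketch of the `get()` route using base R's datasets package; `as.environment("package:datasets")` assumes that package is attached, which it is in a default R session. Setting `inherits = FALSE` restricts the lookup to that one environment, so an object of the same name elsewhere on the search path cannot be picked up by accident.

```r
myvar <- "mtcars"

# Search the whole search path (global environment first, then attached packages).
data_object <- get(myvar)

# Look only inside the attached datasets package.
pkg_env <- as.environment("package:datasets")
data_object2 <- get(myvar, envir = pkg_env, inherits = FALSE)

# exists() with the same arguments lets you check before calling get().
if (exists("no_such_dataset", envir = pkg_env, inherits = FALSE)) {
  message("found")
} else {
  message("not in package:datasets")
}
```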
jciloa

I am answering my own question, since I have found the solution at last. Quoting R help:

"Data sets are searched for in all the currently loaded packages, then in the ‘data’ directory (if any) of the current working directory."

Thus, all one has to do is write the dataset to a file and place it in a directory named "data" located in the working directory.

> write.table(mydataset, file="mydataset.csv", sep=";", quote=TRUE, row.names=FALSE)  # data() reads .csv files with read.table(..., header=TRUE, sep=";"), so I use 'sep=";"' to separate the entries by a semicolon, 'quote=TRUE' to quote character entries, and 'row.names=FALSE' to prevent the creation of an extra column containing the row names (which is the default behavior of write.table())

# Now place the dataset into a "data" directory (either via R or via the operating system, doesn't make any difference):
> dir.create("data")  # create the directory
> file.rename(from="mydataset.csv", to="data/mydataset.csv")  # move the file; data() uses the file name without its extension as the dataset name

# Now we can finally load the dataset:
> data("mydataset")  # data(mydataset) works as well, since data() treats an unquoted argument as the dataset name and never evaluates it
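A self-contained sketch of this workaround that runs in a temporary directory, so the real working directory is left alone. It writes the file with `sep = ";"` because ?data documents that `.csv` files are read with `read.table(..., header = TRUE, sep = ";")`, and it relies on data() also searching the "data" directory of the current working directory. The helper name `load_via_data_dir()` is hypothetical; as the comments on this answer note, calling `read.csv()` on the file is the more usual approach.

```r
# Hypothetical helper: round-trip a data frame through ./data/<name>.csv
# and load it back with data(), inside a throwaway project directory.
load_via_data_dir <- function(df, name) {
  proj <- tempfile("proj_")
  dir.create(file.path(proj, "data"), recursive = TRUE)
  # data() reads .csv files with read.table(header = TRUE, sep = ";")
  write.table(df, file.path(proj, "data", paste0(name, ".csv")),
              sep = ";", row.names = FALSE)
  old_wd <- setwd(proj)               # data() also searches getwd()/data
  on.exit(setwd(old_wd), add = TRUE)  # restore the working directory
  env <- new.env()
  data(list = name, envir = env)
  get(name, envir = env, inherits = FALSE)
}

mydataset <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
res <- load_via_data_dir(mydataset, "mydataset")
str(res)
```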
pazof
  • Ah, well that explains a lot. Usually people would use `read.csv` in this situation. `data` is usually only used when loading data files from packages, as in the example you gave in your question. In the future, you'll get better answers if you provide a complete reproducible example. – Aaron left Stack Overflow Nov 17 '13 at 14:30
  • Yes, it appears that I had just misunderstood the usage of data() - I thought it was a necessary step prior to mining rules from a dataset. – pazof Nov 17 '13 at 19:12
  • People should be advised that @pazof didn't really know what he/she was doing and made a bunch of unnecessary mistakes. Furthermore his example of how the error was provoked was incomplete. And then his "answer" is basically wrong. (Just my opinion of course, but I think his giving himself a checkmark may mislead people in the future.) – IRTFM Nov 17 '13 at 23:25
  • @DWin: It's not "wrong", it's just not the optimal workaround to the problem - of course it isn't, quite the contrary - but still, it works. The checkmark is there because it is the only solution among the answers that solved the problem. And regarding the unnecessary mistakes you mention, could you please point them out? – pazof Nov 18 '13 at 06:16
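Following the conclusion in these comments, the question's function does not need data() at all when it is handed the object itself. A minimal sketch, assuming the arules package is installed; `apriori()` and the `Groceries` data set come from arules, while the `supp`/`conf` values are arbitrary illustrations.

```r
# The caller passes the data set object itself (e.g. a "transactions"
# object), so there is nothing for data() to load.
myfun <- function(mydataset, ...) {
  if (!requireNamespace("arules", quietly = TRUE)) {
    stop("the 'arules' package is required", call. = FALSE)
  }
  arules::apriori(mydataset, ...)
}

# Usage with a data set shipped in arules (loaded once, by name, as a string).
if (requireNamespace("arules", quietly = TRUE)) {
  data("Groceries", package = "arules")
  rules <- myfun(Groceries, parameter = list(supp = 0.01, conf = 0.5))
  print(rules)
}
```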

Assign_Name <- read.csv(file.choose())

This line of code opens a file-selection dialog on your local machine; just select the dataset you want to load into the R environment.
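`file.choose()` needs an interactive file dialog, so in a script or function it is usually replaced by an explicit path. A minimal sketch; the temporary file and its columns are made up for illustration.

```r
# Create a small CSV file to read back (stand-in for a real file on disk).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, item = c("bread", "milk", "eggs")),
          path, row.names = FALSE)

# Non-interactive equivalent of read.csv(file.choose()).
Assign_Name <- read.csv(path)
str(Assign_Name)
```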

  • Thanks for participating, but I don't think this addresses the question. The question is specifically about how to load data provided in a package using the `data()` function with a variable stored in a `character` string. This answer is about how to load data from a CSV file *without* using the name at all. – Gregor Thomas Feb 21 '19 at 14:39