
What is the difference between a data frame and a list in R? Which one should be used when? Which is easier to loop over?

Exact problem: I have to first store 3 string elements like "a", "b", "c". Later for each of these, I need to append 3 more elements; for instance for "a" I have to add "a1", "a2", "a3". Later I have to use nested for loops to access these elements.

So I am unsure whether to use a data frame, a list, or some other data type in which I could first store the elements and then append to each of them (much like appending to each column).

Currently I am getting errors like "number of items to replace is not a multiple of replacement length".

epo3
ShazSimple
  • I think these may help you: http://www.r-tutor.com/r-introduction/data-frame and http://www.r-tutor.com/r-introduction/list – Futuregeek Apr 09 '13 at 12:08
  • Was it really that bad a question? I am a newbie in R, and coming from Java and C#, this scripting language seemed difficult... :( – ShazSimple Apr 09 '13 at 12:18
  • @ShazSimple The question itself isn't that bad. It's just far too generic. If you want a solution to your specific problem, you'll have to present us with a minimal reproducible example, as explained [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For that, please make a new question. We can leave this one here as a reference. – Joris Meys Apr 09 '13 at 13:16
  • It still seems unclear what you are trying to accomplish, although you do give information on how you are trying to accomplish it. In `R`, the best way to solve a problem isn't always the way you might think, so more information would be helpful. Welcome to SO. – Jonathan Apr 09 '13 at 13:21
  • @Joris, Thank you for your advice. Will post new question soon.. – ShazSimple Apr 09 '13 at 14:29

2 Answers


The question isn't as stupid as some people think it is. I know plenty of people who struggle with that difference and with what to use where. To summarize:

Lists are by far the most flexible data structure in R. They can be seen as a collection of elements without any restriction on the class, length or structure of each element. The only thing you need to take care of is that you don't give two elements the same name. That can cause a lot of confusion, and R doesn't raise an error for it; it silently returns the first match:

> X <- list(a=1,b=2,a=3)
> X$a
[1] 1
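
As a small sketch of how such a clash can be detected and worked around (the variable names `first_a`, `dup` and `all_a` are illustrative, not from the original answer):

```r
X <- list(a = 1, b = 2, a = 3)

# `$` (like `[[`) returns only the first element with a matching name
first_a <- X$a

# duplicated() on the names flags the second "a"
dup <- duplicated(names(X))

# Logical indexing on the names retrieves every element called "a"
all_a <- X[names(X) == "a"]
```

Checking `any(duplicated(names(X)))` before relying on name-based access is one cheap way to guard against this.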

Data frames are lists as well, but they have a few restrictions:

  • you can't use the same name for two different variables
  • all elements of a data frame are vectors
  • all elements of a data frame have an equal length.

Due to these restrictions and the resulting two-dimensional structure, data frames can mimic some of the behaviour of matrices. You can select rows and do operations on rows. You can't do that with lists, as a row is undefined there.
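
The restrictions and the row-wise behaviour described above can be illustrated with a minimal sketch. The data frame `df` and its columns `id` and `value` are made up for the example:

```r
df <- data.frame(id = c("a", "b", "c"), value = c(10, 20, 30),
                 stringsAsFactors = FALSE)

is_list <- is.list(df)   # a data frame is a list of columns
n_cols  <- length(df)    # so length() counts columns, not rows

second_row <- df[2, ]              # row selection, as with a matrix
big_rows   <- df[df$value > 15, ]  # rows filtered by a condition

# Columns of incompatible lengths are rejected by data.frame() ...
bad <- tryCatch(data.frame(x = 1:3, y = 1:2),
                error = function(e) "error")

# ... but are perfectly fine in a plain list
ok <- list(x = 1:3, y = 1:2)
```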

All this implies that you should use a data frame for any dataset that fits in that two-dimensional structure. Essentially, you use data frames for any dataset where a column corresponds to a variable and a row corresponds to a single observation in the broad sense of the word. For all other structures, lists are the way to go.

Note that if you want a nested structure, you have to use lists. As elements of a list can be lists themselves, you can create very flexible structured objects.
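
For the scenario in the question (three strings, three more elements appended to each, then nested loops), one possible sketch uses a named list. This is an assumption about what the asker needs, since the original code was not shown; `keys`, `store` and `visited` are illustrative names:

```r
# Start with the three base strings
keys <- c("a", "b", "c")

# One list element per key; each can hold a vector of any length
store <- setNames(vector("list", length(keys)), keys)

for (k in keys) {
  # Append three derived elements, e.g. "a1", "a2", "a3"
  store[[k]] <- c(store[[k]], paste0(k, 1:3))
}

# Nested loops: outer over the keys, inner over each key's elements
visited <- character(0)
for (k in names(store)) {
  for (item in store[[k]]) {
    visited <- c(visited, item)
  }
}
```

Messages like the one quoted in the question often appear when several values are assigned into a slot that can hold only one, for instance with `x[1] <- c("a1", "a2", "a3")` on an atomic vector. Assigning with `[[` into a list sidesteps that, because each list element can itself be a whole vector.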

Joris Meys
  • Follow-up question: I have three huge data frames and I have to apply a number of identical functions to them. Should I put them in a list and use `lapply`, or should I keep them separate? Which one is going to take less of a toll on my memory and is less likely to freeze my computer? – vagabond Feb 11 '16 at 17:27
  • @vagabond I should check this, but I would suspect that the bottleneck would be the creation of the modified list by `lapply`. You can check for yourself with `Rprofmem` and `tracemem` if you want. – Joris Meys Feb 11 '16 at 19:06

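Look at this example. If you use apply instead of sapply to get the class of each column, the data frame is first coerced to a matrix, so every column comes back as character, whereas sapply treats the data frame as a list and returns each column's own class: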

apply(iris,2,class) #  function elements are rows or columns
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
"character"  "character"  "character"  "character"  "character" 

sapply(iris,class) # function elements are variables
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
"numeric"    "numeric"    "numeric"    "numeric"     "factor"
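
The all-character result can be reproduced by hand. A short sketch using the built-in `iris` data (the names `m`, `matrix_type`, `col_classes` and `numeric_only` are illustrative):

```r
# apply() works on a matrix, and a matrix has a single type, so the
# factor column Species forces every value to character
m <- as.matrix(iris)
matrix_type <- typeof(m)

# sapply() walks the data frame as a list of columns, so each column
# keeps its own class
col_classes <- sapply(iris, class)

# With only the four numeric columns, apply() keeps them numeric
numeric_only <- apply(iris[, 1:4], 2, class)
```

So for column-wise work on a data frame with mixed types, `sapply` or `lapply` is usually the safer choice, while `apply` fits data that is genuinely matrix-like.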