3

I have an R list of objects, each of which is itself a list of various types. I want the "cost" value for all objects whose category is "internal". What's a good way of achieving this?

If I had a data frame I'd have done something like

my_dataframe$cost[my_dataframe$category == "internal"]

What's the analogous idiom for a list?

mylist <- list(
  list(category = "internal", cost = 2),
  list(category = "bar", cost = 3),
  list(category = "internal", cost = 4),
  list(category = "foo", age = 56)
)

Here I'd want to get c(2,4). Subsetting like this does not work:

mylist[mylist$category == "internal"]

I can do part of this by:

temp<-sapply(mylist,FUN = function(x) x$category=="internal")
mylist[temp]
[[1]]
[[1]]$category
[1] "internal"

[[1]]$cost
[1] 2


[[2]]
[[2]]$category
[1] "internal"

[[2]]$cost
[1] 4

But how do I get just the costs, so that I can (say) sum them up? I tried this, but it does not help much because everything is coerced to character:

unlist(mylist[temp])
  category       cost   category       cost 
"internal"        "2" "internal"        "4" 

Is there a neat, compact idiom for doing what I want?

curious_cat

6 Answers

6

The idiom you are looking for is

sapply(mylist, "[[", "cost")

which returns the cost element of each sublist, or NULL where there is none. Because of that NULL, sapply cannot simplify the result to a vector and returns a list:

[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4

[[4]]
NULL
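If a plain numeric vector is preferable to a list containing NULL, one possible sketch (not part of the original answer) is to use vapply and substitute NA for a missing cost. The NA placeholder is an assumption; any sentinel that suits the data would do.

```r
mylist <- list(
  list(category = "internal", cost = 2),
  list(category = "bar", cost = 3),
  list(category = "internal", cost = 4),
  list(category = "foo", age = 56)
)

# vapply enforces one numeric value per sublist;
# NA_real_ stands in where `cost` is absent (an assumed convention)
costs <- vapply(
  mylist,
  function(x) if (is.null(x$cost)) NA_real_ else x$cost,
  numeric(1)
)
costs
#> [1]  2  3  4 NA
```

Because the result is always a numeric vector, it can be indexed with a logical vector just like a data frame column.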

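If you just want the sum of the costs whose category is "internal", you can reuse the logical vector temp from the question. Every selected sublist has a cost, so sapply simplifies the result to a numeric vector: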

sum(sapply(mylist[temp], "[[", "cost"))

And if you want a list instead, with the cost for each "internal" sublist and NULL for the others, you can do

sapply(mylist,function(x) x[x$category == "internal"]$cost)

One of the beautiful, but challenging, things about R is that there are so many ways to express the same thing.

You might note from the other answers that sapply and lapply are largely interchangeable here, since lists are just heterogeneous vectors. The following will also return 6:

do.call("sum",lapply(mylist, function(x) x[x[["category"]] == "internal"]$cost))
shayaa
  • Thanks! That works. Can you explain more how the [[ works? Is that an operator? – curious_cat Aug 15 '16 at 08:21
  • It is one of the 3 extraction operators in R (`[`, `[[` and `$`). There are benefits to using each. In general, `[` and `[[` behave similarly on atomic vectors, but work differently on recursive vectors, e.g., lists. Regarding the last line of code above, note that `do.call("sum",lapply(mylist, function(x) x[x[["category"]] == "internal"]["cost"]))` doesn't work, because `["cost"]` returns a list, but `do.call("sum",lapply(mylist, function(x) x[x[["category"]] == "internal"][["cost"]]))` does work. – shayaa Aug 15 '16 at 16:38
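To make the difference between the extraction operators concrete, here is a small sketch on a single sublist. The variable names are illustrative only.

```r
x <- list(category = "internal", cost = 2)

kept      <- x["cost"]    # `[` keeps the container: a list of length 1
extracted <- x[["cost"]]  # `[[` pulls out the element itself: the number 2
dollar    <- x$cost       # `$` extracts by name, like `[[` (with partial matching on lists)

is.list(kept)                # TRUE
is.numeric(extracted)        # TRUE
identical(extracted, dollar) # TRUE

# sum() cannot add up a list, which is why the `["cost"]` variant fails
outcome <- tryCatch(sum(kept), error = function(e) "error")
outcome
#> [1] "error"
```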
4

Yet another attempt, this time using ?Filter and a custom function to do the necessary selecting:

sum(sapply(Filter(function(x) x$category=="internal", mylist), `[[`, "cost"))
#[1] 6
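If this pattern comes up often, the two steps (filter, then extract) could be wrapped in a small helper. The sketch below is only an illustration: the name sum_where and its arguments are hypothetical, and it uses identical() so that sublists lacking the key are skipped rather than raising an error.

```r
mylist <- list(
  list(category = "internal", cost = 2),
  list(category = "bar", cost = 3),
  list(category = "internal", cost = 4),
  list(category = "foo", age = 56)
)

# Hypothetical helper: sum `field` over the sublists whose `key` equals `value`
sum_where <- function(lst, key, value, field) {
  # Step 1: keep only the matching sublists
  hits <- Filter(function(x) identical(x[[key]], value), lst)
  # Step 2: extract one numeric value per match and add them up
  sum(vapply(hits, function(x) x[[field]], numeric(1)))
}

sum_where(mylist, "category", "internal", "cost")
#> [1] 6
```

With no matches, vapply returns numeric(0) and the sum is 0.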
thelatemail
2

You could try something like this. For each sublist, if the category is "internal", get the cost; otherwise return NULL, which is dropped when you unlist the result:

sum(unlist(lapply(mylist, function(x) if(x$category == "internal") x$cost)))
# [1] 6

A safer way is to also check if category exists in the sublist by checking the length of category:

sum(unlist(lapply(mylist, function(x) if(length(x$category) && x$category == "internal") x$cost)))
# [1] 6

This will avoid raising an error if the sublist doesn't contain the category field.
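The failure mode can be reproduced with a list in which one sublist has no category. The sketch below uses a made-up list, mylist2, and shows an alternative guard based on identical(), which returns a single FALSE when the field is NULL. Note that identical() is stricter than ==: it would also reject a factor, for example.

```r
# Illustrative data: the second sublist has no `category` field
mylist2 <- list(
  list(category = "internal", cost = 2),
  list(cost = 3),
  list(category = "internal", cost = 4)
)

# Unguarded: `NULL == "internal"` is logical(0), and `if` rejects a
# zero-length condition, so this raises an error
unguarded <- tryCatch(
  sum(unlist(lapply(mylist2, function(x) if (x$category == "internal") x$cost))),
  error = function(e) conditionMessage(e)
)
is.character(unguarded)  # TRUE: an error message was captured

# Guarded with identical(): always a single TRUE or FALSE
guarded <- sum(unlist(lapply(mylist2, function(x) {
  if (identical(x$category, "internal")) x$cost
})))
guarded
#> [1] 6
```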

Psidom
1

The purrr package has some nice utilities for manipulating lists. Here, keep lets you specify a predicate function that returns a Boolean for whether to keep a list element:

library(purrr)

mylist %>% 
    keep(~.x[['category']] == 'internal') %>% 
    # now select the `cost` element of each, and simplify to numeric
    map_dbl('cost') %>% 
    sum()
## [1] 6

The predicate structure with ~ and .x is a shorthand equivalent to

function(x){x[['category']] == 'internal'}
alistaire
1

I approached your question with the rlist package. This method is similar to the purrr approach @alistaire mentioned.

library(rlist); library(dplyr)

mylist %>% 
  list.filter(category=="internal") %>% 
  list.mapv(cost) %>% sum()
    # list.mapv evaluates an expression for each list member and returns the results as a vector.
cuttlefish44
0

Here's a dplyr option:

library(dplyr)
bind_rows(mylist) %>% 
  filter(category == 'internal') %>% 
  summarize(total = sum(cost))
# A tibble: 1 x 1
    total
    <dbl>
  1     6
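The same idea works without dplyr: build a data frame from the list in base R, then apply the data frame idiom from the question directly. This is a rough sketch that assumes every sublist has a category and fills a missing cost with NA.

```r
mylist <- list(
  list(category = "internal", cost = 2),
  list(category = "bar", cost = 3),
  list(category = "internal", cost = 4),
  list(category = "foo", age = 56)
)

# One single-row data frame per sublist; a missing `cost` becomes NA (assumed convention)
rows <- lapply(mylist, function(x) {
  data.frame(
    category = x$category,
    cost = if (is.null(x$cost)) NA_real_ else x$cost
  )
})
my_dataframe <- do.call(rbind, rows)

# The idiom from the question now applies as-is
internal_costs <- my_dataframe$cost[my_dataframe$category == "internal"]
internal_costs
#> [1] 2 4
sum(internal_costs)
#> [1] 6
```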
sbha