56

Is there a way to create a "dictionary" in R, such that it has pairs? Something to the effect of:

x=dictionary(c("Hi","Why","water") , c(1,5,4))
x["Why"]=5

I'm asking this because I am actually looking for a two categorial variables function.

So that if x=dictionary(c("a","b"),c(5,2))

     x  val
1    a  5 
2    b  2 

I want to compute x1^2+x2 on all combinations of x keys

     x1 x2 val1  val2  x1^2+x2
1    a  a   5     5      30
2    b  a   2     5      9
3    a  b   5     2      27
4    b  b   2     2      6

And then I want to be able to retrieve the result using x1 and x2. Something to the effect of: get_result["b","a"] = 9

what is the best, efficient way to do this?

acylam
  • 18,231
  • 5
  • 36
  • 45
eran
  • 14,496
  • 34
  • 98
  • 144

6 Answers6

64

I know three R packages for dictionaries: hash, hashmap, and dict.

Update July 2018: a new one, container.

Update September 2018: a new one, collections

hash

Keys must be character strings. A value can be any R object.

library(hash)
## hash-2.2.6 provided by Decision Patterns
h <- hash() 
# set values
h[["1"]] <- 42
h[["foo"]] <- "bar"
h[["4"]] <- list(a=1, b=2)
# get values
h[["1"]]
## [1] 42
h[["4"]]
## $a
## [1] 1
## 
## $b
## [1] 2
h[c("1", "foo")]
## <hash> containing 2 key-value pair(s).
##   1 : 42
##   foo : bar
h[["key not here"]]
## NULL

To get keys:

keys(h)
## [1] "1"   "4"   "foo"

To get values:

values(h)
## $`1`
## [1] 42
## 
## $`4`
## $`4`$a
## [1] 1
## 
## $`4`$b
## [1] 2
## 
## 
## $foo
## [1] "bar"

The print instance:

h
## <hash> containing 3 key-value pair(s).
##   1 : 42
##   4 : 1 2
##   foo : bar

The values function accepts the arguments of sapply:

values(h, USE.NAMES=FALSE)
## [[1]]
## [1] 42
## 
## [[2]]
## [[2]]$a
## [1] 1
## 
## [[2]]$b
## [1] 2
## 
## 
## [[3]]
## [1] "bar"
values(h, keys="4")
##   4
## a 1
## b 2
values(h, keys="4", simplify=FALSE)
## $`4`
## $`4`$a
## [1] 1
## 
## $`4`$b
## [1] 2

hashmap

See https://cran.r-project.org/web/packages/hashmap/README.html.

hashmap does not offer the flexibility to store arbitrary types of objects.

Keys and values are restricted to "scalar" objects (length-one character, numeric, etc.). The values must be of the same type.

library(hashmap)
H <- hashmap(c("a", "b"), rnorm(2))
H[["a"]]
## [1] 0.1549271
H[[c("a","b")]]
## [1]  0.1549271 -0.1222048
H[[1]] <- 9

Beautiful print instance:

H
## ## (character) => (numeric)  
## ##         [1] => [+9.000000]
## ##         [b] => [-0.122205]
## ##         [a] => [+0.154927]

Errors:

H[[2]] <- "Z"
## Error in x$`[[<-`(i, value): Not compatible with requested type: [type=character; target=double].
H[[2]] <- c(1,3)
## Warning in x$`[[<-`(i, value): length(keys) != length(values)!

dict

Currently available only on Github: https://github.com/mkuhn/dict

Strengths: arbitrary keys and values, and fast.

library(dict)
d <- dict()
d[[1]] <- 42
d[[c(2, 3)]] <- "Hello!" # c(2,3) is the key
d[["foo"]] <- "bar"
d[[4]] <- list(a=1, b=2)
d[[1]]
## [1] 42
d[[c(2, 3)]]
## [1] "Hello!"
d[[4]]
## $a
## [1] 1
## 
## $b
## [1] 2

Accessing to a non-existing key throws an error:

d[["not here"]]
## Error in d$get_or_stop(key): Key error: [1] "not here"

But there is a nice feature to deal with that:

d$get("not here", "default value for missing key")
## [1] "default value for missing key"

Get keys:

d$keys()
## [[1]]
## [1] 4
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 2 3
## 
## [[4]]
## [1] "foo"

Get values:

d$values()
## [[1]]
## [1] 42
## 
## [[2]]
## [1] "Hello!"
## 
## [[3]]
## [1] "bar"
## 
## [[4]]
## [[4]]$a
## [1] 1
## 
## [[4]]$b
## [1] 2

Get items:

d$items()
## [[1]]
## [[1]]$key
## [1] 4
## 
## [[1]]$value
## [[1]]$value$a
## [1] 1
## 
## [[1]]$value$b
## [1] 2
## 
## 
## 
## [[2]]
## [[2]]$key
## [1] 1
## 
## [[2]]$value
## [1] 42
## 
## 
## [[3]]
## [[3]]$key
## [1] 2 3
## 
## [[3]]$value
## [1] "Hello!"
## 
## 
## [[4]]
## [[4]]$key
## [1] "foo"
## 
## [[4]]$value
## [1] "bar"

No print instance.

The package also provides the function numvecdict to deal with a dictionary in which numbers and strings (including vectors of each) can be used as keys, and that can only store vectors of numbers.

Stéphane Laurent
  • 75,186
  • 15
  • 119
  • 225
  • 1
    Is there anything wrong with [hatmatrix's lists, matrices, etc. answer](https://stackoverflow.com/a/7820032/1048186)? If lists have the same performance as these solutions, they seem like they would be the most universally-recognized way to go. – Josiah Yoder Jul 01 '21 at 16:27
  • From [this answer to another question](https://stackoverflow.com/a/34233870/1048186), it appears that R environments may be faster than both lists and hashes (for random non-vectorized access). – Josiah Yoder Jul 01 '21 at 16:36
  • this is a sad state of affairs in the language; also thank you @stephanelaurent for your very thorough post (and other misc blog posts on R) – Matt Dec 15 '22 at 13:56
27

You simply create a vector with your key value pairs.

animal_sounds <- c(
  'cat' = 'meow',
  'dog' = 'woof',
  'cow' = 'moo'
)
print(animal_sounds['cat'])
# 'meow'

Update: To answer the 2nd portion of the question, you can create a dataframe and compute the values like this:

val1 <- c(5,2,5,2) # Create val1 column
val2 <- c(5,5,2,2) # Create val2 column
df <- data.frame(val1, val2) # create dataframe variable
df['x1^2+x2'] <- val1^2 + val2 # create expression column

Output:

  val1 val2 x1^2+x2
1    5    5      30
2    2    5       9
3    5    2      27
4    2    2       6
toujames
  • 395
  • 5
  • 7
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30756118) – FKayani Jan 11 '22 at 11:43
  • if you use a named list instead of a named vector you can get nesting; e.g.: `dict <- list(cat = 'meow', canine = list(dog = 'bark', wolf = 'howl'))`. Dog can be indexed with `dict$canine$dog`. It's not super pretty but neither is r – Matt Dec 15 '22 at 14:13
  • @FKayani it answers the first part of the question. The question about computation of x1^2+x2 isn't. – jrosell Dec 20 '22 at 11:41
13

You can use just data.frame and row.names to do this:

x=data.frame(row.names=c("Hi","Why","water") , val=c(1,5,4))
x["Why",]
[1] 5
xm1
  • 1,663
  • 1
  • 17
  • 28
  • I'm not sure if this allows you to have nested dictionaries (which is a useful aspect of dictionaries) – Matt Dec 15 '22 at 14:03
6

In that vectors, matrices, lists, etc. behave as "dictionaries" in R, you can do something like the following:

> (x <- structure(c(5,2),names=c("a","b"))) ## "dictionary"
a b 
5 2 
> (result <- outer(x,x,function(x1,x2) x1^2+x2))
   a  b
a 30 27
b  9  6
> result["b","a"]
[1] 9

If you wanted a table as you've shown in your example, just reshape your array...

> library(reshape)
> (dfr <- melt(result,varnames=c("x1","x2")))
  x1 x2 value
1  a  a    30
2  b  a     9
3  a  b    27
4  b  b     6
> transform(dfr,val1=x[x1],val2=x[x2])
  x1 x2 value val1 val2
1  a  a    30    5    5
2  b  a     9    2    5
3  a  b    27    5    2
4  b  b     6    2    2
hatmatrix
  • 42,883
  • 45
  • 137
  • 231
1

See my answer to a very recent question. In essence, you use environments for this type of functionality.

For the higher dimensional case, you may be better off using an array (twodimensional) if you want the easy syntax for retrieving the result (you can name the rows and columns). As an alternative,you can paste together the two keys with a separator that doesn't occur in them, and then use that as a unique identifier.

To be specific, something like this:

tmp<-data.frame(x=c("a", "b"), val=c(5,2))
tmp2<-outer(seq(nrow(tmp)), seq(nrow(tmp)), function(lhs, rhs){tmp$val[lhs] + tmp$val[rhs]})
dimnames(tmp2)<-list(tmp$x, tmp$x)
tmp2
tmp2["a", "b"]
Community
  • 1
  • 1
Nick Sabbe
  • 11,684
  • 1
  • 43
  • 57
-4

Using tidyverse

Adding an answer using more recent tidyverse approaches.

There are probably cleaner ways of handling the crossing (which creates all combinations) and unnesting, but this is a quick and dirty approach.

library(tidyverse)

my_tbl <- tibble(x = c("A", "B"), val=c(5,2)) %>% 
  crossing(x1 = ., x2 = .) %>%  # Create all combinations
  unnest_wider(everything(), names_sep="_") %>% # Unpack into distinct columns
  mutate(result = x1_val^2 + x2_val)  # Calculate result

# Access result by accessing the row in the data frame
my_tbl %>% 
  filter(x1_x == "A", x2_x == "B") %>% 
  pull(result)
#> [1] 27

# Convert tibble to a named vector that could be accessed more easily.
# However, this is limited to string names.
my_named_vector <- my_tbl %>% 
  transmute(name = str_c(x1_x, "_", x2_x), value=result) %>% 
  deframe()

my_named_vector[["A_B"]]
#> [1] 27

Created on 2022-04-06 by the reprex package (v2.0.1)

tibble version 3.1.6
dplyr version 1.0.8
tidyr version 1.2.0
stringr version 1.4.0

Or Gadish
  • 128
  • 5