
I have a data frame with two columns, key and value, and I would like to create a dictionary / hash table in which each row supplies one key/value pair.

As far as I understand, the typical way to create an R dictionary / hash table is something like this:

labels.dic <- c("Id of the item and some other description" = "id")

This works perfectly fine, but when I try to do the same thing using values from the data frame (named lbls in the example), it does not work. Why does this happen?

labels.dic <- c(lbls[1,1]=lbls[1,2])
Error: unexpected '=' in "c(lbls[1,1] ="
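The error is raised by R's parser before anything is evaluated: in a call such as c(...), the name on the left of = has to be a symbol or a string literal, so an expression like lbls[1,1] is not accepted there. A minimal sketch of the usual workaround, assuming a small made-up lbls with character columns, builds the vector first and assigns the names afterwards:

```r
# Hypothetical data standing in for `lbls` (key = description, value = short label).
lbls <- data.frame(
  key   = c("Id of the item and some other description", "Name of the item"),
  value = c("id", "name"),
  stringsAsFactors = FALSE
)

# labels.dic <- c(lbls[1, 1] = lbls[1, 2])  # syntax error: an argument name must be
#                                           # a symbol or string literal, not an expression

# Build the vector from the value column, then attach the keys as names.
labels.dic <- lbls$value
names(labels.dic) <- lbls$key

# Look up a value by its key.
labels.dic[["Name of the item"]]
# [1] "name"
```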
Gavin Simpson
pedrosaurio
  • R doesn't do dictionaries. You're trying to use the language in a way it wasn't designed for, like a carpenter trying to dig a hole in the ground with a screwdriver. Sure, you can contort yourself and work really hard to do something similar, but people are just going to look at you funny. R isn't designed for this kind of iterative data manipulation. – Eric Leschinski Mar 04 '17 at 14:45
  • Yep, R's data structures are limited and this is a serious problem (see https://www.refsmmat.com/posts/2016-09-12-r-lists.html). Python/Julia are a lot more pleasant (and faster!) to work with. – gagarine Jul 02 '18 at 12:09
  • What are named lists, if not dictionaries? – Marc Dec 30 '21 at 00:28
  • @Marc Years later I came to this realization too – pedrosaurio Dec 30 '21 at 13:52

4 Answers


It appears to me you've gotten some misinformation. I'm not even certain where you got the idea of that syntax for creating a hashtable.

In any case, for hashtable-like functionality you may want to consider using an environment: these are implemented internally with a hash table (if I remember correctly), so they do pretty much what you want.

You would use it something like this:

someenv <- new.env()
someenv[["key"]] <- value

Given your data.frame, something like this would fill it up:

# seq_len() gives an empty sequence when lbls has no rows (seq(0) would give c(1, 0))
for (i in seq_len(nrow(lbls)))
{
  someenv[[ lbls[i, 1] ]] <- lbls[i, 2]
}

(note: this requires that the first column is an actual character column, not a factor! Convert it with as.character() first if necessary.)

You can then easily get to a named value by using someenv[["nameofinterest"]].
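A slightly fuller sketch of the environment approach, assuming a made-up lbls with character columns. It fills the environment with list2env() instead of a loop and shows a few base R functions that are handy for dictionary-style use:

```r
# Hypothetical data standing in for `lbls`; keys must be character, not factor.
lbls <- data.frame(
  key   = c("apple", "banana", "cherry"),
  value = c("red", "yellow", "dark red"),
  stringsAsFactors = FALSE
)

# Fill a hashed environment in one call instead of a loop.
# list2env() only hashes by default for more than 100 elements,
# so pass an explicitly hashed environment.
dict <- list2env(as.list(setNames(lbls$value, lbls$key)),
                 envir = new.env(hash = TRUE))

# Single lookup. [[ on an environment does not search parent
# environments and returns NULL for a missing key.
dict[["banana"]]
# [1] "yellow"

# Membership test: inherits = FALSE matters, otherwise a key such as "c"
# would be "found" in base R via the parent environments.
exists("durian", envir = dict, inherits = FALSE)
# [1] FALSE

# Look up several keys at once (returns a named list).
mget(c("apple", "cherry"), envir = dict)

# All keys currently stored (sorted).
ls(dict)
```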

Nick Sabbe
  • Nick, [here](http://tolstoy.newcastle.edu.au/R/help/06/02/20391.html) is where I saw this notation. I've successfully filled my dictionary / hash table with the notation dictionary[[key]] <- value. Still, I don't know why it works one way and not the other. Thanks for your help. – pedrosaurio Oct 18 '11 at 09:47
  • OK, I see what you mean. I was put off by you using only 1 key/value pair in your example. Still: environments are supposed to have better performance at this sort of thing. If performance isn't an issue, a named vector (like @kohske suggested) or a list will do just fine. – Nick Sabbe Oct 18 '11 at 11:03
  • @pedrosaurio - Yeah, environments ARE orders of magnitude faster at this when you have several thousand entries. `new.env(hash=TRUE)` is needed in R 2.12 and earlier (the default changed to hash=TRUE in 2.13). – Tommy Oct 18 '11 at 15:25

The easiest way is to set the names after creating the vector, so you can define a function like this:

cc <- function(name, value) {
    ret <- c(value)
    names(ret) <- name
    ret
}

cc(c(letters[1:2], "a name"), c(LETTERS[1:2], "a value"))

# output like this
#         a         b    a name 
#       "A"       "B" "a value" 
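Base R already ships a function that does what cc() does: stats::setNames(), which is attached by default. A short sketch applying it to a whole two-column data frame (the data below is made up for illustration):

```r
# Hypothetical lookup table standing in for `lbls`.
lbls <- data.frame(
  key   = c("a", "b", "a name"),
  value = c("A", "B", "a value"),
  stringsAsFactors = FALSE
)

# setNames() builds the named vector in one step: values first, names second.
labels.dic <- setNames(lbls$value, lbls$key)

# Single lookup by name.
labels.dic[["a name"]]
# [1] "a value"

# Vectorised lookup; unknown keys come back as NA.
unname(labels.dic[c("b", "zzz")])
# [1] "B" NA
```

Names do not have to be unique; if a key appears more than once, lookups return the first match.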
kohske

Another option, similar to what you may have seen in Python or Perl, is the hash package. See: http://cran.r-project.org/web/packages/hash/

If your keys are particularly long, then I recommend storing two hash tables. First, hash each key using the digest package and store a dictionary (hash table) that maps from digest to key (the mapping from key to digest is already done by the digest package ;-)); then store a second one that maps from the digest to the value you wish to store. This works very well for me.
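A minimal sketch of the two-table idea, assuming the digest package is installed. Base R environments stand in for the two hash tables here rather than the hash package, and the helper names new_digest_dict(), key_digest(), dict_set(), dict_get() and dict_keys() are hypothetical:

```r
# Requires the digest package (install.packages("digest")).
# All helper names below are hypothetical, for illustration only.

# Two hash tables: digest -> original key, and digest -> stored value.
new_digest_dict <- function() {
  list(keys = new.env(hash = TRUE), values = new.env(hash = TRUE))
}

# Hash a (possibly very long) string key to a 32-character MD5 hex string.
# serialize = FALSE hashes the string itself rather than its serialized form.
key_digest <- function(key) {
  digest::digest(key, algo = "md5", serialize = FALSE)
}

dict_set <- function(d, key, value) {
  k <- key_digest(key)
  assign(k, key, envir = d$keys)      # digest -> key
  assign(k, value, envir = d$values)  # digest -> value
  invisible(d)
}

dict_get <- function(d, key) {
  # Returns NULL when the key has never been stored.
  get0(key_digest(key), envir = d$values, inherits = FALSE)
}

# Recover the original keys from the digest -> key table.
dict_keys <- function(d) {
  unlist(mget(ls(d$keys), envir = d$keys), use.names = FALSE)
}
```

Usage would then look like `d <- new_digest_dict(); dict_set(d, long_key, 42); dict_get(d, long_key)`. Because environments are modified in place, dict_set() does not need its result reassigned.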

Iterator

I had a similar problem: I had a data frame with lots of columns, and one of the columns had about 95 different values. I wanted to create another column that grouped (mapped) those 95 values into something more manageable, so I created a simple data frame holding the mappings to use as a lookup table.

I needed two packages to do this in a few simple steps:

library(hash)
library(qdapTools)

Load a simple data frame with the two columns that you want to use as the hash table:

product_mappings = google_data[,c(1,2)]

In this data frame, column 1 is the key and column 2 is the lookup value.

# make the hash table
h = hash::hash(keys = product_mappings$col1, values = product_mappings$col2)

# create the column prod_mappings
# look up product_interest in the large df
# find the match in the hash table h and return the value column (col2 in the hash table)
df$prod_mappings = hash_look(df$product_interest, h, missing = df$product_interest)
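For comparison, a base R sketch of the same mapping that needs no extra packages: a named vector built with setNames() acts as the lookup table, and unmatched products keep their original value. The column names and toy data are hypothetical:

```r
# Hypothetical data: a mapping table (column 1 = key, column 2 = group)
# and a larger data frame whose product_interest column should be mapped.
product_mappings <- data.frame(
  product = c("shoes", "boots", "hats"),
  group   = c("footwear", "footwear", "headwear"),
  stringsAsFactors = FALSE
)
df <- data.frame(
  product_interest = c("boots", "hats", "gloves", "shoes"),
  stringsAsFactors = FALSE
)

# Named vector used as the lookup table: names are keys, elements are values.
lookup <- setNames(product_mappings[[2]], product_mappings[[1]])

# Look every product up at once; unmatched products come back as NA.
mapped <- unname(lookup[df$product_interest])

# Fall back to the original value when there is no mapping.
mapped[is.na(mapped)] <- df$product_interest[is.na(mapped)]
df$prod_mappings <- mapped

df
```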
Bryan Butler