
In R, I am producing a document that contains random parts and is addressed to each person in a list.

However, I would like the same document, addressed to the same person but generated several times, to always return the same random values.

For instance, I sample a team (A or B), and I would like every version of the document for person P to mention, say, A.

I know of the `set.seed` function, but it requires an integer, whereas I have strings (the persons' names). So, is there a clever way to map a string to an integer in my case? Or another (simpler) solution for generating random documents?

thelatemail
Arthur
  • 3
    I wouldn't know how to do it in R, but it should be quite easy to implement a hashing function to map a string to an integer. For example: http://stackoverflow.com/a/2624210/2947592 – wvdz Mar 03 '16 at 23:06
  • 1
    If in your example you knew all persons upfront, you could create a factor with the names as levels and use the integer representation for setting the seed – talat Mar 03 '16 at 23:15
  • Use a factor (which you may already have since that is the default for character data with read.* functions) and convert with `as.numeric`. – IRTFM Mar 03 '16 at 23:32
  • You could use its encoding, something like `stri_enc_toutf32(your_string)`. Paste the result together and convert to integer. Optionally mod by a large prime. – Gregor Thomas Mar 03 '16 at 23:38
  • 1
    If you want to go the hashing route (which seems like it could work nicely) you might want to adapt the code in [this answer of mine](http://stackoverflow.com/a/14366546/980833) from a while back. – Josh O'Brien Mar 03 '16 at 23:58
  • The answers using `factor` do not do what I want, because the underlying integer is not uniquely tied to the person's name. If a first version has Mr. Aple, Mrs. Benny and Mrs. Chant, and Mr. Aaron joins later and I have to generate a document for him, then all the underlying integers change, and so do the documents for everyone else. – Arthur Mar 06 '16 at 17:32
  • @wvdz if you can turn your comment into an answer, I will update it with R code and say whether it works. – Arthur Mar 06 '16 at 18:29
  • @Josh O'Brien same remark – Arthur Mar 06 '16 at 18:30
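
The hashing idea raised in the comments can be sketched in base R. This is only an illustration: `name_to_seed` and `team_for` are hypothetical helpers, and the base 31 and the prime modulus 2147483647 are arbitrary choices, not something the commenters specified. Because the seed depends only on the name itself, adding Mr. Aaron later leaves everyone else's seed untouched.

```r
# Hypothetical helper: map a name to a reproducible integer seed
# with a simple polynomial hash over its Unicode code points.
name_to_seed <- function(name, modulus = 2147483647, base = 31) {
  codes <- utf8ToInt(enc2utf8(name))  # one integer per character
  h <- 0                              # kept as a double, so no integer overflow
  for (code in codes) {
    h <- (h * base + code) %% modulus
  }
  as.integer(h)                       # always below modulus, so it fits in an integer
}

# Hypothetical helper: draw the team for one person, seeded by the name.
team_for <- function(name) {
  set.seed(name_to_seed(name))
  sample(c("A", "B"), 1)
}

# The same person always gets the same team, whoever else is in the list.
identical(team_for("Mrs. Benny"), team_for("Mrs. Benny"))
```

Two caveats: `set.seed` resets the global random number generator, so any later draws in the session are affected, and the default behaviour of `sample` changed in R 3.6.0, so a given seed may produce a different team across R versions.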

1 Answer


Perhaps conversion of text to hex or bits would help you:

```r
# simple example
x <- charToRaw("Matthew")
y <- rawToBits(x)
packBits(y)
# [1] 4d 61 74 74 68 65 77
rawToChar(packBits(y))
# [1] "Matthew"

# with more data
df <- data.frame(names=c("Matthew M.", "Mark T.", "Luke S.", "John U."), stringsAsFactors = FALSE)
df$Raw <- lapply(df$names, FUN=charToRaw)
df$Bits <- lapply(df$Raw, FUN=rawToBits)

bitsToChar <- function(x) {rawToChar(packBits(x))}
df$Char <- lapply(df$Bits, FUN=bitsToChar)
df$Char
# [[1]]
# [1] "Matthew M."
# 
# [[2]]
# [1] "Mark T."
# 
# [[3]]
# [1] "Luke S."
# 
# [[4]]
# [1] "John U."
```
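
The raw bytes above still have to be reduced to a single integer before `set.seed` can use them. The following is a rough sketch of one possible reduction, not something shown in the answer itself: `raw_to_seed` and `draw_team` are hypothetical helpers, and the position-weighted sum is an assumed, deliberately simple scheme. Collisions are possible, in which case two people would share the same draws.

```r
people <- c("Matthew M.", "Mark T.", "Luke S.", "John U.")

# Hypothetical helper: fold the raw bytes of a string into one integer.
# Weighting each byte by its position keeps "ab" and "ba" apart;
# the modulus keeps the result inside R's integer range.
raw_to_seed <- function(bytes) {
  weights <- as.numeric(seq_along(bytes))
  as.integer(sum(as.integer(bytes) * weights) %% 2147483647)
}

# One seed per person, derived only from that person's own name.
seeds <- vapply(people, function(p) raw_to_seed(charToRaw(p)), integer(1))

# Hypothetical helper: seed the generator, then draw a team.
draw_team <- function(seed) {
  set.seed(seed)
  sample(c("A", "B"), 1)
}

teams <- vapply(seeds, draw_team, character(1))
teams  # named by person; identical on every run under the same R version
```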
Marc in the box