I hope someone can help me or at least give me some good advice. I have a large data frame that stores scientific papers (classified by Author/Year/Journal). Most papers contribute more than one record, so I am trying to write a function (so far without success) that returns a unique value (named n) identifying the paper to which each record belongs.

Ben Bolker
stefano
  • Stefano, welcome to SO. Please provide us with a reproducible example and try to explain (and show) what you expect your output to look like. You should also show us what you have tried so far. There are a bunch of really good examples of how to do this here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Brandon Bertelsen Dec 28 '12 at 17:43

1 Answer

To compute a unique ID for each paper, you could use the digest function from the digest package. For example,

library(digest)
digest(c("Granger", "1987", "Econometrica"))

returns an MD5 hash string that serves as a practically unique ID for the publication. digest is not vectorized, i.e. you have to use sapply or similar to calculate the ID for each row of your data frame.
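
A minimal sketch of the row-wise approach, assuming a data frame named d with columns Author, Year and Journal (the sample rows and the "|" separator are made up for illustration):

```r
library(digest)

# Hypothetical sample data: the first two records come from the same paper
d <- data.frame(
  Author  = c("Granger", "Granger", "Engle"),
  Year    = c(1987, 1987, 1982),
  Journal = c("Econometrica", "Econometrica", "Econometrica"),
  stringsAsFactors = FALSE
)

# Build one key string per row, then hash each key separately,
# because digest() hashes its whole argument as a single object
key  <- do.call(paste, c(d[c("Author", "Year", "Journal")], sep = "|"))
d$id <- vapply(key, digest, character(1), USE.NAMES = FALSE)
```

Rows that share Author, Year and Journal get the same hash, so d$id can be used directly as a grouping variable.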

Karsten W.
  • or, less robustly, just `paste` together the authors/date/journal to get an ID string. – Ben Bolker Dec 28 '12 at 18:10
  • you could also use `interaction` to make a unique id for combinations of columns: `with(d, as.numeric(interaction(Author, Year, Journal, drop=TRUE)))` – Matthew Plourde Dec 28 '12 at 18:22
  • Hi everybody. I tried the solution proposed by Matthew and it works very well! I thought my example was clear enough, but next time I will provide all the necessary details. I appreciate all the tips! – stefano Dec 28 '12 at 22:18
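
For reference, the two suggestions from the comments can be sketched in base R as follows. The data frame d and its rows are invented for illustration, and the match() variant at the end is an extra alternative that nobody in the thread proposed:

```r
# Hypothetical sample data: four records from three distinct papers
d <- data.frame(
  Author  = c("Granger", "Granger", "Engle", "Engle"),
  Year    = c(1987, 1987, 1982, 1987),
  Journal = c("Econometrica", "Econometrica", "Econometrica", "Econometrica"),
  stringsAsFactors = FALSE
)

# paste(): a readable string ID; pick a separator that does not occur
# in the data, otherwise different combinations could collide
d$key <- paste(d$Author, d$Year, d$Journal, sep = "|")

# interaction(): one factor level per observed combination,
# converted to its numeric level code
d$n <- with(d, as.numeric(interaction(Author, Year, Journal, drop = TRUE)))

# Alternative: number the papers in order of first appearance
d$n_first <- match(d$key, unique(d$key))  # 1 1 2 3
```

The codes from interaction follow the order of the factor levels rather than the order of the rows, so use the match() variant if the IDs should follow the order in which papers first appear.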