
I have a data frame with donations and names of donors.

**donation**              **Donor**
 25.00               Steve Smith
 20.00               Jack Johnson
 50.00               Mary Jackson
  ...                   ...

I'm trying to do some clustering using the pvclust package. Unfortunately, the package doesn't seem to accept non-numeric data.

    > rs1.pv1 <- parPvclust(cl, rs1, nboot=10)
    Error in cor(x, method = "pearson", use = use.cor) : 'x' must be numeric

I have two questions.

1) Is there another package or method that would do this better?

2) Is there a way to "normalize" the donor names list? I.e., get a list of unique donor names, assign each an ID number, and then insert the ID number into the data frame in place of the character name.

Argalatyr
screechOwl
  • I strongly suspect you **don't** want to convert those names to numeric and feed them to `parPvclust`. Instead, from a quick look at `?parPvclust`, and the example in `?lung`, it looks like you should use the `Donor` column as the rownames attribute, and then remove it from the matrix or data.frame. – Josh O'Brien Nov 18 '11 at 19:01
  • @JoshO'Brien: make this an answer??? – Ben Bolker Nov 18 '11 at 19:06
  • Can you explain in a bit more detail what you're trying to do in this example? e.g., are you trying to come up with clusters of donors with similar donation levels (in which case I would be tempted to use `ave` or `plyr::ddply` to get average donations per donor, *then* cluster them ...) – Ben Bolker Nov 18 '11 at 19:08
  • @BenBolker -- I don't have the time right now, plus it's probably worth waiting for a response from the OP. I just wanted to amplify your and Iselzer's misgivings, before the OP went off and did something possibly nonsensical with that function! – Josh O'Brien Nov 18 '11 at 19:14
  • There are a bunch of other columns in the data (donation event, purpose, fiscal year, etc.). I'm just looking for any unexpected relationships. There's no real master plan. Kind of like graphing data, you never know what you'll find. – screechOwl Nov 18 '11 at 19:23
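
The two suggestions in the comments (summarise donations per donor first, and keep the donor names as row names rather than as a column) can be sketched roughly as follows. This is only an illustration under assumptions: the sample values are made up, `per_donor` and `hc` are hypothetical names, and base R's `hclust` stands in for `parPvclust` so the sketch runs without extra packages.

```r
# Hypothetical sample data; the real data frame has more rows and columns.
rs1 <- data.frame(
  donation = c(25, 20, 50, 35, 10, 60),
  Donor = c("Steve Smith", "Jack Johnson", "Mary Jackson",
            "Steve Smith", "Jack Johnson", "Mary Jackson"),
  stringsAsFactors = FALSE
)

# One row per donor: mean donation across all of that donor's gifts.
per_donor <- aggregate(donation ~ Donor, data = rs1, FUN = mean)

# Move the names into the row names so only numeric columns remain.
rownames(per_donor) <- per_donor$Donor
per_donor$Donor <- NULL

# Base-R hierarchical clustering on the numeric summary
# (a stand-in for pvclust, not the same method).
hc <- hclust(dist(per_donor))
```

With the names stored as row names, the clustering input is purely numeric and the donor names still appear as labels on the resulting tree.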

2 Answers


For number 2:

# If donor is a factor, then

as.numeric(donor)

# will transform your factor to numeric.
# If it isn't, transform it to a factor and then to numeric:
as.numeric(as.factor(donor))

However, I'm not sure that transforming the donor list to numeric and then using `cor` makes sense at all.
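
As a small illustration of what those IDs look like, here is a sketch with a made-up `donor` vector (the names `id_factor` and `id_first` are hypothetical). It shows that factor codes follow the sorted level order, and that `match()` against `unique()` is one alternative if IDs in order of first appearance are preferred. Either way the numbers are arbitrary labels, which is part of why feeding them to `cor` is questionable.

```r
# Hypothetical donor vector with one repeated name.
donor <- c("Steve Smith", "Jack Johnson", "Mary Jackson", "Steve Smith")

# Factor codes follow the sorted level order, not the order of appearance.
id_factor <- as.numeric(factor(donor))

# match() against unique() numbers donors by first appearance instead.
id_first <- match(donor, unique(donor))

# Same donor always gets the same id under both schemes.
print(data.frame(donor, id_factor, id_first))
```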

HTH

Luciano Selzer

How about `rs1 <- transform(rs1, Donor = as.numeric(factor(Donor)))`? (Warning: I haven't thought about what you're doing enough to know whether that makes sense, so I'm only answering question #2, not question #1.) Typically `Donor` would already be a factor (this is what e.g. `read.table` or `read.csv` did by default before R 4.0.0, when `stringsAsFactors` defaulted to `TRUE`), so the `factor()` part would be redundant.
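
If the names need to be recovered later, one option is to keep the factor around before overwriting the column. The sketch below assumes a small made-up `rs1`; `donor_f` and `recovered` are hypothetical helper names, not part of the original data.

```r
# Hypothetical three-row version of the data frame.
rs1 <- data.frame(donation = c(25, 20, 50),
                  Donor = c("Steve Smith", "Jack Johnson", "Mary Jackson"),
                  stringsAsFactors = FALSE)

# Keep the factor so its levels can translate ids back to names.
donor_f <- factor(rs1$Donor)
rs1 <- transform(rs1, Donor = as.numeric(donor_f))

# Indexing the levels by id recovers the original name for each row.
recovered <- levels(donor_f)[rs1$Donor]
```

After this, every column of `rs1` is numeric, and `recovered` holds the names in the original row order.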

Ben Bolker