
I've created a table of counts based on two columns in my data frame. I used:

data.frame(table(df$State,df$Subset)[,])

Is there a way I can turn these counts into percentages for each specific State?

             S1       S2        S3       S4    S5     S6     S7
NY          195     1296       974     5528  3597    505    282
NJ          172      733       763     3253  3088    315    166
CA           48      552      1087     2073  1212   1149    203

So rather than counts, each cell would be a percentage of that state's total. All of NY would add up to 100% across the row, and the same for NJ, CA, etc.

coderX
  • Possible duplicate of [How to generate a frequency table in R](http://stackoverflow.com/questions/11148868/how-to-generate-a-frequency-table-in-r) –  Dec 09 '15 at 03:48

1 Answer


I don't know if the comment ?prop.table is enough. By default, prop.table gives only cell proportions (each count divided by the grand total), and what is being requested is row proportions.

tbl <-
structure(c(195L, 172L, 48L, 1296L, 733L, 552L, 974L, 763L, 1087L, 
5528L, 3253L, 2073L, 3597L, 3088L, 1212L, 505L, 315L, 1149L, 
282L, 166L, 203L), .Dim = c(3L, 7L), .Dimnames = list(c("NY", 
"NJ", "CA"), c("S1", "S2", "S3", "S4", "S5", "S6", "S7")), class = "table")

Compare these two values:

> prop.table(tbl)
            S1          S2          S3          S4          S5          S6
NY 0.007171491 0.047662830 0.035820676 0.203302563 0.132286418 0.018572322
NJ 0.006325622 0.026957449 0.028060755 0.119635173 0.113566989 0.011584716
CA 0.001765290 0.020300835 0.039976463 0.076238461 0.044573572 0.042256629
            S7
NY 0.010371079
NJ 0.006104961
CA 0.007465706
> prop.table(tbl, margin=1)
            S1          S2          S3          S4          S5          S6
NY 0.015755029 0.104710350 0.078694352 0.446634887 0.290619698 0.040801487
NJ 0.020259128 0.086336867 0.089870436 0.383156655 0.363722026 0.037102473
CA 0.007590133 0.087286528 0.171884883 0.327798861 0.191650854 0.181688805
            S7
NY 0.022784196
NJ 0.019552415
CA 0.032099937

Only the second one is the basis for a "percentage" estimate (which does require multiplication by 100):

> 100*prop.table(tbl, margin=1)
           S1         S2         S3         S4         S5         S6         S7
NY  1.5755029 10.4710350  7.8694352 44.6634887 29.0619698  4.0801487  2.2784196
NJ  2.0259128  8.6336867  8.9870436 38.3156655 36.3722026  3.7102473  1.9552415
CA  0.7590133  8.7286528 17.1884883 32.7798861 19.1650854 18.1688805  3.2099937

I think the more useful result would round that set of values to one decimal place.
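The rounding suggested above could be sketched as follows. This is a minimal sketch, not part of the original answer: the name `counts` and the matrix-based rebuild of `tbl` are assumptions added only so the snippet runs on its own, and `pct` and `row_totals` are illustrative names.

```r
# Rebuild the counts from the question so the snippet is self-contained
# (an assumption for illustration; the answer defines tbl via structure()).
counts <- matrix(
  c(195, 1296,  974, 5528, 3597,  505, 282,
    172,  733,  763, 3253, 3088,  315, 166,
     48,  552, 1087, 2073, 1212, 1149, 203),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("NY", "NJ", "CA"), paste0("S", 1:7))
)
tbl <- as.table(counts)

# Row percentages (margin = 1), rounded to one decimal place.
pct <- round(100 * prop.table(tbl, margin = 1), 1)
print(pct)

# Before rounding, each state's row sums to 100.
row_totals <- rowSums(100 * prop.table(tbl, margin = 1))
print(row_totals)
```

Starting from the asker's data frame, the same idea would presumably be `round(100 * prop.table(table(df$State, df$Subset), margin = 1), 1)`. Note that the rounded rows may no longer sum to exactly 100.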

Tyler Rinker
IRTFM
  • Was trying to figure out how to use margin, and that helped a lot. Thanks – coderX Dec 09 '15 at 04:02
  • @rawr: I don't really understand your whinge. You get row proportions for margin = 1. `margin.table` is for the marginal sum, not the row proportions. It's essentially `rowSums(tbl)` – IRTFM Dec 09 '15 at 06:59
  • perhaps it is your senility, but my complaints were about the obscure documentation, not whatever you are currently prattling about – rawr Dec 09 '15 at 07:01
  • @rawr: Well, if you mean the fact that they define it with `sweep(x, margin, margin.table(x, margin), "/")`, and I have never thought that `sweep` was clearly defined, then maybe I agree with you. I'd rather hoped my senility wasn't noticeable until 10 years from now. – IRTFM Dec 09 '15 at 07:16
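
For readers puzzled by the `sweep` definition discussed in the comments, here is a small sketch of what it does. It uses only the first two columns of the question's counts to stay short; the names `small`, `row_totals`, `by_sweep`, and `by_prop` are illustrative, not from the thread.

```r
# A two-column subset of the question's counts (illustrative only).
small <- as.table(matrix(
  c(195, 172, 48, 1296, 733, 552),
  nrow = 3,
  dimnames = list(c("NY", "NJ", "CA"), c("S1", "S2"))
))

# margin.table(x, 1) returns the row totals, the same numbers as rowSums(x).
row_totals <- margin.table(small, 1)

# sweep() divides every entry in a row by that row's total.
by_sweep <- sweep(small, 1, row_totals, "/")

# prop.table() with margin = 1 should give the same values.
by_prop <- prop.table(small, margin = 1)

same <- isTRUE(all.equal(as.vector(by_sweep), as.vector(by_prop)))
print(same)
```

In other words, `margin = 1` means "divide along rows by the row totals", and `margin = 2` would do the same per column.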