1

Here is what the original data frame looks like:

         PLACEMENT      SIZE      COST
1        placement1     LARGE    1838128.00
58       placement1     MEDIUM   10962048.00
117      placement1     SMALL    2622851.00
175      placement1     UNKNOWN  443.00
2        placement2     LARGE    598.00
59       placement2     MEDIUM   24358.00
118      placement2     SMALL    571802.00
176      placement2     UNKNOWN  1706.00
3        placement3     LARGE    8.00
60       placement3     MEDIUM   22.00  
119      placement3     SMALL    502388.00
177      placement3     UNKNOWN  762.00

How do you create a column that shows the percentage of SIZE by PLACEMENT?

I want it to look like this in the end:

         PLACEMENT      SIZE      COST           PERCENTAGE
1        placement1     LARGE    1838128.00         11.9
58       placement1     MEDIUM   10962048.00        71.1
117      placement1     SMALL    2622851.00         17.0
175      placement1     UNKNOWN  443.00              0.0 
2        placement2     LARGE    598.00              0.1
59       placement2     MEDIUM   24358.00           4.07
118      placement2     SMALL    571802.00         95.54
176      placement2     UNKNOWN  1706.00            0.29
3        placement3     LARGE    8.00                0.0
60       placement3     MEDIUM   22.00               0.0
119      placement3     SMALL    502388.00         99.84
177      placement3     UNKNOWN  762.00             0.16 

Any help would be great, thanks! I cannot figure it out with the prop.table library even though I have a feeling that is what I am supposed to be using.

nak5120
  • 4,089
  • 4
  • 35
  • 94
  • 2
    Some possible duplicates: http://stackoverflow.com/q/16716269/ and http://stackoverflow.com/q/22231357/ and http://stackoverflow.com/q/25134347/ and http://stackoverflow.com/q/24247462/ and http://stackoverflow.com/q/31394227/ – Frank Mar 28 '16 at 20:37

3 Answers3

2

You can do it quickly with dplyr:

library(dplyr)
df <- df %>% group_by(PLACEMENT) %>% mutate(PERCENTAGE=COST/SUM(COST))

It looks like the results you want are also rounded, which you can do with the function round() if you like.

Edit If you want to keep your percentages measured between 1 and 100, you can of course do that by writing 100*COST/SUM(COST) instead if you prefer it that way.

gfgm
  • 3,627
  • 14
  • 34
1

Assuming your data frame input is DF this will do it. No packages are needed.

transform(DF, PC = 100 * ave(COST, PLACEMENT, FUN = prop.table)) 

giving:

     PLACEMENT    SIZE     COST           PC
1   placement1   LARGE  1838128 11.917733169
58  placement1  MEDIUM 10962048 71.073811535
117 placement1   SMALL  2622851 17.005583050
175 placement1 UNKNOWN      443  0.002872246
2   placement2   LARGE      598  0.099922468
59  placement2  MEDIUM    24358  4.070086087
118 placement2   SMALL   571802 95.544928350
176 placement2 UNKNOWN     1706  0.285063095
3   placement3   LARGE        8  0.001589888
60  placement3  MEDIUM       22  0.004372193
119 placement3   SMALL   502388 99.842601057
177 placement3 UNKNOWN      762  0.151436862

Note: The input in reproducible form is:

Lines <- "PLACEMENT      SIZE      COST
1        placement1     LARGE    1838128.00
58       placement1     MEDIUM   10962048.00
117      placement1     SMALL    2622851.00
175      placement1     UNKNOWN  443.00
2        placement2     LARGE    598.00
59       placement2     MEDIUM   24358.00
118      placement2     SMALL    571802.00
176      placement2     UNKNOWN  1706.00
3        placement3     LARGE    8.00
60       placement3     MEDIUM   22.00  
119      placement3     SMALL    502388.00
177      placement3     UNKNOWN  762.00"

DF <- read.table(text = Lines, header = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

Here is an option using data.table

library(data.table)
setDT(df)[, PERCENTAGE := COST/SUM(COST) ,  by = PLACEMENT]
akrun
  • 874,273
  • 37
  • 540
  • 662