
I have a set of vectors containing category values, let's call them C1, C2, ..., and I have a frequency vector called Fr. All vectors are of the same length. I want to divide the frequency values in Fr by sums that depend on the categories, i.e. by the total of Fr over all entries sharing the same combination of category values. In Python, using NumPy, this is fairly easy:

import numpy as np

# Find unique categories
unqC1 = np.unique(C1)
unqC2 = np.unique(C2)
# For each combination of unique categories in C1 and C2, sum the frequencies
# and normalize (Fr must be a float array for the in-place division)
for uC1 in unqC1:
    for uC2 in unqC2:
        mask = (uC1 == C1) & (uC2 == C2)
        nrmFactor = np.sum(Fr[mask])
        Fr[mask] /= nrmFactor
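The nested loop above can also be written without explicit Python loops, which makes it clearer that the operation is "compute one total per group, then divide each entry by its group's total". The following is a minimal sketch, assuming the inputs are 1-D array-likes of equal length; the function name `normalize_by_groups` is hypothetical and not part of NumPy. It encodes each category vector as integer codes, combines them into one group id per row, and uses `np.bincount` with weights to get the per-group sums.

```python
import numpy as np


def normalize_by_groups(fr, *categories):
    """Hypothetical helper: divide each entry of fr by the sum of fr over all
    rows that share the same combination of category values."""
    fr = np.asarray(fr, dtype=float)
    codes, sizes = [], []
    for c in categories:
        # Integer code 0..k-1 for each distinct value in this category vector
        uniq, inv = np.unique(np.asarray(c), return_inverse=True)
        codes.append(inv.ravel())
        sizes.append(len(uniq))
    # Combine the per-vector codes into a single integer key per row
    flat = np.ravel_multi_index(tuple(codes), tuple(sizes))
    # Relabel the keys as consecutive group ids 0..g-1
    _, group_id = np.unique(flat, return_inverse=True)
    group_id = group_id.ravel()
    # Sum of fr within each group, then broadcast back to every row
    group_sums = np.bincount(group_id, weights=fr)
    return fr / group_sums[group_id]


C1 = ["a", "a", "b", "b"]
C2 = ["x", "x", "x", "y"]
Fr = [1, 3, 2, 5]
print(normalize_by_groups(Fr, C1, C2))  # groups (a,x), (b,x), (b,y)
```

Unlike the loop, this only visits category combinations that actually occur, and it returns a new array instead of modifying Fr in place. As with the loop, a group whose frequencies sum to zero produces NaN.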

How can I do this in R? For simplicity, let's say I have a table X in R with the columns X$Fr, X$C1 and X$C2.

Reed Richards
  • Please post the input data and the expected output – grrgrrbla Apr 30 '15 at 17:00
  • It will be easier for others to help you if you provide some sample data with expected results. Nevertheless, look up `dplyr` and check out the vignettes, and I think you will find some examples of what you're after through the use of `group_by` and `summarise` – JasonAizkalns Apr 30 '15 at 17:01

1 Answer


I'm not totally sure, but see if this accomplishes the goal:

X$nrmFactor <- ave(X$Fr, X$C1, X$C2, FUN = sum)
X$Fr <- X$Fr / X$nrmFactor

The ave function returns a vector as long as its first argument: it applies FUN to the values within each group defined by the arguments that follow the first one and come before FUN (here X$C1 and X$C2, so each group is one C1/C2 combination), and assigns that result to every case in the group. Note that the default FUN for ave is mean, i.e. an (ave)rage, so FUN = sum has to be given explicitly to get the per-group total that your numpy loop divides by.
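To make the behaviour of ave concrete in the language of the question, here is a rough Python analogue. It is a sketch only: `ave_like` is a hypothetical helper written for illustration, not an existing library function, and it only mimics the core idea of R's ave (apply a function per group, then hand the result back to every element of that group). With the default mean it yields group averages; with `fun=sum` it yields the per-group totals that the normalization needs.

```python
from statistics import mean


def ave_like(x, *groups, fun=mean):
    """Hypothetical analogue of R's ave(): apply `fun` to the values of x in
    each group and return one result per original element."""
    # One key per element; with no grouping vectors, everything is one group
    keys = list(zip(*groups)) if groups else [()] * len(x)
    # Collect the values belonging to each combination of group labels
    buckets = {}
    for key, value in zip(keys, x):
        buckets.setdefault(key, []).append(value)
    # Evaluate fun once per group, then map the result back to every element
    per_group = {key: fun(values) for key, values in buckets.items()}
    return [per_group[key] for key in keys]


fr = [1, 3, 2, 5]
c1 = ["a", "a", "b", "b"]
c2 = ["x", "x", "x", "y"]
sums = ave_like(fr, c1, c2, fun=sum)  # per-group totals, one per element
normalized = [f / s for f, s in zip(fr, sums)]
print(normalized)  # [0.25, 0.75, 1.0, 1.0]
```

Calling `ave_like(fr, c1, c2)` without `fun` returns the group means instead, which is the analogue of calling ave with its default FUN.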

IRTFM