I need to test the value of 'peso' (see the replication code below) for each factor. If a factor reaches 50% of the overall sum of 'peso', the values for each factor should be pasted into a new object 'results'. Otherwise, R should find the factor with the lowest aggregated 'peso', take the factor in the next column in its place, and aggregate 'peso' again. In other words, each round replaces the lowest-scoring factor with the next factor. The process should repeat until a factor crosses the 50% threshold. So my question is: where do I start?

set.seed(51)
Data <- sapply(1:100, function(x) sample(1:10, size=5))
Data <- data.frame(t(Data))
names(Data) <- letters[1:5]
Data$peso <- sample(0:3.5, 100, rep=TRUE) 

It should be like

If the first four rows are:
  a  b  c  d  e peso
  8  2  3  7  9    1
  8  3  4  5  7    3
  9  7  4 10  1    2
 10  3  4  5  7    3   

What would you like for the total?  
      Totals_08  = 4
      Totals_09  = 2
      Totals_10  = 3
      etc?

So factor 8 gets the largest share, 4/(4+2+3) = 0.4444444, but does not reach the 50% threshold in round 'a'. Therefore I need something more: repeat the aggregation, but now use factor 7 in column 'b' instead of factor 9 in column 'a', since factor 9 got the lowest aggregated value in the first round.

daniel

1 Answer

It's unclear if you have your list of factors already or not. If you do not have it, and are taking it from the data set, you can grab it in a few different ways:

# Get a list of all the factors
myFactors <- levels(Data[[1]])  # If actual factors.
myFactors <-   sort(unique(unlist(Data)))  # Otherwise use similar to this line


Then to calculate the Totals per factor, you can do the following

Totals <- 
 colSums(sapply(myFactors, function(fctr) 
     # calculate totals per fctr
     as.integer(Data$peso) * rowSums(fctr == subset(Data, select= -peso)) 
   ))

names(Totals) <- myFactors

Which gives

Totals
#    1   2   3   4   5   6   7   8   9  10 
#  132 153 142 122 103 135 118 144 148 128 
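
The totals above count a factor wherever it appears in a row. If only the first column should count in the first round, as in the question's four-row example, a minimal sketch could look like this. The data frame `ex` and the name `firstRound` are made up for illustration; the rows are copied from the question.

```r
# The four example rows from the question (illustration only).
ex <- data.frame(
  a = c(8, 8, 9, 10),
  b = c(2, 3, 7, 3),
  c = c(3, 4, 4, 4),
  d = c(7, 5, 10, 5),
  e = c(9, 7, 1, 7),
  peso = c(1, 3, 2, 3)
)

# Sum peso by the factor found in column 'a' only.
firstRound <- tapply(ex$peso, ex$a, sum)
firstRound
# factor 8 -> 4, factor 9 -> 2, factor 10 -> 3

# Share of the total peso held by each factor.
firstRound / sum(ex$peso)
```

This reproduces the 4 / 2 / 3 split from the question, with factor 8 at 4/9 of the total.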



Next: I'm not sure whether you then want to compare against the sum of peso or the sum of the totals. Here are both options, broken down into steps:

# Calculate the total of all the Totals:
TotalSum <- sum(Totals)

# See percentage for each:
Totals / TotalSum
Totals / sum(as.integer(Data$peso))

# See which, if any, is greater than 50%
Totals / TotalSum > 0.50
Totals / sum(as.integer(Data$peso)) > 0.50

# Using Which to identify the ones you are looking for
which(Totals / TotalSum > 0.50)
which(Totals / sum(as.integer(Data$peso)) > 0.50)
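
If the goal is a runoff in which the lowest-scoring factor is dropped and its rows fall through to their next column, the steps can be sketched as a loop. This is only one reading of the question, not a definitive implementation: the helper `runoff()`, its arguments, and the tie-breaking rule (the first of several tied lowest factors is dropped) are assumptions, and the strict `>` comparison follows the checks above.

```r
# Hypothetical helper, not from the original post.
# prefs:     data frame of choices, columns in preference order
# w:         numeric weight per row (peso)
# threshold: share of the counted weight needed to win
runoff <- function(prefs, w, threshold = 0.5) {
  prefs <- as.matrix(prefs)
  eliminated <- numeric(0)
  repeat {
    # For each row, take the first choice that has not been eliminated.
    current <- apply(prefs, 1, function(r) {
      r <- r[!(r %in% eliminated)]
      if (length(r) > 0) r[[1]] else NA_real_
    })
    # Rows with no remaining choice (NA) are dropped by tapply().
    totals <- tapply(w, current, sum)
    shares <- totals / sum(totals)
    # Stop when one factor is left or a factor crosses the threshold.
    if (length(totals) <= 1 || isTRUE(max(shares) > threshold)) {
      return(list(totals = totals, shares = shares, eliminated = eliminated))
    }
    # Assumption: on a tie for last place, the first factor is dropped.
    loser <- names(totals)[which.min(totals)]
    eliminated <- c(eliminated, as.numeric(loser))
  }
}

# Usage on the four example rows from the question.
ex <- data.frame(
  a = c(8, 8, 9, 10), b = c(2, 3, 7, 3), c = c(3, 4, 4, 4),
  d = c(7, 5, 10, 5), e = c(9, 7, 1, 7), peso = c(1, 3, 2, 3)
)
results <- runoff(ex[, c("a", "b", "c", "d", "e")], ex$peso)
results$eliminated
results$totals
```

On these four rows, factors 9, 7 and 4 are dropped in turn, after which factor 10 holds 5 of the 9 units of peso. For the full data set the call would be `runoff(subset(Data, select = -peso), Data$peso)`.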



Note on your sampling for Peso

You took a sample of 0:3.5. However, the x:y operator steps by 1 from the starting value, so 0:3.5 gives only 0, 1, 2, 3 and never reaches 3.5. If you want fractions, you can either use seq() or take a larger sequence and divide appropriately:

option1 <-  (0:7) / 2
option2 <-  seq(from=0, to=3.5, by=0.5)

If you want whole integers from 0:3 and also the value 3.5, then use c()

 option3 <- c(0:3, 3.5)
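
A quick way to check what each expression produces, and to redraw peso with half steps. The names `colonSeq`, `halfSteps` and `pesoHalf` are illustrative only, and the seed is reused from the question.

```r
# The colon operator steps by 1 from the start value.
colonSeq <- 0:3.5
colonSeq
# 0 1 2 3

# Half steps from 0 to 3.5.
halfSteps <- seq(from = 0, to = 3.5, by = 0.5)
halfSteps

# Redraw the weights from the half-step values.
set.seed(51)
pesoHalf <- sample(halfSteps, 100, replace = TRUE)
table(pesoHalf)
```

If fractional weights are used, the as.integer(Data$peso) calls in the totals code above would truncate them, so those calls would need to be dropped.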
Ricardo Saporta
  • can you elaborate, then, a bit more on what specifically you are looking for? What would you like the output to look like? – Ricardo Saporta Nov 16 '12 at 16:26
  • It is still far from what I'm looking for. I don't know whether it helps, but I'm trying to implement a simulated election as described here youtube.com/watch?v=3Y3jE3B8HsE so I'd like to count peso as votes, as described in the video. – daniel Nov 17 '12 at 06:18
  • How specifically is it far from what you are looking for? Perhaps if you could elaborate more specifically on what you need, we can better help. – Ricardo Saporta Nov 17 '12 at 19:41