I have a function theresults which takes a 71x2446 data frame and returns a 2x2446 double matrix. The first number in each of the 2446 pairs is an integer from 1 to 6 giving the category the line fits into, and the second number is the profit or loss in that category. I want to calculate the sum of profits within each category while also counting the frequency of each category. My question is whether the way I've written it is an efficient use of vectors:

  vec<-as.data.frame(t(apply(theData,1,theresults)))
  vec[2][vec[1]==1]->successCrossed
  vec[2][vec[1]==2]->failCrossed
  vec[2][vec[1]==3]->successFilled
  vec[2][vec[1]==4]->failFilled
  vec[2][vec[1]==5]->naCount
  vec[2][vec[1]==6]->otherCount

Then there are a bunch of calls to length() and mean() to summarize the results.
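
The six subsetting lines and the later length() and mean() calls can also be written as grouped summaries. The following is only a sketch under assumptions: the data frame `vec` here is made-up stand-in data, and the column names `category` and `pnl` are hypothetical, not taken from the original code.

```r
# Hypothetical stand-in for the transposed result: column 1 holds the
# category code (1-6), column 2 the profit or loss for that row.
set.seed(1)
vec <- data.frame(category = sample(1:6, 20, replace = TRUE),
                  pnl      = round(rnorm(20), 2))

# Fix the levels so that a category with no rows still shows up.
category <- factor(vec$category, levels = 1:6)

counts <- table(category)                  # frequency of each category
totals <- tapply(vec$pnl, category, sum)   # summed P&L per category (NA if empty)
means  <- tapply(vec$pnl, category, mean)  # mean P&L per category (NA if empty)

# Collect everything in one small summary table.
summary_table <- data.frame(category = 1:6,
                            n        = as.vector(counts),
                            total    = as.vector(totals),
                            mean     = as.vector(means))
print(summary_table)
```

Whether this is noticeably faster than six logical subsets on about 2,400 rows is not something the sketch demonstrates; its main effect is to keep the counts, sums, and means together in one object.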

theresults references the original data frame in this sort of way:

    theresults <- function(theVector)
    {
        if (theVector[['Aggressor']] == "Y")
        {
            if (theVector[['Side']] == "Sell")
            {choice = 6}
            else
            {choice = 3}
            if (!is.na(theVector[['TradePrice']]) &&
                !is.na(theVector[['L1_BidPri_1']]) &&
                !is.na(theVector[['L1_AskPri_1']]) &&
                !is.na(theVector[['L2_BidPri_1']]) &&
                !is.na(theVector[['L2_AskPri_1']]))
            {
                Profit <- switch(choice,
                    -as.numeric(theVector[['TradePrice']]) + 10000*as.numeric(theVector[['L1_AskPri_1']])/as.numeric(theVector[['L2_BidPri_1']]),
                    -as.numeric(theVector[['TradePrice']]) + 10000*as.numeric(theVector[['L1_BidPri_1']])/as.numeric(theVector[['L2_BidPri_1']]),
hedgedandlevered
  • It seems very difficult to answer without sample data and an example of what you want to get in the end... – juba Jan 25 '13 at 21:24
  • I'm just wondering whether the way I approached this problem is correct; I'm not asking for a walkthrough. My code runs fine and gives me the result I want, but it also ran fine when I used a bunch of for loops, and that was much slower. – hedgedandlevered Jan 25 '13 at 21:32
  • For example, is there a more efficient way to do this: `vec[2][vec[1]==1]->successCrossed; vec[2][vec[1]==2]->failCrossed; vec[2][vec[1]==3]->successFilled; vec[2][vec[1]==4]->failFilled; vec[2][vec[1]==5]->naCount; vec[2][vec[1]==6]->otherCount` by counting as I walk through the data frame instead of going through it 6 times? – hedgedandlevered Jan 25 '13 at 21:33
  • Quite impossible for me to answer without having an idea of what `vec` is, sorry. – juba Jan 25 '13 at 21:37
  • Please read the description: "and returns a 2x2446 double matrix. the first number in each of the 2446 pairs represents an integer 1-6, which is basically what category the line fits into, and the second number is the Profit or Loss on that category." Cliff notes: it's a double matrix that I immediately convert into a data frame. – hedgedandlevered Jan 25 '13 at 21:40
  • Take a look at [this post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and see if you can provide some or all of the information suggested there. With regard to your comment, create a data.frame that does that mapping (i.e. 3 maps to `successFilled`) and use that to look up: `vec[2] <- df$category[match(vec[1], df$value)]` or something like that. – Justin Jan 25 '13 at 21:56
  • I don't think I would call this "vectorization". You are using an `apply` loop to calculate a by-row set of results. You could probably make it more efficient with a few `ifelse` tests and assignments. Furthermore, you will probably avoid the implicit coercion of numeric-class columns to character class and can then drop the `as.numeric` calls. – IRTFM Jan 25 '13 at 22:29
  • wait... apply isn't a vectorized function?? – hedgedandlevered Jan 28 '13 at 13:42
  • Now someone voted this post down. This is my first post ever, and I'd appreciate some more slack... I'm a newbie here. If apply isn't vectorized, that's exactly the answer I was looking for. I'm not asking for a fix to my code. You don't need to run it, so that reproducible example doesn't apply. I'm just asking for feedback on whether I'm properly vectorizing **because I don't know what functions even vectorize** – hedgedandlevered Jan 28 '13 at 15:22
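
IRTFM's comment about `apply` and `ifelse` can be illustrated with a small sketch. It is not a rewrite of theresults: the rows below are made up, only the column names come from the question, and only the two category codes visible in the truncated code (6 and 3) are reproduced.

```r
# Hypothetical rows; column names are from the question, values are invented.
theData <- data.frame(
  Aggressor   = c("Y", "Y", "N", "Y"),
  Side        = c("Sell", "Buy", "Sell", "Buy"),
  TradePrice  = c(100, 101, 99, NA),
  L1_AskPri_1 = c(1.01, 1.02, 1.00, 1.03),
  L2_BidPri_1 = c(1.00, 1.00, 1.00, 1.00),
  stringsAsFactors = FALSE
)

# apply() converts the data frame to a matrix first. With mixed column types
# that matrix is character, which is why the row function needs as.numeric().
row_types <- apply(theData, 1, class)

# Column-wise alternative: nested ifelse() classifies every row in one call.
# NA marks rows that the visible part of the original code does not classify.
choice <- ifelse(theData$Aggressor == "Y",
                 ifelse(theData$Side == "Sell", 6L, 3L),
                 NA_integer_)

# Arithmetic on whole columns stays numeric, so as.numeric() is unnecessary.
# The expression is the first switch() alternative from the question and is
# used here only to show column-wise arithmetic.
profit_example <- -theData$TradePrice +
  10000 * theData$L1_AskPri_1 / theData$L2_BidPri_1
```

Here `row_types` is `"character"` for every row, while `choice` and `profit_example` are computed from the typed columns directly, which is the difference the comment points at.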

2 Answers

You can try combining the 2x2446 matrix into a character vector representing the type and profit status of each pair, then calling `table` on it.

Here's an example:

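# 30 made-up rows: a category code (1-6) and a profit/loss label.
# cbind() of numbers and strings yields a character matrix.
data = cbind(sample(1:6, 30, replace=TRUE),
             sample(c("profit", "loss"), 30, replace=TRUE))

# Paste each row into a single key such as "3profit"
x = apply(data, MARGIN=1, paste, collapse="")

# Count how often each key occurs
table(x)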
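
As a possible variation on the example above, `table` also accepts two vectors directly, which avoids building pasted keys. This is only a sketch on made-up data; the names `type` and `status` are hypothetical.

```r
set.seed(42)
type   <- sample(1:6, 30, replace = TRUE)
status <- sample(c("profit", "loss"), 30, replace = TRUE)

# Pasted keys, as in the example above: one count per combined label.
pasted_counts <- table(paste0(type, status))

# Passing both vectors to table() gives the same counts as a two-way table,
# with categories as rows and profit/loss as columns.
cross_counts <- table(type, status)
print(cross_counts)
```

The two-way form keeps the category and the profit/loss label as separate dimensions, so row and column totals are available through `rowSums(cross_counts)` and `colSums(cross_counts)`.
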
kith

I'm pretty sure that for this type of operation, even if the data set had hundreds of thousands of rows, the right answer is to follow Uwe's maxim ("Computers are cheap, and thinking hurts"): this code is fast enough and will not be a bottleneck in the program. (In response to the other answer: cbind is slow and memory-intensive relative to my current solution.)

hedgedandlevered