Say I have a data frame originScen with 6 columns and 100,000 rows. I want to select rows of originScen based on the ids in another vector, reducedScenarioIds (10,000 entries). A row is selected if the value in column 1 of originScen matches one of the values in reducedScenarioIds. The first column can contain multiple matches for each value in reducedScenarioIds.

So I used the following:

reducedSet <- originScen[which(originScen[,1] %in% reducedScenarioIds),]

I am OK with the results, except that which and %in% seem to destroy the order of the reducedScenarioIds vector. The rows of the final reducedSet come out in ascending order of the ids found in reducedScenarioIds, not in the order in which those ids appear in the vector.

Note that originScen[,1] can contain duplicate entries for each entry in reducedScenarioIds.
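
A minimal sketch of the behaviour described above, using a small made-up data frame. The ids and the `value` column are illustrative assumptions, not the real data, and originScen is assumed to be sorted by id, which is why the result looks like ascending id order:

```r
# Hypothetical toy data: column 1 holds scenario ids, with duplicates
originScen <- data.frame(id = c(1, 2, 2, 3, 4, 5),
                         value = c("a", "b", "c", "d", "e", "f"))
reducedScenarioIds <- c(5, 2, 3)

# %in% yields one TRUE/FALSE per row of originScen, so the selected
# rows keep the row order of originScen, not the order of the ids
reducedSet <- originScen[which(originScen[, 1] %in% reducedScenarioIds), ]
reducedSet$id   # 2 2 3 5, whereas the wanted order is 5 2 2 3
```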

Does anyone have an alternative solution?

Thanks

Ferdinand.kraft
user2547134

1 Answer

Try this:

reducedSet <- originScen[originScen[,1] %in% reducedScenarioIds,][order(na.exclude(match(originScen[,1], reducedScenarioIds))),]
Ferdinand.kraft
  • Thanks for the answer! Works perfectly. I don't understand how, though. So first the reduced set of scenarios is produced with originScen[originScen[,1] %in% reducedScenarioIds,], then the second part orders it the way the ids appear in reducedScenarioIds. How does na.exclude figure in here? – user2547134 Jul 04 '13 at 00:24
  • I've added `na.exclude` to remove the cases where `originScen[,1]` contains values not present in `reducedScenarioIds`, as the result of `match` in these cases is `NA`. – Ferdinand.kraft Jul 04 '13 at 01:44
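
To make the role of each piece easier to see, here is a hedged step-by-step sketch of the one-liner from the answer on a small made-up data frame. The data and the intermediate names `keep`, `subsetRows`, `pos` and `posKept` are assumptions introduced only for illustration:

```r
# Hypothetical toy data; 1 and 9 are ids that are not in reducedScenarioIds
originScen <- data.frame(id = c(1, 2, 2, 3, 9, 5),
                         value = c("a", "b", "c", "d", "e", "f"))
reducedScenarioIds <- c(5, 2, 3)

# Step 1: keep only rows whose id is wanted (still in originScen's row order)
keep <- originScen[, 1] %in% reducedScenarioIds
subsetRows <- originScen[keep, ]

# Step 2: position of each id within reducedScenarioIds, NA when absent
pos <- match(originScen[, 1], reducedScenarioIds)   # NA 2 2 3 NA 1

# Step 3: dropping the NAs leaves exactly one position per row of subsetRows
posKept <- na.exclude(pos)                          # 2 2 3 1

# Step 4: order() turns those positions into a row permutation
reducedSet <- subsetRows[order(posKept), ]
reducedSet$id                                       # 5 2 2 3
```

Without step 3, `pos` would be longer than `subsetRows` has rows, so the permutation from `order()` would not line up with the subset. Rows that share an id keep their original relative order, because `order()` leaves ties in their original ordering.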