3

I have a dataframe named commodities_3. It contains 28 columns with different commodities and 403 rows representing end-of-month data. What I need is to find the position for each row separately:

  • max value,
  • min value,
  • all other positives
  • all other negatives

Those index should then be used to locate the corresponding data in another dataframe with the same column and row characteristics called commodities_3_returns. These data should then be copied into 4 new dataframes (one dataframe for each sorting).

I know how to find the positions of the values for each row using which and which.min and which.max. But I don't know how to put this in a loop in order to do it for all 403 rows. And subsequently how to use this data to locate the corresponding data in the other dataframe commodities_3_returns.


Unfortunaltey I have to use a dataframe because I have dates as rownames in there, which I have to keep as I need them later for indexing, as well as NA's. It looks about like this:

commodities_3 <- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3) <- mydates
commodities_3[3,2] <- NA


commodities_3_returns <- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3_returns) <- mydates
commodities_3_returns[3,3] <- NA

As I said, I have in total 403 rows and 27 columns. In every row, there are some NA's which I have to keep as well. max.col doesn't seem to be able to handle NA's.

My desired output for the above mentioned example would be sth like this:

max_values <- as.data.frame(matrix(data=c(1:5,3,2,1,3,1), nrow=5, ncol=2, byrow=F))
Tim Post
  • 33,371
  • 15
  • 110
  • 174
Sketch
  • 61
  • 3
  • You could start by giving us a reproducible example. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example It would also help us if you showed us how the output should look like. – Roman Luštrik Sep 08 '11 at 09:10
  • I've merged your unregistered account into your registered account. You now have full control over this question. – Tim Post Dec 07 '11 at 05:30

2 Answers2

2

If all the columns in commodities_3 are numeric, then you want a matrix, not a data frame. Then use the apply function. Some sample data, for reprodcubililty.

commodities_3 <- matrix(rnorm(12), nrow = 4)
commodities_3_returns <- matrix(1:12, nrow = 4)

The stats.

mins <- apply(commodities_3, 1, which.min)
maxs <- apply(commodities_3, 1, which.min)
pos <- apply(commodities_3, 1, function(x) which(x > 0))  #which is optional
neg <- apply(commodities_3, 1, function(x) which(x < 0))

Now use these in the index for commodities_3_returns. In the absence of coffee, my brain has only a clunky solution with a for loop

n_months <- nrow(commodities_3_returns)
min_returns <- numeric(n_months)
for(i in seq_len(n_months))
{
  min_returns[i] <- commodities_3_returns[i, mins[i]]
}
Richie Cotton
  • 118,240
  • 47
  • 247
  • 360
1

Here is an alternate approach to get the min and max using max.col which is a C function internally. If you have a large data set, max.col works extremely fast compared to apply based solutions

mins = max.col(-commodities_3)
maxs = max.col(commodities_3)
N    = NROW(commodities_3)

commodities_3_returns[cbind(1:N, mins)] # returns min
commodities_3_returns[cbind(1:N, maxs)] # returns max
Ramnath
  • 54,439
  • 16
  • 125
  • 152