Sorting and finding values in other data frames

Question

I have a dataframe named commodities_3. It contains 28 columns with different commodities and 403 rows representing end-of-month data. What I need is to find the position for each row separately:

max value,
min value,
all other positives
all other negatives

Those index should then be used to locate the corresponding data in another dataframe with the same column and row characteristics called commodities_3_returns. These data should then be copied into 4 new dataframes (one dataframe for each sorting).

I know how to find the positions of the values for each row using which and which.min and which.max. But I don't know how to put this in a loop in order to do it for all 403 rows. And subsequently how to use this data to locate the corresponding data in the other dataframe commodities_3_returns.

Unfortunaltey I have to use a dataframe because I have dates as rownames in there, which I have to keep as I need them later for indexing, as well as NA's. It looks about like this:

commodities_3 <- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3) <- mydates
commodities_3[3,2] <- NA


commodities_3_returns <- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3_returns) <- mydates
commodities_3_returns[3,3] <- NA

As I said, I have in total 403 rows and 27 columns. In every row, there are some NA's which I have to keep as well. max.col doesn't seem to be able to handle NA's.

My desired output for the above mentioned example would be sth like this:

max_values <- as.data.frame(matrix(data=c(1:5,3,2,1,3,1), nrow=5, ncol=2, byrow=F))

You could start by giving us a reproducible example. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example It would also help us if you showed us how the output should look like. — Roman Luštrik, Sep 08 '11 at 09:10
I've merged your unregistered account into your registered account. You now have full control over this question. — Tim Post, Dec 07 '11 at 05:30

Richie Cotton · Answer 1 · 2011-09-08T09:29:45.387

If all the columns in commodities_3 are numeric, then you want a matrix, not a data frame. Then use the apply function. Some sample data, for reprodcubililty.

commodities_3 <- matrix(rnorm(12), nrow = 4)
commodities_3_returns <- matrix(1:12, nrow = 4)

The stats.

mins <- apply(commodities_3, 1, which.min)
maxs <- apply(commodities_3, 1, which.min)
pos <- apply(commodities_3, 1, function(x) which(x > 0))  #which is optional
neg <- apply(commodities_3, 1, function(x) which(x < 0))

Now use these in the index for commodities_3_returns. In the absence of coffee, my brain has only a clunky solution with a for loop

n_months <- nrow(commodities_3_returns)
min_returns <- numeric(n_months)
for(i in seq_len(n_months))
{
  min_returns[i] <- commodities_3_returns[i, mins[i]]
}

score 1 · Answer 2 · answered Sep 08 '11 at 15:15

Here is an alternate approach to get the min and max using max.col which is a C function internally. If you have a large data set, max.col works extremely fast compared to apply based solutions

mins = max.col(-commodities_3)
maxs = max.col(commodities_3)
N    = NROW(commodities_3)

commodities_3_returns[cbind(1:N, mins)] # returns min
commodities_3_returns[cbind(1:N, maxs)] # returns max

Sorting and finding values in other data frames

2 Answers2

Linked