I wanted to order a multi-column dataframe by one column and subset it in the same step, but the command I used did not work:
print(df[order(df$x) & df$x < 5,])
This does not order the results.
To debug this, I generated a test dataframe with 1 column, but this 'simplification' had unexpected effects:
df <- data.frame(x = sample(1:50))
print(df[order(df$x) & df$x < 5,])
This does not order the results either, so I felt I had reproduced the problem with simpler data.
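A minimal sketch of what the combined expression appears to evaluate to, assuming the same 1-column df (the seed and the names idx and mask are only for illustration). Since `<` binds more tightly than `&`, the index is parsed as `order(df$x) & (df$x < 5)`; the positive integers from order() are coerced to TRUE, so what is left is a plain logical mask that selects rows in their original order:

```r
set.seed(42)                      # assumed seed, only for reproducibility
df <- data.frame(x = sample(1:50))

idx  <- order(df$x)               # integer permutation of 1..50
mask <- idx & df$x < 5            # parsed as idx & (df$x < 5)

# Every element of idx is non-zero, hence TRUE, so the ordering information
# is lost and only the comparison survives.
print(is.logical(mask))
print(identical(mask, df$x < 5))
```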
Breaking the process down into ordering first and then subsetting led me to discover that, in this case, the ordering step does not return a dataframe object:
df <- data.frame(x = sample(1:50))
ndf <- df[order(df$x),]
print(class(ndf))
produces
[1] "integer"
Attempting to subset the resulting "integer" ndf object using dataframe syntax, e.g.
print(ndf[ndf$x < 5, ])
generates an error, as expected:
Error in ndf$x : $ operator is invalid for atomic vectors.
Simplifying even further, I found that subsetting alone (without applying the order function) does not return a dataframe object either:
ndf <- df[df$x < 5,]
class(ndf)
[1] "integer"
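For comparison, a small sketch of the same two operations with the `drop = FALSE` argument of `[` for data frames, which keeps the result as a dataframe even when only one column is left. The seed and the names sub and sorted are assumptions for illustration:

```r
set.seed(42)                                  # assumed seed, only for reproducibility
df <- data.frame(x = sample(1:50))

# With drop = FALSE the single column is not simplified to a bare vector
sub    <- df[df$x < 5, , drop = FALSE]
sorted <- df[order(df$x), , drop = FALSE]

print(class(sub))
print(class(sorted))

# Dataframe syntax now works on the ordered result
print(sorted[sorted$x < 5, , drop = FALSE])
```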
It turns out that, for the multi-column dataframe, separating the ordering and the subsetting does work as expected:
df <- data.frame(x = sample(1:50), y = rnorm(50))
ndf <- df[order(df$x),]
print(ndf[ndf$x < 5, ])
and this solved my original problem, but led to two further questions:
- Why is the object returned in the 1-column test case above not a dataframe? (I appreciate a 1-column dataframe just contains a single vector, but it's still wrapped in a dataframe.)
- Is it possible to order and subset a multi-column dataframe in 1 step?
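On the second question, one possible sketch (not necessarily the idiomatic answer) is to nest the two operations in a single expression, or to filter the ordering index itself. The seed and the names res1, res2 and o are assumptions for illustration:

```r
set.seed(42)                                   # assumed seed, only for reproducibility
df <- data.frame(x = sample(1:50), y = rnorm(50))

# Option 1: order first, then let subset() filter the ordered rows
res1 <- subset(df[order(df$x), ], x < 5)

# Option 2: keep only those positions of the ordering index whose x is < 5
o    <- order(df$x)
res2 <- df[o[df$x[o] < 5], ]

print(res1)
print(identical(res1, res2))
```

Both variants still order and subset internally; they only avoid naming the intermediate dataframe.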