I have a data frame (DF) that looks like this:

DF <- rbind(c(10,20,30,40,50), c(21,68,45,33,21), c(11,98,32,10,30), c(50,70,70,70,50))

10 20 30 40 50
21 68 45 33 21
11 98 32 10 30
50 70 70 70 50

I want to keep only the columns whose value in the last row equals some x. In my scenario x is 50, so the resulting data frame (resultDF) should look like this:

10 50
21 21
11 30
50 50

How can I do this in R? I tried subset() as below, but it doesn't work as I expected:

resultDF <- subset(DF, DF[nrow(DF),] == 50)

Error in x[subset & !is.na(subset), vars, drop = drop] : 
  (subscript) logical subscript too long

smci
mika

2 Answers

I have solved it. My subsetting call was wrong. I used the following code to get the result I needed:

resultDF <- DF[, DF[nrow(DF),] == 50]
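
For reuse, the same indexing can be wrapped in a small helper. This is only a minimal sketch: keep_cols_by_last_row is a hypothetical name, not an existing R function, and the as.matrix()/as.vector() steps and drop = FALSE are my own additions, assumed so that it also accepts a true data frame and always returns a two-dimensional result.

```r
# Hypothetical helper (not part of base R): keep the columns whose
# last-row value equals x. Accepts a matrix or a data frame.
keep_cols_by_last_row <- function(m, x) {
  last_row <- as.matrix(m[nrow(m), , drop = FALSE])  # 1 x ncol(m)
  keep <- as.vector(last_row == x)                   # one logical per column
  m[, keep, drop = FALSE]                            # drop = FALSE keeps the 2-D shape
}

DF <- rbind(c(10,20,30,40,50), c(21,68,45,33,21),
            c(11,98,32,10,30), c(50,70,70,70,50))

resultDF    <- keep_cols_by_last_row(DF, 50)                 # matrix in, matrix out
resultDF_df <- keep_cols_by_last_row(as.data.frame(DF), 50)  # data frame in, data frame out

print(resultDF)
print(resultDF_df)
```

With the matrix from the question, both calls keep the first and last columns.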

mika
  • `subset()` works too; your issue was only the syntax for passing it a logical column vector (that is its third arg, not its second). See my answer. – smci Mar 25 '18 at 00:24

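Your issue with subset() was only about the syntax for calling it with a logical column vector: that vector belongs in its third arg, not its second. You can use either subset() or plain logical indexing. The latter is recommended (?subset itself warns that it is a convenience function intended for interactive use).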

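Note that DF, as built with rbind(), is a matrix rather than a data frame, so subset() dispatches to its matrix method; the arguments mean the same thing either way. The help page ?subset tells you that its optional second arg ('subset') selects rows, and its optional third arg ('select') selects columns: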

subset: logical expression indicating elements or rows to keep:
          missing values are taken as false.

select: expression, indicating columns to select from a data frame.

So you want to pass this logical vector, which has one element per column, as 'select':

> DF[nrow(DF),] == 50
[1]  TRUE FALSE FALSE FALSE  TRUE
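
To see why the original call failed, it may help to compare the length of that vector with the matrix dimensions. The sketch below assumes the DF from the question; the variable names keep and msg are mine, and tryCatch() is used only so the script keeps running after the error.

```r
DF <- rbind(c(10,20,30,40,50), c(21,68,45,33,21),
            c(11,98,32,10,30), c(50,70,70,70,50))

keep <- DF[nrow(DF), ] == 50

length(keep)  # one element per column
ncol(DF)      # same as length(keep)
nrow(DF)      # fewer rows than columns here

# Passing the column mask as the second arg ('subset') makes R use it
# as a row index, which is longer than the number of rows.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  { subset(DF, keep); NA_character_ },
  error = function(e) conditionMessage(e)
)
print(msg)
```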

There are two syntactical ways to leave subset()'s second arg default and pass the third arg:

# Explicitly pass the third arg by name...
> subset(DF, select=(DF[nrow(DF),] == 50) )
# ...or leave the 2nd arg empty; when it is missing, all rows are kept
> subset(DF, , (DF[nrow(DF),] == 50) )
# Both calls return:
     [,1] [,2]
[1,]   10   50
[2,]   21   21
[3,]   11   30
[4,]   50   50
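
As a quick sanity check, the two subset() spellings can be compared with each other and with plain logical indexing. This is a sketch assuming the DF from the question; the names by_name, by_position and by_index are mine, and drop = FALSE is added to the plain-indexing form so that it always returns a matrix, as subset() does.

```r
DF <- rbind(c(10,20,30,40,50), c(21,68,45,33,21),
            c(11,98,32,10,30), c(50,70,70,70,50))

by_name     <- subset(DF, select = DF[nrow(DF), ] == 50)  # third arg by name
by_position <- subset(DF, , DF[nrow(DF), ] == 50)         # second arg left empty
by_index    <- DF[, DF[nrow(DF), ] == 50, drop = FALSE]   # plain logical indexing

print(identical(by_name, by_position))
print(isTRUE(all.equal(by_name, by_index)))
```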

The second way is probably preferable as it looks like generic row,col-indexing, and also doesn't require you to know the third arg's name.

(As a mnemonic, in R and SQL terminology, 'select' implicitly means column indices, and 'filter'/'subset' implicitly means row indices. In data.table terminology, rows are the i index and columns are the j index.)
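
A small illustration of the mnemonic on a real data frame, as a sketch only: converting DF with as.data.frame() and the row condition V1 > 10 are arbitrary choices of mine, not part of the question.

```r
DF <- rbind(c(10,20,30,40,50), c(21,68,45,33,21),
            c(11,98,32,10,30), c(50,70,70,70,50))
df <- as.data.frame(DF)   # columns are auto-named V1..V5

# 'subset' (2nd arg) picks rows, 'select' (3rd arg) picks columns.
via_subset <- subset(df, V1 > 10, select = c(V1, V5))

# The same selection with plain [row, column] indexing.
via_index <- df[df$V1 > 10, c("V1", "V5")]

print(via_subset)
print(isTRUE(all.equal(via_subset, via_index)))
```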

smci