Question about dataframe manipulations. This is not homework, although I will phrase it as a homework problem. I know there is a three-line way of doing this, but I can't seem to find it.

Problem
I have a dataframe A with 10 columns and a thousand rows, filled with some data. I also have a one-column matrix B with a thousand rows, filled with 1s and 0s (TRUE or FALSE).

Goal: create a new dataframe C that contains only the rows of A that have a TRUE value at the corresponding index of matrix B.

(More clarification in case it's not clear.) I performed operations on A, stored the result (TRUE or FALSE) in a one-column dataframe (of the same length, of course), and now wish to extract the matching rows into their own dataframe.
The size of the new dataframe is sum(B) x 10 (just a logical deduction).

Thank you for reading! Your help is appreciated.

Example

Dataframe A

Name   State   metric1 metric2

Joe     MA      23       25
Moe     AZ      123      2971
Bo      CA      938      387
Yo      UT       1        3

matrix B

1
0
1
0

Desired Output (dataframe C)

Name   State   metric1 metric2

Joe     MA      23       25
Bo      CA      938      387
OctaveParango
  • It is better to include a reproducible example (`dput`) and the expected output to make it easier for others to help you. – Sotos May 13 '16 at 13:43
  • See the following link on creating a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo May 13 '16 at 13:56
  • Thank you for your input guys. I read Sotos's advice and produced this. Is it enough? I think it's very clear, but obviously I would think that. Please let me know if the example provided does not make things clear. – OctaveParango May 13 '16 at 14:00

2 Answers

1

Why not simply

A <- data.frame(cbind(A=1:10,B=21:30,C=41:50))
B <- data.frame(D=rep(c(TRUE,FALSE),5))
C <- A[B$D,]

giving

> A
    A  B  C
1   1 21 41
2   2 22 42
3   3 23 43
4   4 24 44
5   5 25 45
6   6 26 46
7   7 27 47
8   8 28 48
9   9 29 49
10 10 30 50
> B
       D
1   TRUE
2  FALSE
3   TRUE
4  FALSE
5   TRUE
6  FALSE
7   TRUE
8  FALSE
9   TRUE
10 FALSE
> C
  A  B  C
1 1 21 41
3 3 23 43
5 5 25 45
7 7 27 47
9 9 29 49

EDIT after OP's edit

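# Build the data frame directly: wrapping the columns in cbind() would coerce the metrics to character
A <- data.frame(name = c("Joe", "Moe", "Bo", "Yo"),
                State = c("MA", "AZ", "CA", "UT"),
                metric1 = c(23, 123, 938, 1),
                metric2 = c(25, 2971, 387, 3))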
B <- c(1,0,1,0)
C <- A[B==1,]
C

giving

> C
  name State metric1 metric2
1  Joe    MA      23      25
3   Bo    CA     938     387
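
The question describes B as a one-column matrix (or data frame) of 1s and 0s rather than a plain vector. A minimal sketch of that case, assuming B contains no missing values; `keep` is a name introduced here purely for illustration:

```r
# Example data from the question
A <- data.frame(name    = c("Joe", "Moe", "Bo", "Yo"),
                State   = c("MA", "AZ", "CA", "UT"),
                metric1 = c(23, 123, 938, 1),
                metric2 = c(25, 2971, 387, 3))

# B as a one-column matrix of 1/0 flags, as described in the question
B <- matrix(c(1, 0, 1, 0), ncol = 1)

# Take the first column and convert 1/0 to TRUE/FALSE
keep <- as.logical(B[, 1])

# Keep only the flagged rows; leaving the column index empty means "all columns"
C <- A[keep, ]

# Sanity check from the question: the row count equals the number of TRUE flags
nrow(C) == sum(B)
```

The same `B[, 1]` extraction should also work if B is a one-column data frame, since it returns the column as a plain vector in both cases.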
rbm
  • And if you don't like the row IDs you can always reset them back to being sequential via `rownames(C) <- 1:nrow(C)` – rbm May 13 '16 at 14:08
  • Yes, this is exactly the stuff I was looking for. Thank you for your help and time!! – OctaveParango May 13 '16 at 14:09
1

Alternative using data.table:

library(data.table)

# convert your dataset into a data.table
  setDT(A)

# create a dummy variable (TRUE or FALSE) using operations on other columns
# (row-wise sum of the two metrics; sum() would collapse each column to a single number)
  A[ , logical := (metric1 + metric2) > 50 ]

# subset the data
  A[ logical == TRUE ]
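
For reference, a self-contained sketch of the data.table approach on the question's example data. It assumes the data.table package is installed; the column name `keep` and the threshold of 50 are illustrative choices, not part of the question:

```r
library(data.table)

# Example data from the question, built directly as a data.table
A <- data.table(name    = c("Joe", "Moe", "Bo", "Yo"),
                State   = c("MA", "AZ", "CA", "UT"),
                metric1 = c(23, 123, 938, 1),
                metric2 = c(25, 2971, 387, 3))

# Add a logical flag column by reference, computed row by row
A[, keep := (metric1 + metric2) > 50]

# Subset on the flag; this returns a new data.table
C <- A[keep == TRUE]

# Drop the helper column from the result
C[, keep := NULL]

C
```

With this threshold the flag differs from the B vector in the question (Moe and Bo pass, Joe and Yo do not), so the sketch illustrates the mechanics rather than reproducing the desired output.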
rafa.pereira