If your data file is very large and you only want the rows that match a certain criterion, the sqldf
package can filter the rows as it reads the file, so the full data set never has to be loaded into R.
Here is an example use case. I will add a binary column to the built-in data set iris
and write the new table to disk.
library(sqldf)
set.seed(1234)
iris1 <- iris
# add a random 0/1 column to filter on later
iris1$V4 <- rbinom(nrow(iris1), 1, 0.5)
write.table(iris1, "iris3.dat", sep = ",", quote = FALSE, row.names = FALSE)
Now read the data in, keeping only the rows where V4 == 0.
# set up file connection
iris3 <- file("iris3.dat")
df1 <- sqldf('select * from iris3 where "V4" = 0')
close(iris3)
Compare with the result of subset().
df2 <- subset(iris1, V4 == 0)
row.names(df2) <- NULL
all.equal(df1, df2)
#[1] "Component “Species”: Modes: character, numeric"
#[2] "Component “Species”: Attributes: < target is NULL, current is list >"
#[3] "Component “Species”: target is character, current is factor"
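All three differences reported above concern the Species column: sqldf returns it as a character vector, while subset() keeps the factor it inherited from iris1. A minimal sketch of the same mismatch, and one way to reconcile it, using two small stand-in data frames:

a <- data.frame(Species = c("setosa", "virginica"),
                stringsAsFactors = FALSE)          # character, as sqldf returns
b <- data.frame(Species = factor(c("setosa", "virginica")))  # factor, as subset() keeps
all.equal(a, b)                # reports the same Modes/Attributes differences
b$Species <- as.character(b$Species)
isTRUE(all.equal(a, b))        # TRUE once both columns are character

In the example above, converting with df2$Species <- as.character(df2$Species) should likewise make all.equal(df1, df2) return TRUE.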
Final clean up.
unlink("iris3.dat")
rm(iris1, df1, df2)