I have a dataframe like this:
GENE a1 a2 a3 b1 b2 b3
G1 862 817 923 1096 997 946
G2 443 410 421 448 417 388
G3 396 348 372 428 351 361
G4 350 319 310 394 295 311
G5 350 332 341 412 303 316
G6 377 369 397 462 330 351
G7 362 348 399 437 378 376
G8 332 312 398 387 372 332
G9 511 473 564 496 533 441
G10 42 54 48 24 19 17
G11 346 308 343 279 349 259
G12 273 255 265 199 270 206
G13 26 19 18 14 19 19
G14 17 9 10 8 9 11
G15 12 8 6 9 5 21
The first row is the header. I want to filter this data frame so that I keep only the rows in which at least 3 columns have a count > 30.
I tried this:
data <- read.table("test.txt",header=TRUE,sep="\t",row.names=1)
data <- data[rowSums(data) > 30,]
But this sums columns 1 through 6 for each row and checks whether that sum is > 30. What I want instead is to test each column separately for a count > 30, and keep a row only if 3 or more of its columns pass that test. So the output data frame should be:
GENE a1 a2 a3 b1 b2 b3
G1 862 817 923 1096 997 946
G2 443 410 421 448 417 388
G3 396 348 372 428 351 361
G4 350 319 310 394 295 311
G5 350 332 341 412 303 316
G6 377 369 397 462 330 351
G7 362 348 399 437 378 376
G8 332 312 398 387 372 332
G9 511 473 564 496 533 441
G10 42 54 48 24 19 17
G11 346 308 343 279 349 259
G12 273 255 265 199 270 206
How can I do this?
Thanks
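To make the condition concrete, here is a minimal sketch of the per-column test described above. It assumes the counts are all numeric and rebuilds the example inline with `read.table(text = ...)` instead of reading test.txt; the names `keep` and `filtered` are just illustrative. This is one possible approach, not necessarily the most idiomatic one.

```r
# Rebuild the example data inline (whitespace-separated here, not tab-separated)
data <- read.table(header = TRUE, row.names = 1, text = "
GENE a1 a2 a3 b1 b2 b3
G1 862 817 923 1096 997 946
G2 443 410 421 448 417 388
G3 396 348 372 428 351 361
G4 350 319 310 394 295 311
G5 350 332 341 412 303 316
G6 377 369 397 462 330 351
G7 362 348 399 437 378 376
G8 332 312 398 387 372 332
G9 511 473 564 496 533 441
G10 42 54 48 24 19 17
G11 346 308 343 279 349 259
G12 273 255 265 199 270 206
G13 26 19 18 14 19 19
G14 17 9 10 8 9 11
G15 12 8 6 9 5 21
")

# data > 30 is a logical matrix (one TRUE/FALSE per cell);
# rowSums() on it counts how many columns in each row exceed 30
keep <- rowSums(data > 30) >= 3

# Keep only the rows where at least 3 columns passed the test
filtered <- data[keep, ]
print(filtered)
```

The difference from `rowSums(data) > 30` is that the comparison happens first, so `rowSums` counts passing columns rather than adding up the raw counts.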