Select negative values from a Data Frame, using R

Question

I have the following data.frame called best100_gene:

[image: best100_gene]

I want to select only the rows where the column Data_PCA$ind$coord[, 2] is negative (< 0). I tried the following command:

gene_neg = best100_gene[which("Data_PCA$ind$coord[, 2]" < 0, )]

But it doesn't work! I tried several other options but they did not work either.

is this (`'Data_PCA$ind$coord[,2]'`) a column name? The `,` should be after the `)` — akrun, Oct 15 '16 at 15:28
Like @akrun suggested `gene_neg = best100_gene[which(Data_PCA$ind$coord[, 2] < 0), ]` — Zach, Oct 15 '16 at 15:32
Otherwise, @Zeineb, please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by (among other things) posting *sample data* (not a picture of oddly-aligned text) in addition to the code you provided. "Sample data" is subjective but implies small (not your entire dataset), easily reproducible, and easily imported (for us). It is common to use `head` and `dput` for this purpose. — r2evans, Oct 15 '16 at 15:34
We made it less subjective for the R tag by requiring `dput()` as per the tag description that you can read by hovering your mouse over the tag. — Hack-R, Oct 15 '16 at 15:40
Thank you for your answers. @Zach, I also tried your command gene_neg = best100_gene[which(Data_PCA$ind$coord[, 2] < 0), ] but the output is wrong: it doesn't select the negative values of the column "Data_PCA$ind$coord[,2]" — Zeineb, Oct 15 '16 at 15:47
@Zeineb I didn't realize that was your actual column name, my mistake. Use `gene_neg = best100_gene[which(best100_gene[, 4] < 0), ]` @Hack-R has a thorough explanation below. — Zach, Oct 15 '16 at 15:55
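The two fixes raised in the comments (drop the quotes, move the comma outside `which()`) can be illustrated with a small sketch. The data frame `df` and its columns `id`, `value`, and `other` are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical toy data; not the data from the question.
df <- data.frame(
  id    = c("a", "b", "c"),
  value = c(1, -2, -3),
  other = c(10, 20, 30)
)

# Quoting the expression turns it into a character string, so the
# comparison yields a single logical value rather than one per row.
quoted <- "df$value" < 0
length(quoted)

# Without quotes, the comparison is evaluated element by element.
unquoted <- df$value < 0
length(unquoted)

# which() converts the logical vector into row positions.
keep <- which(unquoted)

# Comma after the index: select rows.
df[keep, ]

# No comma: the same index selects columns instead.
df[keep]
```

With the comma inside `which()` and no comma in the outer brackets, the index is applied to columns rather than rows, which is the second problem in the original command.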

Hack-R · Accepted Answer · 2016-10-15T15:50:42.967

Here's my example data based on your screenshot:

best100_gene <- data.frame(
  SYMBOL = c("A", "b", "c", "d", "e"),
  Data_PCA_contrib = c(.26, .25, .36, .11, .35),
  "Data_PCA$ind$coord[, 2]" = c(12, 15, -11, -11, -11)
)

  SYMBOL Data_PCA_contrib Data_PCA.ind.coord...2.
1      A             0.26                      12
2      b             0.25                      15
3      c             0.36                     -11
4      d             0.11                     -11
5      e             0.35                     -11
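The third column name in the printout differs from the one typed in the call because `data.frame()` checks names by default (`check.names = TRUE`) and adjusts invalid ones via `make.names()`. A minimal sketch of that conversion follows; the variable names `raw_name` and `df_default` are made up for illustration.

```r
# The column name from the question, as a plain string.
raw_name <- "Data_PCA$ind$coord[, 2]"

# make.names() replaces each character that is not allowed in a
# syntactic R name ($, [, comma, space, ]) with a dot.
clean_name <- make.names(raw_name)
clean_name

# data.frame() applies the same adjustment when check.names = TRUE,
# which is the default.
df_default <- data.frame(
  x = 1:2,
  "Data_PCA$ind$coord[, 2]" = c(12, -11)
)
names(df_default)
```

This is why `best100_gene$Data_PCA.ind.coord...2.` is the name to use with this example data, rather than the name as originally written.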

Here's one way, selecting the column by position, which I highly recommend given the crazy column names:

best100_gene[best100_gene[3] < 0, ]

  SYMBOL Data_PCA_contrib Data_PCA.ind.coord...2.
3      c             0.36                     -11
4      d             0.11                     -11
5      e             0.35                     -11

Here's another way, using the column name that data.frame() generated:

best100_gene[best100_gene$Data_PCA.ind.coord...2. < 0, ]

  SYMBOL Data_PCA_contrib Data_PCA.ind.coord...2.
3      c             0.36                     -11
4      d             0.11                     -11
5      e             0.35                     -11

Here's another way, renaming the columns to something sensible first:

good_names             <- c("symbol", "pca_contrib", "pca_coord")
colnames(best100_gene) <- good_names
best100_gene[best100_gene$pca_coord<0, ]

  symbol pca_contrib pca_coord
3      c        0.36       -11
4      d        0.11       -11
5      e        0.35       -11

score 1 · Answer 2 · answered Oct 16 '16 at 23:59

It's rather hard to even create data like yours. We need check.names=FALSE if we want to create data frames with names containing $ and [, and back-ticks ` to protect the weird names when referring to them ...

best100_gene <- data.frame(
  SYMBOL = c("A", "b", "c", "d", "e"),
  Data_PCA_contrib = c(.26, .25, .36, .11, .35),
  `Data_PCA$ind$coord[, 2]` = c(12, 15, -11, -11, -11),
  check.names = FALSE
)

This is the closest to what you wanted ...

 best100_gene[best100_gene[,"Data_PCA$ind$coord[, 2]"]<0,]

You can also use

 subset(best100_gene,`Data_PCA$ind$coord[, 2]`<0)

or

 with(best100_gene,best100_gene[`Data_PCA$ind$coord[, 2]`<0,])

or

 dplyr::filter(best100_gene,`Data_PCA$ind$coord[, 2]`<0)
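As a quick check, the three base R variants above can be run side by side on the example data to confirm they pick the same rows. This is only a sketch; the names `by_bracket`, `by_subset`, and `by_with` are made up here, and the dplyr variant is left out so the snippet has no package dependency.

```r
# Example data with the original (non-syntactic) column name preserved.
best100_gene <- data.frame(
  SYMBOL = c("A", "b", "c", "d", "e"),
  Data_PCA_contrib = c(.26, .25, .36, .11, .35),
  `Data_PCA$ind$coord[, 2]` = c(12, 15, -11, -11, -11),
  check.names = FALSE
)

# Bracket indexing with the name given as a string.
by_bracket <- best100_gene[best100_gene[, "Data_PCA$ind$coord[, 2]"] < 0, ]

# subset() and with() evaluate the back-ticked name inside the data frame.
by_subset <- subset(best100_gene, `Data_PCA$ind$coord[, 2]` < 0)
by_with   <- with(best100_gene, best100_gene[`Data_PCA$ind$coord[, 2]` < 0, ])

# All three should keep the same rows.
rownames(by_bracket)
isTRUE(all.equal(by_bracket, by_subset))
isTRUE(all.equal(by_bracket, by_with))
```

One difference to keep in mind: if the column contains `NA`, plain logical indexing returns rows filled with `NA` for those positions, whereas `subset()` and `which()` drop them.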

It would be better to rename your column names to something easier to handle, e.g.

 bb <- dplyr::rename(best100_gene,dpc2=`Data_PCA$ind$coord[, 2]`)
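If dplyr is not installed, a base R equivalent of that rename could look like the sketch below. It matches the column by its exact name, so it does not depend on the column's position; `dpc2` is the same new name used above, and `neg_rows` is a made-up variable.

```r
# Example data with the original (non-syntactic) column name preserved.
best100_gene <- data.frame(
  SYMBOL = c("A", "b", "c", "d", "e"),
  Data_PCA_contrib = c(.26, .25, .36, .11, .35),
  `Data_PCA$ind$coord[, 2]` = c(12, 15, -11, -11, -11),
  check.names = FALSE
)

# Copy the data frame, then rename the one awkward column by exact match.
bb <- best100_gene
names(bb)[names(bb) == "Data_PCA$ind$coord[, 2]"] <- "dpc2"

# After renaming, ordinary $ access works without back-ticks.
neg_rows <- bb[bb$dpc2 < 0, ]
neg_rows
```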

Or, even better, look farther back in your workflow and see where the weird names came from.
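One plausible origin, offered purely as an assumption since the question does not show how the data frame was built, is an unnamed `cbind()` onto a data frame: the column then takes the text of the expression as its name. In the sketch below, `Data_PCA` is a hypothetical stand-in list (the real PCA object is not shown), and `genes`, `combined`, and `named` are made-up names.

```r
# Hypothetical stand-in for the PCA result; the real object is not shown.
Data_PCA <- list(
  ind = list(coord = matrix(c(1.5, -0.3, 2.1, 12, -11, -11), ncol = 2))
)

genes <- data.frame(
  SYMBOL = c("A", "c", "d"),
  Data_PCA_contrib = c(.26, .36, .11)
)

# Unnamed argument: the new column is named after the expression itself.
combined <- cbind(genes, Data_PCA$ind$coord[, 2])
names(combined)

# Named argument: the column gets a clean name from the start.
named <- cbind(genes, pca_coord = Data_PCA$ind$coord[, 2])
names(named)
named[named$pca_coord < 0, ]
```

If the names did arise this way, naming the argument at the point where the column is attached avoids the back-ticks and renaming entirely.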
