Subset non 0 values in a data.frame

Question

I have a data.frame (DF) that looks like this:

 Col_names1      Col_values1    Col_names2     Col_values2    
     a                98             f               1           
     b                12             h              0.8         
     d                 0             mn              0            
     e               0.12            p               0                 
    ....             ....           ....            ....

I need to tabulate the frequencies of the names in the Col_names columns, row by row. To do so, I first extracted only the name columns to get the following new_DF:

 Col_names1       Col_names2     
     a                f                
     b                h                 
     d                mn                  
     e                p                    
    ....             ....
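For reference, one way to do that extraction is to keep only the columns whose names match a pattern. This is a sketch that assumes every name column starts with `Col_names`. The four-row DF below only reconstructs the rows shown above and is not the real data.

```r
# Hypothetical reconstruction of the four rows shown in the question
DF <- data.frame(Col_names1  = c("a", "b", "d", "e"),
                 Col_values1 = c(98, 12, 0, 0.12),
                 Col_names2  = c("f", "h", "mn", "p"),
                 Col_values2 = c(1, 0.8, 0, 0),
                 stringsAsFactors = FALSE)

# Keep only the columns whose names start with "Col_names"
new_DF <- DF[, grep("^Col_names", names(DF)), drop = FALSE]
new_DF
```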

Then I used the apply function to tabulate the frequencies of the names row by row:

apl = apply(new_DF, 1, table)

The problem is that this also counts names whose associated numerical value in the original DF is 0 (as for "d", for example). These frequencies should not be computed.

PS: In total, the data.frame has 500 columns and 80 rows.
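A sketch of one way to get the row-by-row tables directly, without reshaping: replace every name whose paired value is 0 with NA before calling `table()`, which drops NA by default. It assumes the k-th Col_names column pairs with the k-th Col_values column (true for the layout above) and reuses the four reconstructed example rows. `lapply()` is used instead of `apply()` because `apply()` can simplify the result to a matrix when every row happens to produce a table of the same length.

```r
# Hypothetical reconstruction of the rows shown in the question
DF <- data.frame(Col_names1  = c("a", "b", "d", "e"),
                 Col_values1 = c(98, 12, 0, 0.12),
                 Col_names2  = c("f", "h", "mn", "p"),
                 Col_values2 = c(1, 0.8, 0, 0),
                 stringsAsFactors = FALSE)

# Assumes the k-th names column pairs with the k-th values column
names_mat  <- as.matrix(DF[grep("^Col_names",  names(DF))])
values_mat <- as.matrix(DF[grep("^Col_values", names(DF))])

# Replace names whose paired value is 0 with NA; table() skips NA by default
names_mat[values_mat == 0] <- NA

# One frequency table per original row
apl <- lapply(seq_len(nrow(names_mat)), function(i) table(names_mat[i, ]))
apl[[4]]
```

With 500 columns the same code pairs up all 250 name/value columns, as long as they keep the same naming pattern.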

Why is line "e" being returned when "Col_values2" has a zero in that row? — A5C1D2H2I1M1N2O1R2T1, Sep 11 '13 at 12:38
possible duplicate of [How to remove rows with a Zero value in R](http://stackoverflow.com/questions/9977686/how-to-remove-rows-with-a-zero-value-in-r) or [Removal of rows containing zero](http://stackoverflow.com/questions/17364914/removal-of-rows-containing-zero) — plannapus, Sep 11 '13 at 12:40
Because the items are independent from column to column. In other words, in Col_names2 the item corresponding to "e" is "p", which has a value of 0, so it is removed. Item "e" in Col_names1 has a value of 0.12, so it must not be removed. — Fuv8, Sep 11 '13 at 12:44
@Fuv8, so how do you expect this to work, exactly? `data.frame`s are rectangular data structures. — A5C1D2H2I1M1N2O1R2T1, Sep 11 '13 at 12:46
I don't know, Ananda Mahto... I just know I have to remove such items. — Fuv8, Sep 11 '13 at 12:48
@plannapus, it's not a duplicate: I don't have complete cases like the ones in the posts you suggested. — Fuv8, Sep 11 '13 at 12:49
So do you want NA placed in rows where you deleted data from some columns but not others? What if every value in a row is 0 - does it get deleted? Are the data column locations predictable (like all even columns)? — John Paul, Sep 11 '13 at 12:54
Hi guys, I will edit my post to explain the problem in a simpler way. — Fuv8, Sep 11 '13 at 12:58

dayne · Accepted Answer · 2013-09-11T13:23:24.230

Would this approach work for you?

set.seed(1)
example <- data.frame(col_names1=sample(letters[1:13],30,replace=TRUE),
                      col_values1=sample(0:10,30,replace=TRUE),
                      col_names2=sample(letters[14:26],30,replace=TRUE),
                      values2=sample(0:10,30,replace=TRUE))
dim(example)
# [1] 30  4
head(example)
#   col_names1 col_values1 col_names2 values2
# 1          d           5          y       2
# 2          e           6          q       0
# 3          h           5          s       7
# 4          l           2          r       9
# 5          c           9          v       8
# 6          l           7          q       8


new.df <- data.frame(names=unlist(example[,grep("names",colnames(example))]),
                     values=unlist(example[,grep("values",colnames(example))]))

dim(new.df)
# [1] 60  2
head(new.df)
#             names values
# col_names11     d      5
# col_names12     e      6
# col_names13     h      5
# col_names14     l      2
# col_names15     c      9
# col_names16     l      7

Then you can remove the zero-valued entries by subsetting on a single column:

new.df[new.df$values!=0,]
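If the goal is still a row-by-row table, the long data only needs to remember which original row each entry came from. A sketch, assuming the names and values columns appear in the same order (as in the example): because `unlist()` stacks the columns one after another, the original row number simply repeats once per names column. The `row_id` column is an illustrative addition, not part of the answer above. Also note that on R 3.6.0 or later `sample()` uses a different default algorithm, so the letters will not match the output printed above.

```r
set.seed(1)
example <- data.frame(col_names1  = sample(letters[1:13], 30, replace = TRUE),
                      col_values1 = sample(0:10, 30, replace = TRUE),
                      col_names2  = sample(letters[14:26], 30, replace = TRUE),
                      values2     = sample(0:10, 30, replace = TRUE))

name_cols  <- grep("names", colnames(example))
value_cols <- grep("values", colnames(example))

# unlist() stacks the columns, so each original row number repeats once per names column
new.df <- data.frame(row_id = rep(seq_len(nrow(example)), times = length(name_cols)),
                     names  = unlist(example[, name_cols]),
                     values = unlist(example[, value_cols]))

kept <- new.df[new.df$values != 0, ]

# Row-by-row frequency table; the factor levels keep rows whose values were all 0
freq <- table(factor(kept$row_id, levels = seq_len(nrow(example))), kept$names)
head(freq)
```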

Hi dayne! It works well, thank you very much. I only have to work out how to tabulate the result properly, but that is not a problem. Thank you again! — Fuv8, Sep 11 '13 at 13:11

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2013-09-11T14:09:00.730

One option is to use a list (but I think the long data form might be more convenient in this case, and your data are not very large).

Assuming your data.frame is called "mydf":

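## Create a matrix that pairs each names column with its values column
## (one row per pair; assumes the columns alternate names, values, ...)
mat <- matrix(seq_len(ncol(mydf)), ncol = 2, byrow = TRUE)

## use `lapply` to subset each pair and remove the rows with a zero value
lapply(sequence(nrow(mat)), function(x) {
  temp <- mydf[mat[x, ]]
  temp[temp[[2]] != 0, ]
})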
# [[1]]
#   Col_names1 Col_values1
# 1          a       98.00
# 2          b       12.00
# 4          e        0.12
# 
# [[2]]
#   Col_names2 Col_values2
# 1          f         1.0
# 2          h         0.8

Building on @dayne's answer, if your columns are named in a regular pattern, you can use `reshape` quite effectively to get the long format. However, it needs an "id" variable (`sequence(nrow(DF))` should do).

Example:

### Sample data
set.seed(1)
DF <- data.frame(col_names1 = sample(letters[1:13], 30, replace=TRUE),
                 col_values1 = sample(0:10, 30, replace=TRUE),
                 col_names2 = sample(letters[14:26], 30, replace=TRUE),
                 col_values2 = sample(0:10, 30, replace=TRUE))

### Add the ID
DF <- cbind(id = 1:nrow(DF), DF)

### Reshape the data into a long form
out <- reshape(DF, direction = "long", idvar="id", 
               varying = setdiff(names(DF), "id"), sep = "")

### Subset
out2 <- out[out$col_values != 0, ]
head(out2)
#     id time col_names col_values
# 1.1  1    1         d          5
# 2.1  2    1         e          6
# 3.1  3    1         h          5
# 4.1  4    1         l          2
# 5.1  5    1         c          9
# 6.1  6    1         l          7
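The row-by-row counts the question asks for can then be read off the long data with `xtabs()`, since `id` records the original row. A sketch that repeats the steps above; converting `id` to a factor with every original id as a level is an extra step (not in the answer) so that rows whose values were all 0 still appear as rows of zeros.

```r
set.seed(1)
DF <- data.frame(col_names1  = sample(letters[1:13], 30, replace = TRUE),
                 col_values1 = sample(0:10, 30, replace = TRUE),
                 col_names2  = sample(letters[14:26], 30, replace = TRUE),
                 col_values2 = sample(0:10, 30, replace = TRUE),
                 stringsAsFactors = FALSE)
DF <- cbind(id = seq_len(nrow(DF)), DF)

# Long format: one row per (id, pair of columns)
out <- reshape(DF, direction = "long", idvar = "id",
               varying = setdiff(names(DF), "id"), sep = "")
out2 <- out[out$col_values != 0, ]

# Keep every original id as a level, then count names per id
out2$id <- factor(out2$id, levels = DF$id)
freq <- xtabs(~ id + col_names, data = out2)
head(freq)
```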

