
I have a data.frame (DF) that looks like this:

 Col_names1      Col_values1    Col_names2     Col_values2    
     a                98             f               1           
     b                12             h              0.8         
     d                 0             mn              0            
     e               0.12            p               0                 
    ....             ....           ....            ....

I need to tabulate, row by row, the frequencies of the names in the Col_names columns. To do so, I first extracted only the name columns into the following new_DF:

 Col_names1       Col_names2     
     a                f                
     b                h                 
     d                mn                  
     e                p                    
    ....             ....           

Then I used apply to tabulate the name frequencies row by row:

apl = apply(new_DF, 1, table)

The problem is that it also counts names whose associated value in the original DF is 0 (for example "d"). Those names should not be counted.

PS: In total, the data.frame has 500 columns and 80 rows.
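A minimal sketch of one way to skip the zero-valued names while keeping the row-by-row tables, assuming each Col_namesK column is paired with the Col_valuesK column of the same number (as in the sample above). `DF` here is a hypothetical reconstruction of the four rows shown. Names whose paired value is 0 are set to NA, which table() ignores by default, and then each row is tabulated.

```r
# Hypothetical reconstruction of the four sample rows shown above.
DF <- data.frame(Col_names1  = c("a", "b", "d", "e"),
                 Col_values1 = c(98, 12, 0, 0.12),
                 Col_names2  = c("f", "h", "mn", "p"),
                 Col_values2 = c(1, 0.8, 0, 0),
                 stringsAsFactors = FALSE)

# Assumption: Col_namesK is paired with Col_valuesK and both appear in the same order.
name_cols  <- grep("^Col_names", names(DF))
value_cols <- grep("^Col_values", names(DF))

# Character matrix of names (the question's new_DF) and a logical mask of zero values.
names_mat <- as.matrix(DF[name_cols])
is_zero   <- as.matrix(DF[value_cols]) == 0

# Drop names whose paired value is 0; table() skips NA by default.
names_mat[is_zero] <- NA

# One frequency table per row. lapply() is used instead of apply() so that
# tables of equal length are not simplified into a matrix that can lose the names.
row_tables <- lapply(seq_len(nrow(names_mat)), function(i) table(names_mat[i, ]))
row_tables
```

For the four sample rows this keeps "a" and "f", "b" and "h", nothing for row 3, and only "e" for row 4.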

halfer
Fuv8
  • Why is line "e" being returned when "Col_values2" has a zero in that row? – A5C1D2H2I1M1N2O1R2T1 Sep 11 '13 at 12:38
  • possible duplicate of [How to remove rows with a Zero value in R](http://stackoverflow.com/questions/9977686/how-to-remove-rows-with-a-zero-value-in-r) or [Removal of rows containing zero](http://stackoverflow.com/questions/17364914/removal-of-rows-containing-zero) – plannapus Sep 11 '13 at 12:40
  • Because the columns are independent of each other. In Col_names2, the item corresponding to "e" is "p", which has a value of 0, so it is removed. Item "e" in Col_names1 has a value of 0.12, so it must not be removed. – Fuv8 Sep 11 '13 at 12:44
  • @Fuv8, so how do you expect this to work, exactly? `data.frame`s are rectangular data structures. – A5C1D2H2I1M1N2O1R2T1 Sep 11 '13 at 12:46
  • I don't know, Ananda Mahto... I just know I have to remove such items. – Fuv8 Sep 11 '13 at 12:48
  • Not a duplicate, plannapus: my case is not about complete cases, as in the posts you suggested. – Fuv8 Sep 11 '13 at 12:49
  • So do you want NA placed in rows where you deleted data from some columns but not others? What if every value in a row is 0 - does it get deleted? Are the data column locations predictable (like all even columns)? – John Paul Sep 11 '13 at 12:54
  • Hi guys, I will edit my post to explain the problem in a simpler way. – Fuv8 Sep 11 '13 at 12:58

2 Answers

Would this approach work for you?

set.seed(1)  # sample() changed in R 3.6.0, so newer R versions print different values below
example <- data.frame(col_names1=sample(letters[1:13],30,replace=TRUE),
                      col_values1=sample(0:10,30,replace=TRUE),
                      col_names2=sample(letters[14:26],30,replace=TRUE),
                      values2=sample(0:10,30,replace=TRUE))
> dim(example)
[1] 30  4
> head(example)
  col_names1 col_values1 col_names2 values2
1          d           5          y       2
2          e           6          q       0
3          h           5          s       7
4          l           2          r       9
5          c           9          v       8
6          l           7          q       8


new.df <- data.frame(names=unlist(example[,grep("names",colnames(example))]),
                     values=unlist(example[,grep("values",colnames(example))]))

> dim(new.df)
[1] 60  2
> head(new.df)
            names values
col_names11     d      5
col_names12     e      6
col_names13     h      5
col_names14     l      2
col_names15     c      9
col_names16     l      7

Then you can remove the zero-valued entries by filtering on a single column:

new.df[new.df$values!=0,]
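To tabulate the filtered result while still keeping the row-by-row breakdown the question asks for, a minimal sketch, assuming the stacking order produced by unlist() above (all rows of the first name column, then all rows of the second, and so on). The extra `row` column is a hypothetical addition that records which original row each name came from.

```r
set.seed(1)
example <- data.frame(col_names1  = sample(letters[1:13], 30, replace = TRUE),
                      col_values1 = sample(0:10, 30, replace = TRUE),
                      col_names2  = sample(letters[14:26], 30, replace = TRUE),
                      values2     = sample(0:10, 30, replace = TRUE),
                      stringsAsFactors = FALSE)

name_idx  <- grep("names", colnames(example))
value_idx <- grep("values", colnames(example))

# Stack the pairs column by column; `row` (hypothetical) remembers the original row.
new.df <- data.frame(row    = rep(seq_len(nrow(example)), times = length(name_idx)),
                     names  = unlist(example[, name_idx], use.names = FALSE),
                     values = unlist(example[, value_idx], use.names = FALSE),
                     stringsAsFactors = FALSE)

# Keep only names whose value is non-zero.
kept <- new.df[new.df$values != 0, ]

# Contingency table: one line per original row, one column per name.
freq <- table(kept$row, kept$names)

# Overall frequency of each name across all rows.
table(kept$names)
```

The contingency table plays the role of the question's apply(new_DF, 1, table) output, with zero-valued names already excluded.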
dayne
  • Hi dayne! It works well, thank you very much. I still have to tabulate the result properly, but that is not a problem. Thank you again! – Fuv8 Sep 11 '13 at 13:11

One option is to use a list (but I think the long data form might be more convenient in this case, and your data are not very large).

Assuming your data.frame is called "mydf":

## Create a matrix to subset each pair of columns
mat <- matrix(1:4, ncol = 2, byrow = TRUE)

## use `lapply` to subset and remove the offensive rows
lapply(sequence(nrow(mat)), function(x) {
  temp <- mydf[mat[x, ]]
  temp[temp[2] != 0, ]
})
# [[1]]
#   Col_names1 Col_values1
# 1          a       98.00
# 2          b       12.00
# 4          e        0.12
# 
# [[2]]
#   Col_names2 Col_values2
# 1          f         1.0
# 2          h         0.8

Building on @dayne's answer, if your columns are named in a regular pattern, you can use reshape quite effectively to get the long format. However, it would need an "id" variable (sequence(nrow(DF)) should do).

Example:

### Sample data
set.seed(1)
DF <- data.frame(col_names1 = sample(letters[1:13], 30, replace=TRUE),
                 col_values1 = sample(0:10, 30, replace=TRUE),
                 col_names2 = sample(letters[14:26], 30, replace=TRUE),
                 col_values2 = sample(0:10, 30, replace=TRUE))

### Add the ID
DF <- cbind(id = 1:nrow(DF), DF)

### Reshape the data into a long form
out <- reshape(DF, direction = "long", idvar="id", 
               varying = setdiff(names(DF), "id"), sep = "")

### Subset
out2 <- out[out$col_values != 0, ]
head(out2)
#     id time col_names col_values
# 1.1  1    1         d          5
# 2.1  2    1         e          6
# 3.1  3    1         h          5
# 4.1  4    1         l          2
# 5.1  5    1         c          9
# 6.1  6    1         l          7
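To get back to per-row tables like the ones the question builds with apply(), the long data can be split by `id` and each piece tabulated. A sketch that repeats the sample data above; rows whose values are all 0 simply do not appear in the result.

```r
set.seed(1)
DF <- data.frame(col_names1  = sample(letters[1:13], 30, replace = TRUE),
                 col_values1 = sample(0:10, 30, replace = TRUE),
                 col_names2  = sample(letters[14:26], 30, replace = TRUE),
                 col_values2 = sample(0:10, 30, replace = TRUE),
                 stringsAsFactors = FALSE)
DF <- cbind(id = seq_len(nrow(DF)), DF)

# Long format: one line per (id, time) with columns col_names and col_values.
out <- reshape(DF, direction = "long", idvar = "id",
               varying = setdiff(names(DF), "id"), sep = "")
out2 <- out[out$col_values != 0, ]

# One frequency table per original row, with zero-valued names already dropped.
per_row <- lapply(split(out2$col_names, out2$id), table)
head(per_row)
```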
A5C1D2H2I1M1N2O1R2T1