I have the following dataset:

structure(list(Species = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Bream",  "Parkki", "Perch", "Pike", "Roach", "Smelt", "Whitefish"),
class = "factor"), 
     WeightGRAM = c(242, 290, 340, 363, 430, 450), VertLengthCM = c(23.2, 
     24, 23.9, 26.3, 26.5, 26.8), DiagLengthCM = c(25.4, 26.3, 
     26.5, 29, 29, 29.7), CrossLengthCM = c(30, 31.2, 31.1, 33.5, 
     34, 34.7), HeightCM = c(11.52, 12.48, 12.3778, 12.73, 12.444, 
     13.6024), WidthCM = c(4.02, 4.3056, 4.6961, 4.4555, 5.134, 
     4.9274)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",  "data.frame"))

I am trying to check for "0" or negative values in the numeric columns and remove them.

I have the following code:

fish_data <- fish_data [which(rowSums(fish_data) > 0), ] 

But I get an error message:

Error in rowSums(fish_data) : 'x' must be numeric

I guess this message comes up because my "Species" column is a factor.

How can I skip the first column and have R check only the numeric columns for "0" or negative values?

Rui Barradas
  • Please make your question reproducible. Please don’t use images of code or data, as they cannot be used without a lot of unnecessary effort. Check out the Stack Overflow guidance [mre] and [ask]. Include a minimal dataset in the form of an object, for example, if it is a data frame, as df <- data.frame(…) where … is your variables and values, or use `dput(head(df))`. – Peter Jul 12 '20 at 09:18

5 Answers

Here is a way that keeps only the columns with no values less than or equal to zero.

keep <- sapply(fish_data, function(x) {
  if(is.numeric(x)) all(x > 0) else TRUE
})
fish_data[keep]
## A tibble: 6 x 7
#  Species WeightGRAM VertLengthCM DiagLengthCM CrossLengthCM HeightCM WidthCM
#  <fct>        <dbl>        <dbl>        <dbl>         <dbl>    <dbl>   <dbl>
#1 Bream          242         23.2         25.4          30       11.5    4.02
#2 Bream          290         24           26.3          31.2     12.5    4.31
#3 Bream          340         23.9         26.5          31.1     12.4    4.70
#4 Bream          363         26.3         29            33.5     12.7    4.46
#5 Bream          430         26.5         29            34       12.4    5.13
#6 Bream          450         26.8         29.7          34.7     13.6    4.93
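
Note that the code above drops columns. If the goal is instead to drop rows that hold a zero or negative value in any numeric column, a minimal base R sketch could look like the following. The `toy` data frame and the names `num_cols` and `ok` are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data: one zero in Weight, one negative in Height
toy <- data.frame(
  Species = factor(c("Bream", "Bream", "Roach", "Roach")),
  Weight  = c(242, 0, 340, 363),
  Height  = c(11.5, 12.5, -1, 12.7)
)

# Logical vector marking the numeric columns (this skips the factor)
num_cols <- sapply(toy, is.numeric)

# Count the cells <= 0 in each row; keep rows where that count is zero.
# na.rm = TRUE means missing values are not counted as offending cells.
ok <- rowSums(toy[num_cols] <= 0, na.rm = TRUE) == 0

toy_clean <- toy[ok, ]
toy_clean
```

Counting offending cells per row, rather than summing the values themselves, is what makes the check work row by row.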
Rui Barradas

Using dplyr we can use select to select columns where all values are greater than 0 or are not numeric.

library(dplyr)
df %>% select(where(~(is.numeric(.) && all(. > 0)) || !is.numeric(.)))


# A tibble: 6 x 7
#  Species WeightGRAM VertLengthCM DiagLengthCM CrossLengthCM HeightCM WidthCM
#  <fct>        <dbl>        <dbl>        <dbl>         <dbl>    <dbl>   <dbl>
#1 Bream          242         23.2         25.4          30       11.5    4.02
#2 Bream          290         24           26.3          31.2     12.5    4.31
#3 Bream          340         23.9         26.5          31.1     12.4    4.70
#4 Bream          363         26.3         29            33.5     12.7    4.46
#5 Bream          430         26.5         29            34       12.4    5.13
#6 Bream          450         26.8         29.7          34.7     13.6    4.93

In earlier versions of dplyr, which do not have where(), we can use select_if:

df %>% select_if(~(is.numeric(.) && all(. > 0)) || !is.numeric(.))
Ronak Shah

You only need to specify the columns for the rowSums() function:

fish_data <- fish_data[which(rowSums(fish_data[,2:7]) > 0), ] 

Note that rowSums() sums all values across each row. I'm not sure that is what you really want to achieve. You can check the output of rowSums() with:

> rowSums(fish_data[,2:7])
[1] 336.1400 388.2856 438.5739 468.9855 537.0780 559.7298
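
To illustrate the caveat, here is a small sketch with made-up numbers (the `toy` data frame and the names `by_sum` and `by_count` are hypothetical). A row total can be positive even when one of its values is negative, so testing the sum is not the same as testing each value.

```r
# Hypothetical values: the -5 in row 1 is hidden by the positive row total
toy <- data.frame(a = c(10, 10), b = c(-5, 3))

# Test on the row totals: both rows pass, although row 1 holds a negative
by_sum <- rowSums(toy) > 0

# Test on the number of cells <= 0 per row: only row 2 passes
by_count <- rowSums(toy <= 0) == 0

by_sum
by_count
```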
user12256545
  • Thanks, I think I used the wrong code. I should be using fish_data[fish_data <= 0] <- NA to convert those values to NA and then remove the NAs, but I get a warning message "Warning message: In Ops.factor(left, right) : ‘<=’ not meaningful for factors". Not sure if I can ignore it. – Newbie coder Jul 12 '20 at 09:37

Thanks all, I think I figured it out.

I should be using:

fish_data[fish_data <= 0] <- NA # convert values less than or equal to 0 to NA

fish_data <- na.omit(fish_data) # delete rows with NA

But I get a warning message:

Warning message: In Ops.factor(left, right) : ‘<=’ not meaningful for factors
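
The warning comes from applying `<=` to the factor column. One way to avoid it, sketched here on made-up data (the `toy` data frame and the name `num_cols` are hypothetical), is to do the replacement on the numeric columns only:

```r
# Hypothetical toy data: one zero in Weight, one negative in Height
toy <- data.frame(
  Species = factor(c("Bream", "Bream", "Roach")),
  Weight  = c(242, 0, 340),
  Height  = c(11.5, 12.5, -1)
)

# Logical vector marking the numeric columns, so the factor is never compared
num_cols <- sapply(toy, is.numeric)

# Replace values <= 0 with NA, column by column, in numeric columns only
toy[num_cols] <- lapply(toy[num_cols], function(x) replace(x, x <= 0, NA))

# Drop every row that now contains an NA
toy_clean <- na.omit(toy)
toy_clean
```

Keep in mind that na.omit() also removes rows that already had missing values before the replacement.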

# Option 1: (Safer because it retains rows containing NAs)
# Subset data.frame to not contain any observations with 0 values
# (only zeros are tested here, negative values are not):
# data.frame => stdout (console)
df[rowMeans(df != 0, na.rm = TRUE) == 1,]


# Option 2: (More dangerous because it will remove all rows containing
# NAs) subset data.frame to not contain any observations with 0 values: 
# data.frame => stdout (console)
df[complete.cases(replace(df, df == 0, NA)),]

# Option 3 (Variant of Option 1):
# Subset data.frame to not contain any observations with 0 values: 
# data.frame => stdout (console)
df[rowMeans(Vectorize(function(x){x != 0})(df[,sapply(df, is.numeric)]),
            na.rm = TRUE) == 1,]

# Option 4: Using Higher-order functions: 
# Subset data.frame to not contain any observations with 0 values: 
# data.frame => stdout (console)
df[Reduce(function(y, z){intersect(y, z)},
  Map(function(x){which(x > 0)}, df[,sapply(df, is.numeric)])), ]

# Option 5 tidyverse: 
# Subset data.frame to not contain any observations with 0 values: 
# data.frame => stdout (console)
library(dplyr)
df %>% 
  filter_if(is.numeric, all_vars(. > 0))
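
# In recent dplyr releases the scoped verbs such as filter_if() are marked as superseded. A possible equivalent, assuming dplyr 1.0.4 or later (where if_all() is available), is sketched below on made-up data. The `toy` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one zero in Weight, one negative in Height
toy <- data.frame(
  Species = factor(c("Bream", "Bream", "Roach", "Roach")),
  Weight  = c(242, 0, 340, 363),
  Height  = c(11.5, 12.5, -1, 12.7)
)

# Keep rows where every numeric column is strictly positive
toy_clean <- toy %>%
  filter(if_all(where(is.numeric), ~ .x > 0))

toy_clean
```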

Data:

df <- structure(list(Species = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Bream",  "Parkki", "Perch", "Pike", "Roach", "Smelt", "Whitefish"),
class = "factor"), 
WeightGRAM = c(242, 290, 340, 363, 0, 450), VertLengthCM = c(23.2, 
24, 23.9, 26.3, 26.5, 26.8), DiagLengthCM = c(25.4, 26.3, 
26.5, 29, 29, 29.7), CrossLengthCM = c(30, 31.2, 31.1, 33.5, 
34, 34.7), HeightCM = c(11.52, 0, 12.3778, 12.73, 12.444, 
13.6024), WidthCM = c(4.02, 4.3056, 4.6961, 4.4555, 5.134, 
4.9274)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",  "data.frame"))
hello_friend