0

Below is the sample data and one manipulation. In the larger picture, I am reading in a bunch of excel files which are delineated by year and then taking only select columns (14 of 1000) and putting them into new data frames (df1,df2 for example). From there, I am combining these new data into one final data frame. My question is how I remove the rows in the final data frame that are populated with null values. I could filter but hoping to simply remove them in R and be done with them.

 testyear <-c(2010,2010,2010,2010,2011,2011,2011,2010)
 teststate<-c("CA", "Co", "NV", "NE", "CA", "CO","NV","NE")
 totalhousehold<-c(251,252,253,"NULL",301,302,303,"NULL")
 marriedhousehold <-c(85,86,87,"NULL",158,159,245,"NULL")


 test1<-data.frame(testyear,teststate,totalhousehold,marriedhousehold)


 testyear<-c(2012,2012,2012,2012)
 teststate<-c("WA","OR","WY","UT")
 totalhousehold<-c(654,650,646,641)
 marriedhousehold<-c(400,399,398,395)

 test2<-data.frame(testyear,teststate,totalhousehold,marriedhousehold)

 test3<-rbind(test1,test2)
Machavity
  • 30,841
  • 27
  • 92
  • 100
Tim Wilcox
  • 1,275
  • 2
  • 19
  • 43

2 Answers2

1

As those are character columns, we could filter across only character columns to return rows that have no "NULL" elements and change the type of the columns with type.convert

library(dplyr)
test4 <- test3 %>% 
      filter(across(where(is.character), ~ . != "NULL")) %>%
       type.convert(as.is = TRUE)

-output

> test4
   testyear teststate totalhousehold marriedhousehold
1      2010        CA            251               85
2      2010        Co            252               86
3      2010        NV            253               87
4      2011        CA            301              158
5      2011        CO            302              159
6      2011        NV            303              245
7      2012        WA            654              400
8      2012        OR            650              399
9      2012        WY            646              398
10     2012        UT            641              395
> str(test4)
'data.frame':   10 obs. of  4 variables:
 $ testyear        : int  2010 2010 2010 2011 2011 2011 2012 2012 2012 2012
 $ teststate       : chr  "CA" "Co" "NV" "CA" ...
 $ totalhousehold  : int  251 252 253 301 302 303 654 650 646 641
 $ marriedhousehold: int  85 86 87 158 159 245 400 399 398 395

or in base R, use subset with rowSums to create a logical expression

type.convert(subset(test3, !rowSums(test3 == "NULL")), as.is = TRUE)
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Why dplyr when it can be done simple?

test3[test3 == "NULL"] = NA
test3 <- na.omit(test3)
Jason Mathews
  • 765
  • 3
  • 13