0

I'm trying to cleanse a data set and there are null values stored under dbl and chr columns. I wrote the following code to summarize the number of null values by column but I notice it only return a value for the column that is numeric and ignored the columns with strings. The screenshot is showing that the 'agent' and 'company' also have null values but it looks like they are stored as strings?

Is there a way to find out the number of nulls under each column and filter to those only that are non-zeros?

colSums(is.na(df)) %>% data.frame(.) %>% filter(. !=0)

Returned Babies: 4

Thanks for the help.

Jack Deeth
  • 3,062
  • 3
  • 24
  • 39
Nomad82
  • 7
  • 2
  • 2
    You don't actually have true NULL values, you have character values which contain the string "NULL" which is a bit different. How did you initially import the data? It would probably be best to convert "NULL" to missing NA values at the time of import rather than fix the mess later. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful. – MrFlick Feb 16 '22 at 20:28

1 Answers1

0

Unknown values are denoted NULL in SQL, but in R they are denoted NA. As @MrFlick wrote in a comment, your data does not contain NA values, but character strings "NULL".

You can either fix this during data import, e.g. with read.table(..., na.strings="NULL"), or later with

df[df == "NULL",] <- NA
cdalitz
  • 1,019
  • 1
  • 6
  • 7
  • Thanks for the explanation, it's a little clearer now on the behavior of R handling null values in string columns. I tried both suggestions but I think having dates in some of the variables causes the syntax to fail. I removed the columns that are dates and they worked, is there a way to apply them but ignoring the columns with type date? – Nomad82 Feb 19 '22 at 01:52
  • @alexander-woo You can pass several NA-strings to `read.table`, e.g. `na.strings=c("NULL", "0000-00-00")`. Moreover, the data.frame assignment solution uses the index operator as a lvalue, which means that you can also leave out other columns or fields, e.g. `df[df == "NULL", c(-4,-7)] <- NA`. – cdalitz Feb 19 '22 at 14:28