1

I have a huge dataset that I transformed using pivot_wider function. This resulted in a lot of values containing NULL. Now I want to transform those values to NA but none of my approaches seems to work:

df %>% replace(.=="NULL", NA)

df[df==0] <- NA

Also when using the pivot_wider function the values_fill = NA argument doesn't produce any NA's too.

Thanks in advance for your help.

P.Hubert
  • 137
  • 1
  • 8

2 Answers2

4

I think you have data which is something like this -

df <- data.frame(a1 = c(1, 1, 1, 1,2), a2 = 1:5, a3 = c('a', 'b', 'c', 'a', 'b'))

df
#  a1 a2 a3
#1  1  1  a
#2  1  2  b
#3  1  3  c
#4  1  4  a
#5  2  5  b

Then you are using pivot_wider which returns

df1 <- tidyr::pivot_wider(df, names_from = a3, values_from = a2)

#     a1   a         b         c        
#  <dbl> <list>    <list>    <list>   
#1     1 <dbl [2]> <dbl [1]> <dbl [1]>
#2     2 <NULL>    <dbl [1]> <NULL>   

This also returns the warning messages -

Warning message: Values are not uniquely identified; output will contain list-cols.

  • Use values_fn = list to suppress this warning.
  • Use values_fn = length to identify where the duplicates arise
  • Use values_fn = {summary_fun} to summarise duplicates

To answer your question about replacing these NULL values with NA you can do -

df1 <- df1 %>% mutate(across(a:c, ~replace(., lengths(.) == 0, NA)))
df1

#    a1    a         b         c        
#  <dbl> <list>    <list>    <list>   
#1     1 <dbl [2]> <dbl [1]> <dbl [1]>
#2     2 <lgl [1]> <dbl [1]> <lgl [1]>

But you should not ignore those warnings and check pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols" if this would be a better option for your data.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

If you want to replace all NULL values in your dataframe with NA you can do -

df1 <- df1 %>% mutate_all( ~replace(., lengths(.)==0, NA))

If you want to replace all NULL values with 0s, change the NA to 0 in the code.