0

I have a sample dataset as shown:

structure(list(`BEGA CHEESE - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `BENDIGO & ADELAIDE BANK - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BINGO INDUSTRIES - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BLACKMORES - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BLUESCOPE STEEL - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BORAL - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BRAMBLES - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `#ERROR` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", ""), `BRICKWORKS - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BWP TRUST - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `#ERROR.1` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", ""), `#ERROR.2` = c("$$ER: 0904,NO DATA AVAILABLE", "", ""
), `CHALLENGER - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CHARTER HALL GROUP - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CHARTER HALL LONG WALE - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CHARTER HALL RETAIL REIT - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CLEANAWAY WASTE - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `#ERROR.3` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", ""), `COCHLEAR - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `COLES GROUP - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `#ERROR.4` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", "")), row.names = c(NA, 3L), class = "data.frame")

For column names starting with #ERROR, I would like to fill all values in that column with NA.

I've tried hunting around for a function that would complete this action but the closest I came to it was using the replace function:

replace(df,"#ERROR", NA )

However this only works for the first column with name "#ERROR" and not for columns named #ERROR.1 and #ERROR.2 etc.

  • 1
    Note that your example is reproducible, yet not minimal! To make it easier for people to help you, look at: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Matifou Sep 15 '20 at 16:04

3 Answers3

1

mutate_at from dplyr is your friend! This won't work with duplicated column names (which is not a tidy format), so just remove those before with select:

library(tidyverse)
## minimal reproducible example
df <- tibble(`#ERROR`=c(1,2,3),
             `#ERROR`=c(1,2,3),
             not_error=c(7,6,8), .name_repair="minimal")
df
#> # A tibble: 3 x 3
#>   `#ERROR` `#ERROR` not_error
#>      <dbl>    <dbl>     <dbl>
#> 1        1        1         7
#> 2        2        2         6
#> 3        3        3         8
## now transform
df %>% 
 select(which(!duplicated(colnames(.))) )%>% 
  mutate(across(starts_with('#ERROR'),~NA))
#> # A tibble: 3 x 2
#>   `#ERROR` not_error
#>   <lgl>        <dbl>
#> 1 NA               7
#> 2 NA               6
#> 3 NA               8

Created on 2020-09-15 by the reprex package (v0.3.0)

Matifou
  • 7,968
  • 3
  • 47
  • 52
  • Hey Matifou, that worked nicely when the column names were #ERROR ... #ERROR.n, but not when the columns are all #ERROR (duplicated). Is there any way to achieve the same outcome when the column names are all #ERROR? Thanks! – Trong Nguyen Sep 15 '20 at 16:04
  • ok, just updated for duplicated columns. Solution suggested here just removes them, duplicated columns are not really recommended. – Matifou Sep 15 '20 at 16:21
1

Also across() can be useful:

library(dplyr)
#Code
dfa %>%
  mutate(across(starts_with('#ERROR'),~NA))

Output:

BEGA CHEESE - CO2 Equivalents Emission Total
1                                           NA
2                                           NA
3                                           NA
  BENDIGO & ADELAIDE BANK - CO2 Equivalents Emission Total
1                                                       NA
2                                                       NA
3                                                       NA
  BINGO INDUSTRIES - CO2 Equivalents Emission Total BLACKMORES - CO2 Equivalents Emission Total
1                                                NA                                          NA
2                                                NA                                          NA
3                                                NA                                          NA
  BLUESCOPE STEEL - CO2 Equivalents Emission Total BORAL - CO2 Equivalents Emission Total
1                                               NA                                     NA
2                                               NA                                     NA
3                                               NA                                     NA
  BRAMBLES - CO2 Equivalents Emission Total #ERROR BRICKWORKS - CO2 Equivalents Emission Total
1                                        NA     NA                                          NA
2                                        NA     NA                                          NA
3                                        NA     NA                                          NA
  BWP TRUST - CO2 Equivalents Emission Total #ERROR.1 #ERROR.2
1                                         NA       NA       NA
2                                         NA       NA       NA
3                                         NA       NA       NA
  CHALLENGER - CO2 Equivalents Emission Total CHARTER HALL GROUP - CO2 Equivalents Emission Total
1                                          NA                                                  NA
2                                          NA                                                  NA
3                                          NA                                                  NA
  CHARTER HALL LONG WALE - CO2 Equivalents Emission Total
1                                                      NA
2                                                      NA
3                                                      NA
  CHARTER HALL RETAIL REIT - CO2 Equivalents Emission Total
1                                                        NA
2                                                        NA
3                                                        NA
  CLEANAWAY WASTE - CO2 Equivalents Emission Total #ERROR.3
1                                               NA       NA
2                                               NA       NA
3                                               NA       NA
  COCHLEAR - CO2 Equivalents Emission Total COLES GROUP - CO2 Equivalents Emission Total #ERROR.4
1                                        NA                                           NA       NA
2                                        NA                                           NA       NA
3                                        NA                                           NA       NA

Some data used:

#Data
dfa <- structure(list(`BEGA CHEESE - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `BENDIGO & ADELAIDE BANK - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BINGO INDUSTRIES - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BLACKMORES - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BLUESCOPE STEEL - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BORAL - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BRAMBLES - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `#ERROR` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", ""), `BRICKWORKS - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `BWP TRUST - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `#ERROR.1` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", ""), `#ERROR.2` = c("$$ER: 0904,NO DATA AVAILABLE", "", ""
), `CHALLENGER - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CHARTER HALL GROUP - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CHARTER HALL LONG WALE - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CHARTER HALL RETAIL REIT - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `CLEANAWAY WASTE - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `#ERROR.3` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", ""), `COCHLEAR - CO2 Equivalents Emission Total` = c(NA_real_, 
NA_real_, NA_real_), `COLES GROUP - CO2 Equivalents Emission Total` = c(NA_integer_, 
NA_integer_, NA_integer_), `#ERROR.4` = c("$$ER: 0904,NO DATA AVAILABLE", 
"", "")), row.names = c(NA, 3L), class = "data.frame")
Duck
  • 39,058
  • 13
  • 42
  • 84
1

Keeping things in base R, you can use this one-liner:

df[grepl("#ERROR", names(df))] = NA
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294