
I have a data set where missing values have been coded with a dot. I would like to have missing values blank (NA).

Here is the data frame:

df <- data.frame(ITEM1 = c(6, 8, '.'),
                   ITEM2 = c(1, 6, 9),
                   ITEM3 = c(4, 2, 5),
                   ITEM4 = c('.', 3, 2),
                   ITEM5 = c(1, 6, 9)
)

df

ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1     6     1     4     .     1
2     8     6     2     3     6
3     .     9     5     2     9
Marie

3 Answers


The columns containing "." will be of class character. Create a logical matrix with == and assign NA to the matching elements, then convert the data.frame columns to their appropriate types with type.convert:

df[df == "." & !is.na(df)] <- NA
df <- type.convert(df, as.is = TRUE)

Or in a single step with replace (which internally does the assignment)

df <- type.convert(replace(df, df == "." & !is.na(df), NA), as.is = TRUE)

Or another approach is

df[] <- lapply(df, function(x) replace(x, x %in% '.', NA))
df <- type.convert(df, as.is = TRUE)
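As a rough end-to-end sketch of the lapply approach above, the following rebuilds the example data from the question and compares the column classes before and after. The helper name dots_to_na is hypothetical, introduced only for illustration, and the "." marker is assumed to be the only missing-value code in the data.

```r
# Rebuild the example data; the "." entries force ITEM1 and ITEM4 to character
df <- data.frame(ITEM1 = c(6, 8, "."),
                 ITEM2 = c(1, 6, 9),
                 ITEM3 = c(4, 2, 5),
                 ITEM4 = c(".", 3, 2),
                 ITEM5 = c(1, 6, 9))

# dots_to_na is a hypothetical helper name, not a base R function
dots_to_na <- function(data) {
  # Replace "." with NA column by column; %in% never returns NA,
  # so existing NA values do not break the replacement
  data[] <- lapply(data, function(x) replace(x, x %in% ".", NA))
  # Let R pick a suitable type for each column again
  type.convert(data, as.is = TRUE)
}

clean <- dots_to_na(df)

sapply(df, class)      # classes before the conversion
sapply(clean, class)   # classes after the conversion
colSums(is.na(clean))  # number of NA values per column
```

Because %in% treats NA as a non-match, this variant does not need the extra !is.na(df) guard used in the matrix-indexing version.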

Generally, this can be avoided altogether while reading the data, i.e. by specifying na.strings = "." in read.csv/read.table etc.
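A minimal sketch of the na.strings route, assuming the data arrive as comma-separated text. The inline csv_text string is a hypothetical stand-in for the real file; with an actual file you would pass its path as the first argument instead of text =.

```r
# Hypothetical inline CSV standing in for the real data file
csv_text <- "ITEM1,ITEM2,ITEM3,ITEM4,ITEM5
6,1,4,.,1
8,6,2,3,6
.,9,5,2,9"

# Treat "." as missing while reading, so the columns are parsed as numbers
df <- read.csv(text = csv_text, na.strings = ".")

str(df)             # column types as read
colSums(is.na(df))  # number of NA values per column
```

Handling the marker at read time means no later replacement or type conversion step is needed.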

akrun
  • Thank you! This works for the example dataframe. However, when I am trying to apply it to the data set, I get the following error "Error in `[<-.data.frame`(`*tmp*`, df == ".", value = NA) : unsupported matrix index in replacement". – Marie May 04 '21 at 02:44
  • @Marie Can you check the `str(df)`? I assume the columns are regular columns, i.e. vectors – akrun May 04 '21 at 02:45
  • @Marie Do you have any `list` column or `matrix` as column. Also, is it a `data.frame` or `tibble` – akrun May 04 '21 at 02:48
  • Thank you for your patience, I am very new to R. No they are all characters. It is a data.frame. – Marie May 04 '21 at 02:52
  • @Marie can you change the expression as in the update – akrun May 04 '21 at 02:54
  • @Marie Code should have worked on your original data based on the description – akrun May 04 '21 at 03:02

You could use the na_if function from dplyr. Note that the dot changes the type of the affected columns to character, which might not be what you want afterwards! The following code finds all character columns, replaces "." with NA and converts those columns to numeric:

library(dplyr)
df <- df %>%
    mutate(across(where(is.character), ~as.numeric(na_if(., "."))))
Alex

Here is an alternative using set_na from the sjlabelled package. Note that the columns will remain of type character.

library(sjlabelled)
set_na(df, na = ".", as.tag = FALSE)

Output:

ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1     6     1     4  <NA>     1
2     8     6     2     3     6
3  <NA>     9     5     2     9
TarJae