How to recode dot to NA in R?

Question

I have a data set where missing values have been coded with a dot. I would like to have missing values blank (NA).

Here is the data frame:

df <- data.frame(ITEM1 = c(6, 8, '.'),
                   ITEM2 = c(1, 6, 9),
                   ITEM3 = c(4, 2, 5),
                   ITEM4 = c('.', 3, 2),
                   ITEM5 = c(1, 6, 9)
)

df

  ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1     6     1     4     .     1
2     8     6     2     3     6
3     .     9     5     2     9

Related : https://stackoverflow.com/questions/3357743/replacing-character-values-with-na-in-a-data-frame — Ronak Shah, May 04 '21 at 04:21

akrun · Accepted Answer · 2021-05-04T02:54:02.363

5

The columns containing a "." will be of character class because of that value. Create a logical matrix with == and assign NA to the matching elements, then convert the data.frame columns to their appropriate types with type.convert

df[df == "." & !is.na(df)] <- NA
df <- type.convert(df, as.is = TRUE)
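
To make the two steps above easier to follow, here is a minimal sketch on a cut-down version of the question's data frame. The name `mask` is introduced only for illustration; it is not part of the original answer.

```r
# Reduced copy of the example data; ITEM1 and ITEM4 contain a dot
df <- data.frame(ITEM1 = c(6, 8, '.'),
                 ITEM2 = c(1, 6, 9),
                 ITEM4 = c('.', 3, 2))

# Before: the dot forces ITEM1 and ITEM4 to character
sapply(df, class)

# Logical matrix with the same dimensions as df, TRUE where a dot is stored
# ('mask' is a hypothetical helper name)
mask <- df == "." & !is.na(df)

# Replace the flagged cells, then let R pick a suitable type per column
df[mask] <- NA
df <- type.convert(df, as.is = TRUE)

# After: inspect the classes again
sapply(df, class)
```

Calling `str(df)` afterwards is a quick way to confirm that the former character columns are now numeric and hold NA where the dots were.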

Or in a single step with replace (which internally does the assignment)

df <- type.convert(replace(df, df == "." & !is.na(df), NA), as.is = TRUE)

Or another approach is

df[] <- lapply(df, function(x) replace(x, x %in% '.', NA))
df <- type.convert(df, as.is = TRUE)
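
The lapply version uses %in% rather than ==, so it needs no extra !is.na() check. A small sketch of the difference, using a made-up vector `x` that already contains a missing value:

```r
# Hypothetical column with a number, an existing NA and a dot
x <- c("6", NA, ".")

# == propagates the missing value into the comparison result
x == "."

# %in% never returns NA, so it is safe to use directly as an index
x %in% "."

# Only the dot is replaced; the existing NA is left alone
replace(x, x %in% ".", NA)
```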

Generally, this can be avoided altogether when reading the data, i.e. by specifying na.strings = "." in read.csv/read.table etc.
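
As a rough illustration of the na.strings route, the sketch below reads the same values from an in-memory string instead of a file. The `csv_text` content is invented to mirror the question's data; with a real file you would pass its path as the first argument instead of `text =`.

```r
# Hypothetical CSV content standing in for a file on disk
csv_text <- "ITEM1,ITEM2,ITEM3,ITEM4,ITEM5
6,1,4,.,1
8,6,2,3,6
.,9,5,2,9"

# Treat a lone dot as missing while parsing
df <- read.csv(text = csv_text, na.strings = ".")

# The columns come back numeric, with NA where the dots were
str(df)
```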

edited May 04 '21 at 02:54

answered May 04 '21 at 02:23

akrun


Thank you! This works for the example dataframe. However, when I am trying to apply it to the data set, I get the following error "Error in `[<-.data.frame`(`*tmp*`, df == ".", value = NA) : unsupported matrix index in replacement". – Marie May 04 '21 at 02:44

@Marie Can you check `str(df)`? I assume the columns are regular columns, i.e. vectors – akrun May 04 '21 at 02:45
@Marie Do you have any `list` column or a `matrix` as a column? Also, is it a `data.frame` or a `tibble`? – akrun May 04 '21 at 02:48
Thank you for your patience, I am very new to R. No they are all characters. It is a data.frame. – Marie May 04 '21 at 02:52
@Marie can you change the expression as in the update – akrun May 04 '21 at 02:54
@Marie Code should have worked on your original data based on the description – akrun May 04 '21 at 03:02
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/231908/discussion-between-marie-and-akrun). – Marie May 04 '21 at 03:17

score 3 · Answer 2 · answered May 04 '21 at 08:04

You could use the na_if function from dplyr. Note that the dot makes the affected columns character, which might not be what you want afterwards! The following code finds all character columns, replaces . with NA and converts each of them to numeric:

library(dplyr)
df <- df %>%
    mutate(across(where(is.character), ~as.numeric(na_if(., "."))))

score 3 · Answer 3 · answered May 04 '21 at 08:26

Here is an alternative with set_na from the sjlabelled package. Note that the columns will remain of character type.

library(sjlabelled)
set_na(df, na = ".", as.tag = FALSE)

Output:

  ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1     6     1     4  <NA>     1
2     8     6     2     3     6
3  <NA>     9     5     2     9
