I am trying to convert missing factor values to NA in a data frame and build a new data frame with the replaced values. When I do that, the columns that were character factors are all converted to numbers. I cannot figure out what I am doing wrong and cannot find a similar question. Could anybody please help?

Here is my code:

orders <- c('One','Two','Three', '')
ids <- c(1, 2, 3, 4)
values <- c(1.5, 100.6, 19.3, '')

df <- data.frame(orders, ids, values)
new.df <- as.data.frame(matrix( , ncol = ncol(df), nrow = 0))
names(new.df) <- names(df)

for(i in 1:nrow(df)){
    row.df <- df[i, ]
    print(row.df$orders) # "One", "Two", "Three", ""
    print(str(row.df$orders)) # Factor
    # Want to replace "orders" value in each row with NA if it is missing 
    row.df$orders <- ifelse(row.df$orders == "", NA, row.df$orders)
    print(row.df$orders) # Converted to number
    print(str(row.df$orders)) # int or logi
    # Add the row with new value to the new data frame
    new.df[nrow(new.df) + 1, ] <- row.df
    }

and I get this:

> new.df
  orders ids values
1      2   1      2
2      4   2      3
3      3   3      4
4     NA   4      1

but I want this:

> new.df
  orders ids values
1    One   1    1.5
2    Two   2  100.6
3  Three   3   19.3
4     NA   4       
owl

2 Answers

Convert empty values to NA and use type.convert to change their class.

df[df == ''] <- NA
df <- type.convert(df)
df
#  orders ids values
#1    One   1    1.5
#2    Two   2  100.6
#3  Three   3   19.3
#4   <NA>   4     NA

str(df)
#'data.frame':  4 obs. of  3 variables:
#$ orders: Factor w/ 4 levels "","One","Three",..: 2 4 3 1
#$ ids   : int  1 2 3 4
#$ values: num  1.5 100.6 19.3 NA
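
On R 4.0 or later, `data.frame()` no longer creates factors by default, and `type.convert()` expects an explicit `as.is` argument. The same idea can be sketched for that setup as follows. The explicit `stringsAsFactors = FALSE`, `as.is = TRUE`, and the final `factor()` call are assumptions added for illustration, not part of the code above.

```r
# Rebuild the example data with strings kept as character columns
df <- data.frame(
  orders = c("One", "Two", "Three", ""),
  ids    = c(1, 2, 3, 4),
  values = c(1.5, 100.6, 19.3, ""),
  stringsAsFactors = FALSE
)

# Replace every empty string with NA, then let R re-guess the column types
df[df == ""] <- NA
df <- type.convert(df, as.is = TRUE)  # as.is = TRUE keeps strings as character

# Turn only the column that should be categorical back into a factor
df$orders <- factor(df$orders)

str(df)
```

With this variant the empty string never becomes a factor level, because `orders` is converted to a factor only after the empty entry has been replaced.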
Ronak Shah
  • Thanks and sorry for not being clear but I want "orders" to be still a factor after replacing missing with NA. Do you know why these factors are converted to 2, 3, and 4? Where do they come from? – owl Jun 16 '20 at 05:44
  • @owl Factors are internally represented as numbers, hence you see those numbers. If you want to keep `orders` as a factor, you can use only `df <- type.convert(df)` – Ronak Shah Jun 16 '20 at 05:59
  • Thank you! That would give me what I wanted. I did not realize until now that factors are internally represented as numbers. – owl Jun 16 '20 at 06:46
  • I just realized that this would convert values to NA but not orders. Is there a way to specify to replace empty orders with NA but not values? – owl Jun 16 '20 at 07:01
  • A column can only have one `class`. An empty value (`''`) is a character and not a number, so if you put an empty value in `values` it will turn the whole column to character. – Ronak Shah Jun 16 '20 at 07:06
  • Thanks, the code above is a simplified version of what I want to do. In my actual data set, there are multiple columns with empty entries but I only want to replace the missing value with NA in one column. Do I need to do a loop or something (as I tried to do) to replace just one column? – owl Jun 16 '20 at 07:24
  • If you want to replace the empty value with `NA` for only one column you can do `df$values[is.na(df$values)] <- NA` – Ronak Shah Jun 16 '20 at 07:36
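
To illustrate the point about internal codes made in the comments above, here is a minimal sketch using the `orders` values from the question. It assumes the column is a factor (the default for `data.frame()` before R 4.0). `ifelse()` drops the factor attributes, so it returns the integer codes; wrapping the factor in `as.character()` is one way to keep the labels.

```r
orders <- factor(c("One", "Two", "Three", ""))

# Levels are sorted alphabetically, and each value is stored as its position
levels(orders)       # "" "One" "Three" "Two"
as.integer(orders)   # 2 4 3 1

# ifelse() keeps only the underlying integer codes of the factor
codes <- ifelse(orders == "", NA, orders)
codes                # 2 4 3 NA

# Converting to character first preserves the labels
labels <- ifelse(orders == "", NA, as.character(orders))
labels               # "One" "Two" "Three" NA
```

The codes 2, 4, 3 are the same numbers that appear in the `orders` column of the unwanted result in the question.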

Thanks to the hint from Ronak Shah, I did this and it gave me what I wanted.

df$orders[df$orders == ''] <- NA

This will give me:

> df
  orders ids values
1    One   1    1.5
2    Two   2  100.6
3  Three   3   19.3
4   <NA>   4       

> str(df)
'data.frame':   4 obs. of  3 variables:
 $ orders: Factor w/ 4 levels "","One","Three",..: 2 4 3 NA
 $ ids   : num  1 2 3 4
 $ values: Factor w/ 4 levels "","1.5","100.6",..: 2 3 4 1
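
Note that the `str()` output still lists "" as one of the four levels of `orders`, even though no row uses it any more. If that unused level is unwanted, `droplevels()` can remove it. A minimal sketch, assuming the columns are factors as in the output above (hence the explicit `stringsAsFactors = TRUE`, which is added here so the example behaves the same on newer R versions):

```r
df <- data.frame(
  orders = c("One", "Two", "Three", ""),
  ids    = c(1, 2, 3, 4),
  stringsAsFactors = TRUE
)

# Replace the empty entry in this one column only
df$orders[df$orders == ""] <- NA

# Drop the now-unused "" level from the factor
df$orders <- droplevels(df$orders)

levels(df$orders)    # "One" "Three" "Two"
```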

In case you are curious about the difference between NA and <NA>, as I was, you can find the answer here.

Your suggestion

df$orders[is.na(df$orders)] <- NA

did not work, maybe because the missing entry is not NA?
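
A quick check supports that guess: `is.na()` is `FALSE` for an empty string, so the subset selects nothing and the assignment changes nothing. A minimal sketch with the `orders` values from the question, assuming the column is a factor:

```r
orders <- factor(c("One", "Two", "Three", ""))

# The empty string is a regular value, not a missing one
before <- is.na(orders)
before                       # FALSE FALSE FALSE FALSE

# Selecting by is.na() therefore matches no element: nothing is replaced
orders[is.na(orders)] <- NA

# Selecting by comparison with "" matches the empty entry
orders[orders == ""] <- NA
after <- is.na(orders)
after                        # FALSE FALSE FALSE TRUE
```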

owl