why does `mutate_all()` function fail when reading an `xlsx` based data frame?

Question

I have a data frame (df) that I initially read in from xlsx document. I am trying to create a new df with all the missing values replaced by 999999. When I run the following command:

LPAv0.4.2 <- LPAv0.3 %>% mutate_all(funs(replace(., is.na(.), 999999)))

I get the following error:

13. stop(structure(list(message = "Evaluation error: 'origin' must be supplied.", call = mutate_impl(.data, dots), cppstack = NULL), .Names = c("message", "call", "cppstack"), class = c("Rcpp::eval_error", "C++Error", "error", "condition")))
12. mutate_impl(.data, dots)
11. mutate.tbl_df(.tbl, !(!(!funs)))
10. mutate(.tbl, !(!(!funs)))
9. mutate_all(., funs(replace(., is.na(.), 999999)))
8. function_list[[k]](value)
7. withVisible(function_list[[k]](value))
6. freduce(value, `_function_list`)
5. `_fseq`(`_lhs`)
4. eval(expr, envir, enclos)
3. eval(quote(`_fseq`(`_lhs`)), env, env)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1. LPAv0.3 %>% mutate_all(funs(replace(., is.na(.), 999999)))

The weird thing is that if I write LPAv0.3 to a csv , then read it back in, the LPAv0.4.2 <- LPAv0.3 %>% mutate_all(funs(replace(., is.na(.), 999999))) command works as expected. However, if I write out to an xlsx file, then read back in, it fails again with the error above.

Any idea why this is happening? Also, any idea how I can replace all the missing values without having to print out of R then import it back in?

Thanks in advance.

error message

Error in as.POSIXct.numeric(value) : 'origin' must be supplied

    16. stop("'origin' must be supplied")
    15. as.POSIXct.numeric(value)
    14. as.POSIXct(value)
    13. `[<-.POSIXct`(`*tmp*`, thisvar, value = 99999)
    12. `[<-`(`*tmp*`, thisvar, value = 99999)
    11. `[<-.data.frame`(`*tmp*`, list, value = 99999)
    10. `[<-`(`*tmp*`, list, value = 99999)
    9. replace(., is.na(.), 99999)
    8. function_list[[k]](value)
    7. withVisible(function_list[[k]](value))
    6. freduce(value, `_function_list`)
    5. `_fseq`(`_lhs`)
    4. eval(expr, envir, enclos)
    3. eval(quote(`_fseq`(`_lhs`)), env, env)
    2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
    1. LPAv0.3 %>% replace(., is.na(.), 99999)

What package are you using to read in the excel file? You also need to include a reproducible example so that we can potentially see what about the imported file is causing the problem. — Melissa Key, Apr 28 '18 at 05:15
Sorry. New to SO and `R`. Package I am using to is `readxl`. I use the `read.csv` function to read in the `csv`—I think this may be from the `read.table` utility, but I cannot be sure. Regarding providing a reproducible example, I really don't know how to do it without uploading a file somewhere. I've tried to read up on how I can do this, but couldn't find anything to help. Is there anything you can point me to so that I can learn how to provide reproducible examples? (I profusely apologise for all the hand-holding I need.) — Atanas Janackovski, Apr 28 '18 at 05:53
You can share `dput(head(LPAv0.3))` when you read your excel file. [This](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) link might be helpful. — Prem, Apr 28 '18 at 06:25

score 3 · Answer 1 · answered Apr 28 '18 at 09:49

3

What you're looking for is this line:

LPAv0.4.2 <- LPAv0.3 %>% replace(., is.na(.), 99999)

Let me explain this a bit while we're here.

First, R standard functions and readxl can't write .xlsx files (despite fact that Excel himself can read variety of formats). However, readxl package have function write_excel_csv, which should write .csv in your locale so Excel will pick it up without a problem.

R standard and readxl functions do not rely on name of file you supplied, they use it just as identificator, and write or read data to (from) this file in specific pattern. You can check it by yourself - renaming .xlsx file to .csv will give you nothing but error when you'll try to open it with Excel.

File reading functions expect that you know file format beforehand, and you will use appropriate function. In your case, to read excel file (.xlsx) you need to use read_excel function from readxl package.

answered Apr 28 '18 at 09:49

Stormwalker

351
1
12

Thanks for your reply @Stormwalker. So I ran your code and this is the error I get: – Atanas Janackovski Apr 28 '18 at 10:59
@AtanasJanackovski provide actual error message. Try to run `library(dplyr)` first. – Stormwalker Apr 28 '18 at 11:02
Thanks for your reply Stormwalker. So I ran your code and got the error above (see my original post). An ideas? I've added a reproducible sample as requested by @Melissa Key and kindly guided by Prem. See above also. – Atanas Janackovski Apr 28 '18 at 11:05
@AtanasJanackovski make sure you're operating on a data.frame or data_frame with `as.data.frame` or `as_tibble` beforehand. – Stormwalker Apr 28 '18 at 11:13
Thanks again @Stormwaker. Yep, I ran `library(dplyr)` first. Still the same issue. However, I'm not sure what you mean by `as.data.frame` or `as_tibble`? – Atanas Janackovski Apr 28 '18 at 11:17
@AtanasJanackovski `LPAv0.3 <- as_tibble(LPAv0.3)` – Stormwalker Apr 28 '18 at 11:18
Thanks again @Stormwalker. Got the same error as posted above... Any other suggestions? – Atanas Janackovski Apr 28 '18 at 11:23
[This](https://stackoverflow.com/questions/2792819/r-dates-origin-must-be-supplied) must answer your question. You have date columns and have to replace NA in them in specific way. If you're reading csv, they most likely are read as strings and not dates. – Stormwalker Apr 28 '18 at 11:27
That sounds like the culprit. This is be just thinking out aloud, but I don't really need to replace the missing variables in the date columns. Can you point me in the right direction about how I could exclude them from the `replace()` function? – Atanas Janackovski Apr 28 '18 at 11:35
@AtanasJanackovski you need [`mutate_if`](http://dplyr.tidyverse.org/reference/summarise_all.html) function, not `mutate_all`. – Stormwalker Apr 28 '18 at 11:40
Thanks @Stormwalker. I've [tried this instruction](https://stackoverflow.com/questions/42052078/correct-syntax-for-mutate-if) as so: `LPAv0.4.2 <- LPAv0.3 %>% mutate_if(is.numeric, fun(if_else(is.na(.), 999999, .)))`, but I get the error `Error in fun(if_else(is.na(.), 999999, .)) : could not find function "fun"`. Any thoughts? – Atanas Janackovski Apr 28 '18 at 12:16

score 2 · Answer 2 · answered Apr 28 '18 at 12:34

Below error is because of missing value in POSIXct column.

Error in as.POSIXct.numeric(value) : 'origin' must be supplied

So you can try something like this. Here I have excluded all POSIXct columns to replace remaining column's NA with 999999

library(tidyverse)
library(lubridate)

LPAv0.3 %>% 
  mutate_at(vars(-one_of(names(.)[sapply(., is.POSIXct)])), funs(replace(., is.na(.), 999999)))

Knackiedoo · Answer 3 · 2018-04-28T12:08:13.553

Date columns in your file are read as a POSIXct variable by readxls. If you write this to a csv and read it in again, this column is read as a factor (or character if you use stringsAsFactors = FALSE). If you have any missing values in a column with a POSIX or Date class, you need to think carefully about what you are replacing them with. If you replace missing values with 999999 that needs to be converted to a date value, which in turn requires an origin. Any method you use will cause problems with this. If you do not have any missing values in date columns (as in your sample data), but they are confined to other (numeric or text) columns, then a simple solution is:

LPAv0.4.2 <- LPAv0.3
LPAv0.4.2 [is.na(LPAv0.4.2 )] <- 999999

You don't have to use a tidyverse verb for everything :-) I appreciate I am not completely answering your question as to WHY the code you gave gives an error, even when there are no missing date values. Incidentally, being part of the tidyverse, readxls will give you a tibble, whereas read.csv will give you a normal data frame. That will not make a difference in this case, but I thought I would just point that out in case it causes other issues, e.g. with indexing.

Now I understand. Hence the `origin` must be supplied error. I just need to figure out how to run the `mutate_if` function as pointed out by @Stormwalker. There are missing variables in some of the date columns. And I do want to keep dates as POSIX or date class. I found [this line](https://stackoverflow.com/questions/42052078/correct-syntax-for-mutate-if): `data %>% mutate_if(is.numeric, funs(ifelse(is.na(.), 0, .)))`, and also tried replacing `ifelse` as `LPAv0.4.2 <- LPAv0.3 %>% mutate_if(is.numeric, fun(if_else(is.na(.), 999999, .)))`, but still no luck! Any more suggestions? — Atanas Janackovski, Apr 28 '18 at 12:14

score 0 · Answer 4 · answered Apr 28 '18 at 12:34

Thanks for the communal brains—I truly appreciated it.

So, I used the following line from this post referencing mutate_if:

LPAv0.4.2 <- LPAv0.3 %>% mutate_if(is.numeric, funs(if_else(is.na(.), 999, .)))

And it works as expected. I appreciate all the input. Thanks all for allowing me to syphon some knowledge!

Regards,

Atanas.

why does `mutate_all()` function fail when reading an `xlsx` based data frame?

error message

4 Answers4