Context

As a follow-up to "R: Pass data.frame by reference to a function" and "How to add a column in the data frame within a function":

I am attempting the following, seemingly easy, function:

naToZero <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}

Data.frame

> str(WFM)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   990571 obs. of  14 variables:
 $ Date      : chr  "04/12/2017" "04/12/2017" "04/12/2017" "04/12/2017" ...
 $ Time      :Classes 'hms', 'difftime'  atomic [1:990571] 41970 41969 41968 41967 41966 ...
  .. ..- attr(*, "units")= chr "secs"
 $ Bar#      : chr  "197953/197953" NA "197952/197953" NA ...
 $ Bar Index : int  0 NA -1 NA NA -2 NA NA -3 NA ...
 $ Tick Range: int  0 NA 0 NA NA 0 NA NA 0 NA ...
 $ Open      : num  33.9 NA 33.9 NA NA ...
 $ High      : num  33.9 NA 33.9 NA NA ...
 $ Low       : num  33.9 NA 33.9 NA NA ...
 $ Close     : num  33.9 NA 33.9 NA NA ...
 $ Vol       : int  100 NA 200 NA NA 100 NA NA 400 NA ...
 $ MACDHist  : num  -59 NA -87 NA NA ...
 $ MACD      : num  -450 NA -445 NA NA ...
 $ MACDSig   : num  -391 NA -358 NA NA ...
 $ ZScore1   : num  NA NA NA NA NA NA NA NA NA NA ...

Hoping to use this function to speed things up in data cleaning.

Problem

I run the function definition in the script editor and then call it on a data.frame, but the function does not appear to do anything: when I View(WFM), it still shows the same old data. However, when I manually run the command:

WFM$Vol[is.na(WFM$Vol)] <- 0

Then it works.

Things I tried

I experimented based on the two links above, which seemed relevant:

Using WFM <- naToZero(WFM) produces a vector with a single value, 0.

Tried using WFM <- data.table(WFM) and running the function... same thing.

I must be missing something basic.

Robert Tan

2 Answers


Virtually all objects in R are immutable: operations do not modify the original but create a modified copy. So you need to assign that copy back to the original.

<- does that, but it assigns to df inside your function, which is a copy of the argument (= WFM) you pass to your function.
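As a minimal sketch of this behaviour (the data frame `toy` and the function name `modifyLocalOnly` are made up for illustration and stand in for WFM and the original function):

```r
# Hypothetical toy data standing in for WFM.
toy <- data.frame(Vol = c(100L, NA, 200L))

# Same body as the function in the question: the assignment
# only changes the function's local copy, df.
modifyLocalOnly <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}

modifyLocalOnly(toy)

# The caller's data frame is untouched and still contains the NA.
anyNA(toy$Vol)  # TRUE
```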

So you need to modify your function:

naToZero <- function(df) {
    df$Vol[is.na(df$Vol)] <- 0
    df
}

… and how you call it:

WFM = naToZero(WFM)
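To see why the trailing df matters, here is a small side-by-side sketch. The names `toy`, `withoutReturn` and `withReturn` are hypothetical; the first function mirrors the question's version, the second the corrected one.

```r
# Hypothetical toy data standing in for WFM.
toy <- data.frame(Vol = c(100L, NA, 200L))

# Last expression is an assignment, so the function's value
# is the assigned value (0), returned invisibly.
withoutReturn <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}

# Last expression is df, so the function returns the modified copy.
withReturn <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
  df
}

res1 <- withoutReturn(toy)  # a single 0, not a data frame
res2 <- withReturn(toy)     # data frame with the NA replaced

# Assigning the result back is what updates the original name.
toy <- withReturn(toy)
```

Note that Vol becomes a double column after the replacement, because the literal 0 is a double; using 0L would keep it an integer.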
Konrad Rudolph
  • Interesting mechanics, to clarify my understanding: by adding `df` to the function, we are in essence "bringing the copy to the front", thus making it explicit enough to assign it back to the original via `WFM = naToZero(WFM)`? – Robert Tan Apr 19 '17 at 15:16
  • @RobertTan No, it just ensures that the return value of the function call is `df`. Otherwise the return value of the function call is the value of the assignment (`<- 0`), which is the assigned value itself, i.e. 0 (as you have seen). – Konrad Rudolph Apr 19 '17 at 15:21
  • The implicit return could be confusing to some, I would have written `return(df)`. – Dmitrii I. Jun 13 '22 at 17:53
  • @DmitriiI. On the contrary! Implicit return is a core semantic in R, which is used ubiquitously. Every R user absolutely needs to be aware of it, and you *should* use it, and [avoid redundant explicit `return()` calls](https://stackoverflow.com/a/59090751/1968). – Konrad Rudolph Jun 14 '22 at 07:53

We can make this more dynamic using the devel version of dplyr (soon to be released 0.6.0)

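library(tidyverse)

naToZero <- function(df, Col) {
  Col <- enquo(Col)      # capture the unquoted column argument as a quosure
  ColN <- quo_name(Col)  # column name as a string, for the left-hand side
  df %>%
    mutate(!!ColN := replace(!!Col, is.na(!!Col), 0))
}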

naToZero(WFM, Vol)
# A tibble: 3 × 3
#       Date   Vol  Open
#      <chr> <dbl> <dbl>
#1 04/12/2017     0  33.9
#2 04/12/2017    23    NA
#3 04/12/2017    40  32.0

Or for any other columns

naToZero(WFM, Open)
# A tibble: 3 × 3
#       Date   Vol  Open
#       <chr> <dbl> <dbl>
#1 04/12/2017    NA  33.9
#2 04/12/2017    23   0.0
#3 04/12/2017    40  32.0

enquo works similarly to substitute from base R: it captures the input argument and converts it to a quosure. Inside mutate, we unquote (!! or UQ) the quosure to evaluate the column, and likewise unquote the string created with quo_name to name the column on the lhs.
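For comparison, the substitute analogy can be sketched in base R alone. This is not the code from the answer above but an illustrative alternative under simple assumptions; the names `naToZeroBase` and `toy` are hypothetical, and the function expects a bare column name.

```r
# Hypothetical base R counterpart: substitute() captures the unevaluated
# argument, deparse() turns it into the column name as a string.
naToZeroBase <- function(df, col) {
  col_name <- deparse(substitute(col))
  df[[col_name]][is.na(df[[col_name]])] <- 0
  df  # return the modified copy
}

# Toy data with the same values as the example data in this answer.
toy <- data.frame(
  Date = rep("04/12/2017", 3),
  Vol  = c(NA, 23, 40),
  Open = c(33.9, NA, 32)
)

vol_fixed  <- naToZeroBase(toy, Vol)   # only Vol is changed
open_fixed <- naToZeroBase(toy, Open)  # only Open is changed
```

Unlike a quosure, the deparsed name carries no environment, so this sketch only suits plain column names passed directly by the caller.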

data

WFM <- tibble(Date = rep("04/12/2017", 3), Vol = c(NA, 23, 40), Open = c(33.9, NA, 32))
akrun
  • I’m not sure this helps OP … the same could also be achieved (using less code!) with base R. (Although I’m personally a fan of the dplyr version, it goes without saying.) – Konrad Rudolph Apr 19 '17 at 14:58
  • @KonradRudolph Yes, it could be done, but looking at the OP's dataset, it is derived from `tbl_df` – akrun Apr 19 '17 at 14:59