I've encountered a very strange issue. I defined a function that loads data from an online source and returns a data frame after some transformations. However, I realised that the data in two columns of the output use "," as the decimal delimiter, which causes R to interpret these columns as factors.

What I tried was to transform the data within the function, by adding two lines to the function body:

data_table$usd <- as.numeric(sub(",", ".", data_table$usd))
data_table$eur <- as.numeric(sub(",", ".", data_table$eur))

But this turns out to overwrite the whole output (data_table) with a numeric vector (the output of the last line of code, I guess). On the other hand, when I execute the exact same code outside of a function, it works as I expect, which makes me even more confused.

Any ideas why the code inside the function cannot transform single columns, but overwrites the whole data frame?

MartinP
  • It depends on how the function and `data_table` are defined. Please make this question reproducible by including a *minimal* function definition (just enough to demonstrate the issue, no other calcs needed) and sample data (preferably using `data.frame(..)` or `dput(..)` to give it to us, but very specifically not just a copy of the console output). See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for discussions on "reproducible questions". Thanks! – r2evans Jan 23 '22 at 13:49
  • When you pass an object to a function, the function works on a copy, so the original version is not affected. The `return()` function is used to pass an object back out of the function. Without an explicit `return()`, the value of the last evaluated expression is returned; here that is the result of the last column assignment, not the whole data frame. This is by design, to protect you from accidentally modifying an object. If you are importing data using `read.csv` or `read.csv2`, you can convert the decimal separator when you import the data into R. – dcarlson Jan 23 '22 at 14:21
  • Problem is solved. Adding "return(data_table)" at the end of the function helped. – MartinP Jan 23 '22 at 16:06
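
The import-time conversion mentioned in the comments could look roughly like the sketch below. The sample text and its column names are hypothetical stand-ins for the online source in the question, which is not shown; the sketch assumes semicolon-separated fields with "," as the decimal mark, which is what `read.csv2` expects by default.

```r
# Hypothetical sample data standing in for the online source:
# semicolon-separated fields, comma as decimal mark.
csv_text <- "date;usd;eur
2022-01-21;4,0123;4,5321
2022-01-22;4,0250;4,5400"

# read.csv2() defaults to sep = ";" and dec = ",", so the two
# columns are parsed as numeric right away.
data_table <- read.csv2(text = csv_text)

str(data_table)
```

If the file is comma-separated with quoted fields instead, `read.csv(..., dec = ",")` would be the analogous call; which one fits depends on the actual source format.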

1 Answer

I am guessing here, but you probably end the function body with the column assignment, like so:

modified_iris <- function() {
  my_iris <- iris
  my_iris$new <- toupper(iris$Species)
}
head(modified_iris())
#> [1] "SETOSA" "SETOSA" "SETOSA" "SETOSA" "SETOSA" "SETOSA"

Instead, you want to make sure that the full data frame is returned by making it the last expression in the function body:

modified_iris <- function() {
  my_iris <- iris
  my_iris$new <- toupper(iris$Species)
  my_iris
}
head(modified_iris())
#> # A tibble: 6 × 6
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species new   
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <chr> 
#> 1          5.1         3.5          1.4         0.2 setosa  SETOSA
#> 2          4.9         3            1.4         0.2 setosa  SETOSA
#> 3          4.7         3.2          1.3         0.2 setosa  SETOSA
#> 4          4.6         3.1          1.5         0.2 setosa  SETOSA
#> 5          5           3.6          1.4         0.2 setosa  SETOSA
#> 6          5.4         3.9          1.7         0.4 setosa  SETOSA
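
Applied to the columns from the question, the same pattern might look like the sketch below. The function name `load_rates` and the inline sample data are hypothetical, since the original loader is not shown; only the two `sub()` lines and the column names `usd` and `eur` come from the question.

```r
# Hypothetical stand-in for the loader described in the question.
load_rates <- function() {
  # Sample data with "," as decimal mark, stored as factors
  # to mimic the situation described in the question.
  data_table <- data.frame(
    date = c("2022-01-21", "2022-01-22"),
    usd  = c("4,0123", "4,0250"),
    eur  = c("4,5321", "4,5400"),
    stringsAsFactors = TRUE
  )

  # Replace the decimal comma and convert to numeric.
  data_table$usd <- as.numeric(sub(",", ".", data_table$usd, fixed = TRUE))
  data_table$eur <- as.numeric(sub(",", ".", data_table$eur, fixed = TRUE))

  # Make the whole data frame the last expression so it is returned,
  # rather than the value of the last assignment.
  data_table
}

rates <- load_rates()
str(rates)
```

Ending the body with `data_table` and ending it with `return(data_table)` are equivalent here; the explicit form is simply easier to spot.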
Gregor de Cillia