
Being a beginner in R, I need some help:

I have already written a function, say fun(a, b, c), that returns a value "d". The arguments a, b and c are values from columns of my dataset, which has 4m records. The function applies some logic and returns a value for "d", which I later want to add to my dataset as a new column.

Could someone please help me with the syntax for:
1. calling a function on a dataset with multiple arguments,
2. adding the new information in "d" to my dataset, and
3. doing this efficiently enough to handle 4m records?
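A minimal sketch of points 1 and 2, assuming `fun` only handles one value per argument. The body of `fun` and the `dataset` below are hypothetical placeholders, not the real logic or data. Base R's `mapply()` calls the function once per row with the matching elements of columns `a`, `b` and `c`, and the result is stored directly as a new column `d`. For point 3, the same logic written with vectorised operations such as `ifelse()` works on whole columns at once and is usually much faster on millions of rows than a row-by-row call.

```r
# Hypothetical scalar function standing in for fun(a, b, c)
fun <- function(a, b, c) {
    if (a > b) a + c else b - c
}

# Hypothetical toy data with columns a, b, c
dataset <- data.frame(a = c(1, 5, 3), b = c(2, 4, 3), c = c(10, 20, 30))

# Row by row: call fun once per row and store the result as column d
dataset$d <- mapply(fun, dataset$a, dataset$b, dataset$c)

# Vectorised equivalent: the same logic applied to whole columns at once
dataset$d2 <- ifelse(dataset$a > dataset$b,
                     dataset$a + dataset$c,
                     dataset$b - dataset$c)
dataset
```

Both columns hold -8, 25 and -27 here; the vectorised form is the one to aim for when the logic allows it.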

Thanks in advance.

Please see my code below:

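#hybrid FUNCTION: label each row of df "HYBRID" if it matches any row of lookup
hybridfun <- function(df, lookup, df_year, df_name, df_id, lup_year, lup_name, lup_id_digit, lup_id_letter){
    # Set the default label once, before the loop, not on every iteration
    df$new <- "NOT_SURE"
    for (i in seq_len(nrow(lookup))){
        # Column names arrive as strings, so use [[ ]] instead of $
        pos <- lookup[[lup_id_digit]][i]
        # Compare whole columns against lookup row i (vectorised, no nested if)
        hit <- df[[df_year]] == lookup[[lup_year]][i] &
               df[[df_name]] == lookup[[lup_name]][i] &
               substring(df[[df_id]], pos, pos) == lookup[[lup_id_letter]][i]
        # which() drops NA comparisons before assigning
        df$new[which(hit)] <- "HYBRID"
    }
    # Return the updated data frame (fuel_type was never defined)
    return(df)
}

data <- hybridfun(data, lookup, "data_year", "data_name", "data_id", "lookup_year", "lookup_name", "lookup_id_digit", "lookup_id_letter")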
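An alternative to looping over `lookup` is a join, which replaces the R-level loop with a single `merge()` and vectorised comparisons. This is a swapped-in technique, sketched under assumptions: the column names mirror the call above, and the toy `data` and `lookup` values are made up purely for illustration.

```r
# Hypothetical toy data; column names mirror the hybridfun() call
data <- data.frame(
    data_year = c(2015, 2015, 2016),
    data_name = c("A", "B", "A"),
    data_id   = c("X7H2", "Y1E9", "Z3K4"),
    stringsAsFactors = FALSE)
lookup <- data.frame(
    lookup_year      = c(2015, 2016),
    lookup_name      = c("A", "A"),
    lookup_id_digit  = c(3, 1),
    lookup_id_letter = c("H", "Y"),
    stringsAsFactors = FALSE)

# Row index so that matches can be mapped back to the rows of data
data$row <- seq_len(nrow(data))

# Pair every data row with the lookup rows sharing its year and name
m <- merge(data, lookup,
           by.x = c("data_year", "data_name"),
           by.y = c("lookup_year", "lookup_name"))

# Keep only pairs whose ID has the expected letter at the given position
hit <- substring(m$data_id, m$lookup_id_digit, m$lookup_id_digit) ==
       m$lookup_id_letter

data$new <- ifelse(data$row %in% m$row[hit], "HYBRID", "NOT_SURE")
data$row <- NULL
data
```

Here only the first row is labelled "HYBRID". Note that `merge()` produces one row per matching (data, lookup) pair, so memory use grows with the number of lookup rows per year/name combination.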
  • Please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of what you want to achieve – Emmanuel-Lin Feb 08 '18 at 08:38
  • Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). And more importantly, please read [the Stack Overflow question checklist](http://meta.stackexchange.com/q/156810/204922). You might also want to learn about [Minimal, Complete, and Verifiable Examples](http://stackoverflow.com/help/mcve). – Clijsters Feb 08 '18 at 09:16

1 Answer


I'm not entirely sure what you are trying to do. Perhaps something like this?

set.seed(2017)
df <- data.frame(
    a = rnorm(6),
    b = rnorm(6),
    c = rnorm(6))
df
#            a            b          c
#1  1.43420148 -1.958366456 -0.7467347
#2 -0.07729196 -0.001524259  0.3066498
#3  0.73913723 -0.265336001 -1.4304858
#4 -1.75860473  1.563222619  1.1944265
#5 -0.06982523  0.342768064 -0.4820681
#6  0.45190553  1.572425400  1.3178624

# Custom function that sums the columns whose names
# are passed as a, b, c
myfunc <- function(df, a, b, c) {
    # Some operation on the three columns, here their sum;
    # [[ ]] looks each column up by the name stored in a, b, c
    df$d <- df[[a]] + df[[b]] + df[[c]]
    return(df)
}

df2 <- myfunc(df, "a", "b", "c")
df2
#            a            b          c          d
#1  1.43420148 -1.958366456 -0.7467347 -1.2708997
#2 -0.07729196 -0.001524259  0.3066498  0.2278336
#3  0.73913723 -0.265336001 -1.4304858 -0.9566846
#4 -1.75860473  1.563222619  1.1944265  0.9990444
#5 -0.06982523  0.342768064 -0.4820681 -0.2091252
#6  0.45190553  1.572425400  1.3178624  3.3421933
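The same idea can be generalised so that both the function and the column names are arguments. A minimal sketch, where `add_column()` is a hypothetical helper name: `do.call()` passes the selected columns to `f` as separate positional arguments, so any vectorised function of those columns can be used and the result is stored under `new_col`.

```r
# Hypothetical helper: apply f to the named columns and store the result
add_column <- function(df, f, cols, new_col) {
    # unname() so the columns are matched to f's arguments by position
    df[[new_col]] <- do.call(f, unname(as.list(df[cols])))
    df
}

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df2 <- add_column(df, function(x, y, z) x + y + z, c("a", "b", "c"), "d")
df2$d
```

Because `f` receives whole columns, this stays vectorised; a function that only accepts single values would need `mapply()` instead.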

For future posts, please take some time to read up on how to ask questions here on SO, and then provide a minimal reproducible example or attempt, including sample data.

Maurits Evers