
I have a data frame named `data` with columns `comfort`, `condition` and a few others. I need to apply the same manipulations to 5 columns, so I decided to write the following function:

replacing_na_999<-function(df, variable){
  #variable<-as.name(variable)
  levels <- levels(df$variable)
  levels[length(levels) + 1] <- "999"
  df$variable <- factor(df$variable, levels = levels)
  df$variable[is.na(df$variable)] <- "999"
}

When I try:

replacing_na_999(data, comfort)

it returns an error:

 Error in `$<-.data.frame`(`*tmp*`, variable, value = integer(0)) : 
  replacement has 0 rows, data has 44070 
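Here is a minimal sketch of where this error comes from. It uses a hypothetical three-row data frame `toy` in place of `data`. `$` does not evaluate what follows it, so inside the function `df$variable` looks for a column literally named `variable`. There is none, so it returns `NULL`. `factor()` then builds a zero-length factor, and assigning that back with `$<-` fails because the data frame has rows.

```r
# Hypothetical stand-in for the OP's `data` (3 rows instead of 44070)
toy <- data.frame(comfort = factor(c("low", NA, "high")))

peek <- function(df, variable) {
  # `$` uses the name `variable` literally and never evaluates the argument,
  # so the bare symbol `comfort` passed below is never looked up
  df$variable
}

res <- peek(toy, comfort)
print(is.null(res))  # no column is named "variable"

# What the OP's function computes next: a factor of length 0
z <- factor(NULL, levels = "999")
print(length(z))

# Assigning a zero-length value to a column of a non-empty data frame fails
err <- tryCatch({
  toy$variable <- z
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # "replacement has 0 rows, data has 3"
```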

Can someone please help me with the syntax?

kskirpic
  • Possible duplicate: https://stackoverflow.com/questions/7310186/function-in-r-passing-a-dataframe-and-a-column-name – MrFlick Jun 02 '20 at 15:15
  • You can't use `$` with variables that expand to column names. It's better to use `[[ ]]` with strings. – MrFlick Jun 02 '20 at 15:16
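Following this suggestion, here is a minimal sketch of the function that takes the column name as a string and uses `[[ ]]`. It assumes the target columns are factors, or can be converted with `as.factor()`. The two-column data frame is a hypothetical stand-in for `data`. The function also returns the modified data frame, because changes made to `df` inside an R function are not visible to the caller unless the result is assigned back.

```r
replacing_na_999 <- function(df, variable) {
  # `variable` is a string such as "comfort"; unlike `$`, `[[` evaluates it
  x <- as.factor(df[[variable]])
  # add "999" as a level (union() keeps a single "999" level on repeated calls)
  levels(x) <- union(levels(x), "999")
  x[is.na(x)] <- "999"
  df[[variable]] <- x
  df  # return the modified copy
}

# Hypothetical data standing in for the OP's `data`
data <- data.frame(
  comfort   = factor(c("low", NA, "high")),
  condition = factor(c(NA, "good", "bad"))
)

data <- replacing_na_999(data, "comfort")

# Applying it to several columns (this list of names is hypothetical)
for (col in c("comfort", "condition")) {
  data <- replacing_na_999(data, col)
}
print(data)
```

Because the column name is an ordinary string, the same call works unchanged inside a loop over any vector of column names.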

2 Answers


This works on my computer:

m <- structure(list(district = structure(c(6L, 21L, 20L, 19L, 5L, 8L),
                                         .Label = c("I", "II", "III", "IV", "IX", "V", "VI", "VII", "VIII", "X", "XI", "XII", "XIII", "XIV", "XIX", "XV", "XVI", "XVII", "XVIII", "XX", "XXI", "XXII", "XXIII"),
                                         class = "factor"),
                    ln_price = c(5.52146091786225, 4.9416424226093, 4.74493212836325, 5.01063529409626, 4.55387689160054, 5.07517381523383)),
               row.names = c(NA, 6L), class = "data.frame")
m[4, 1] <- NA
m

# add "999" as an extra level to every factor column; other columns are returned unchanged
# (sapply() simplifies the resulting list of columns to a matrix)
m <- sapply(m, function(x) {
  if (is.factor(x))
    factor(x, levels = c(levels(x), 999))
  else x
})

# replace every NA with 999
m[is.na(m)] <- 999
m
  • when I try your code on my dataset I get an error ```Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 44070, 0``` – kskirpic Jun 02 '20 at 17:17
  • Try using dput to post your data; that will let me see what your data frame looks like: dput(head(your_dataframe)). I assume you are working with a 2D data frame. – Dimitrios Zacharatos Jun 02 '20 at 17:21
  • that is what dput(head(data)) returns : structure(list(district = structure(c(6L, 21L, 20L, 19L, 5L, 8L), .Label = c("I", "II", "III", "IV", "IX", "V", "VI", "VII", "VIII", "X", "XI", "XII", "XIII", "XIV", "XIX", "XV", "XVI", "XVII", "XVIII", "XX", "XXI", "XXII", "XXIII"), class = "factor"), ... ln_price = c(5.52146091786225, 4.9416424226093, 4.74493212836325, 5.01063529409626, 4.55387689160054, 5.07517381523383)), row.names = c(NA, 6L), class = "data.frame") – kskirpic Jun 02 '20 at 17:49
  • Thanks, I updated the solution and it works on my computer now. Note that the apply function drops the row names; I hope you don't mind. – Dimitrios Zacharatos Jun 02 '20 at 20:21
  • Thank you for trying, but the `...` before ln_price means I have 35 other variables, so I will look for another solution. – kskirpic Jun 03 '20 at 10:45
  • The solution is the same no matter how many variables you have. It tests whether each variable is a factor; if so, it adds the level 999, and then every NA is converted to 999. I do not understand why it wouldn't work. The function is applied to each column in turn and stops only when it has gone through the whole data frame, no matter how many columns there are. – Dimitrios Zacharatos Jun 03 '20 at 11:00
  • I think that there should be another solution that does not include adding information on the structure of all 35 variables – kskirpic Jun 04 '20 at 09:11
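The following is a variant of the approach in this answer, sketched here and not the answerer's code. It assigns the result of `lapply()` into `m[]` instead of using `sapply()`, which keeps `m` a data frame, so factor columns keep their labels. It still handles every column without naming any of them. The small data frame is hypothetical.

```r
# Hypothetical data: one factor column and one numeric column, each given an NA
m <- data.frame(
  district = factor(c("V", "XXI", "XX", "XIX")),
  ln_price = c(5.52, 4.94, NA, 5.01)
)
m[4, 1] <- NA

# Add "999" as a level of every factor column; `m[] <-` keeps the data.frame class
m[] <- lapply(m, function(x) {
  if (is.factor(x)) factor(x, levels = union(levels(x), "999")) else x
})

# Replace every remaining NA, in factor and numeric columns alike, with 999
m[is.na(m)] <- 999
str(m)
```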

Perhaps something along the lines of:

replacing_na_999<-function(df, variable){
  idx <- which(variable == names(df))
  #variable<-as.name(variable)
  levels <- levels(df[idx])
  levels[length(levels) + 1] <- "999"
  df[idx] <- factor(df[idx], levels = levels)
  df[idx][is.na(df[idx])] <- "999"
  return(df)
}

This avoids passing the bare column name as an argument to your function.

efz
  • Did you test this? I would think that `df[idx]` should be `df[[idx]]`. Also probably important to show how you are calling this function. Looks like you need to pass strings rather than symbols for the column name as the OP was trying to do. – MrFlick Jun 02 '20 at 15:33
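You may prefer to keep calling the function with a bare column name, as in the question (`replacing_na_999(data, comfort)`). One option that neither answer uses is to turn the unevaluated argument into a string with `deparse(substitute(variable))`. The sketch below combines that with this answer's `which(... == names(df))` index and the `[[idx]]` extraction suggested in the comment. The data is hypothetical.

```r
replacing_na_999 <- function(df, variable) {
  # substitute() captures the argument unevaluated (e.g. the symbol `comfort`);
  # deparse() converts it to the string "comfort"
  col <- deparse(substitute(variable))
  idx <- which(names(df) == col)
  if (length(idx) != 1L) stop("no unique column named ", col)

  # df[[idx]] is the column vector itself; df[idx] would be a one-column data frame
  x <- as.factor(df[[idx]])
  levels(x) <- union(levels(x), "999")
  x[is.na(x)] <- "999"
  df[[idx]] <- x
  df
}

# Hypothetical data standing in for the OP's `data`
data <- data.frame(comfort = factor(c("low", NA, "high")), other = 1:3)
data <- replacing_na_999(data, comfort)
print(data)
```

The trade-off is that `substitute()` captures whatever expression was written at the call site. Inside a loop such as `for (col in cols) replacing_na_999(data, col)`, it would look for a column literally named `col`. For looping over several columns, passing strings and using `[[ ]]` is simpler.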