
For the dataset shown below, I wrote this function:

expconvert <- function(a) {
     if(a=="h" || a=="H")
         return(100)
     if(a=="k" || a=="K")
         return(1000)
     if(a=="m" || a=="M") 
         return(1000000)
     if(a=="b" || a=="B")
         return(1000000000)
     if(is.numeric(a))
         return(a)
     else
         return(0)
}

The data set looks like the following:

CROPDMGEXP CROPDMG PROPDMG PROPDMGEXP
   k         0       20        h
   H         23      41        B
   k         10      5         B  
             2       3         k 
             5       50         

The transformed data set should look like the following:

CROPDMGEXP CROPDMG PROPDMG PROPDMGEXP
   1000        0       20        100
   100         23      41        1000000000
   1000        10      5         1000000000  
   0           2       3         1000 
   0           5       50        0 

I want to apply the function above to the first and last columns. With df as the data frame above, I wrote the following code:

df[c(1,4)] <- apply(df[c(1,4)], MARGIN = 1, FUN = expconvert)

However, I don't get the desired output, i.e. the letters in those columns are not converted to the appropriate numeric weights.

But when I use apply on a single column, it works fine:

df$CROPDMGEXP <- apply(df[1], MARGIN = 1, FUN = expconvert)

How can I apply it to both columns at the same time?

There are many levels in the data set, and setNames is convenient only when there are few; that is why I wrote the function. The point is that the function works fine for a single column with apply, but returns wrong values when used on multiple columns with apply.
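A quick way to see the difference is to ask apply what it actually hands to the function in each case. This is a minimal sketch that rebuilds the example data by hand (assuming the empty cells in the table are empty strings "") and uses length in place of expconvert:

```r
# Example data from the question; empty cells are assumed to be ""
df <- data.frame(
  CROPDMGEXP = c("k", "H", "k", "", ""),
  CROPDMG    = c(0L, 23L, 10L, 2L, 5L),
  PROPDMG    = c(20L, 41L, 5L, 3L, 50L),
  PROPDMGEXP = c("h", "B", "B", "k", ""),
  stringsAsFactors = FALSE
)

# With MARGIN = 1, FUN receives one whole row of the selected columns
len_two_cols <- apply(df[c(1, 4)], MARGIN = 1, FUN = length)
len_one_col  <- apply(df[1], MARGIN = 1, FUN = length)

print(len_two_cols)  # every element is 2: both columns arrive in one call
print(len_one_col)   # every element is 1: one value per call, as expconvert expects
```

Because expconvert returns a single number per call, the two-column version produces one number per row, and that length-5 vector is then recycled into both columns. In recent R versions (4.3.0 and later), || with an operand of length greater than one is an error, so the same call fails outright instead of silently returning wrong values.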

Jaap
Niranjan Agnihotri
  • For columns, the margin is 2. Besides, I suspect the function isn't defined correctly to give you the right output. – Ronak Shah Jul 06 '17 at 06:09
  • I tried it, @RonakShah, but the function returns 0 for every letter, which it is not supposed to do. – Niranjan Agnihotri Jul 06 '17 at 06:12
  • Not exactly, @RonakShah; I checked the question. There are many levels in the data set, and setNames is convenient only when there are few, which is why I wrote the function. The point is that the function works fine for a single column with apply but returns wrong values when used on multiple columns with apply. – Niranjan Agnihotri Jul 06 '17 at 06:28
  • @akrun OK, then I'll vote to close as "code not working but no representative example". If your way does not work, then indeed we won't be able to find a way to solve the problem, as we don't have enough information. – Cath Jul 06 '17 at 07:04
  • @akrun have a nice day! – Sotos Jul 06 '17 at 07:08
  • @Sotos You too have a great day! You are a great guy and I know that you are very objective – akrun Jul 06 '17 at 07:10
  • Insert `browser()` at the start of your function and inspect every element. You will have to adjust the function to use two elements, e.g. by `a[1]`, `a[2]`. – Roman Luštrik Jul 06 '17 at 07:47
  • @akrun, you should undelete your answer. It's as good as it gets for the given example. It's not your fault if the question isn't representative of a bigger dataset. – Roman Luštrik Jul 06 '17 at 07:48
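Following the suggestion above to handle both elements of each row, one sketch (keeping expconvert from the question unchanged, copied here so the block runs on its own) converts each element of the row separately. When FUN returns more than one value, apply stacks the per-row results as columns, so the result is transposed back with t():

```r
# expconvert as defined in the question (copied so this block is self-contained)
expconvert <- function(a) {
  if (a == "h" || a == "H") return(100)
  if (a == "k" || a == "K") return(1000)
  if (a == "m" || a == "M") return(1000000)
  if (a == "b" || a == "B") return(1000000000)
  if (is.numeric(a)) return(a) else return(0)
}

# Example data from the question; empty cells are assumed to be ""
df <- data.frame(
  CROPDMGEXP = c("k", "H", "k", "", ""),
  CROPDMG    = c(0L, 23L, 10L, 2L, 5L),
  PROPDMG    = c(20L, 41L, 5L, 3L, 50L),
  PROPDMGEXP = c("h", "B", "B", "k", ""),
  stringsAsFactors = FALSE
)

# Each row arrives as a length-2 vector; convert its elements one by one.
# apply() returns a 2 x 5 matrix here, so t() restores the 5 x 2 layout.
conv <- t(apply(df[c(1, 4)], MARGIN = 1,
                FUN = function(row) vapply(row, expconvert, numeric(1))))
df[c(1, 4)] <- conv
print(df)
```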

1 Answer


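We can use lapply instead of apply, as lapply keeps the structure of each column, while apply first converts the data to a matrix, and a matrix can hold only a single class. More importantly here, with MARGIN = 1 apply passes each row, i.e. one value from each of the two columns, to expconvert in a single call; the function returns one number per row, and that vector is then recycled into both columns. Since expconvert handles only one value at a time, it also has to be applied element-wise within each column:

df[c(1, 4)] <- lapply(df[c(1, 4)], function(x) vapply(x, expconvert, numeric(1), USE.NAMES = FALSE))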
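If you would rather stay with apply, another option is MARGIN = c(1, 2), which makes apply call the function once per cell and return a matrix with the same shape as the input. This sketch reuses expconvert from the question unchanged (copied so the block runs on its own):

```r
# expconvert as defined in the question (copied so this block is self-contained)
expconvert <- function(a) {
  if (a == "h" || a == "H") return(100)
  if (a == "k" || a == "K") return(1000)
  if (a == "m" || a == "M") return(1000000)
  if (a == "b" || a == "B") return(1000000000)
  if (is.numeric(a)) return(a) else return(0)
}

# Example data from the question; empty cells are assumed to be ""
df <- data.frame(
  CROPDMGEXP = c("k", "H", "k", "", ""),
  CROPDMG    = c(0L, 23L, 10L, 2L, 5L),
  PROPDMG    = c(20L, 41L, 5L, 3L, 50L),
  PROPDMGEXP = c("h", "B", "B", "k", ""),
  stringsAsFactors = FALSE
)

# MARGIN = c(1, 2): FUN is called on every single cell, so expconvert
# always receives a length-1 value
converted <- apply(df[c(1, 4)], MARGIN = c(1, 2), FUN = expconvert)
print(dim(converted))  # 5 rows, 2 columns, same as df[c(1, 4)]

df[c(1, 4)] <- converted
print(df)
```

This still goes through a character matrix, which is harmless here because both selected columns are character; if numeric columns were included, they would be coerced to strings before expconvert sees them.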

Alternatively, instead of the if/else chain, a named lookup vector makes this easier:

v1 <- setNames(c(100, 1000, 1000000, 1000000000), c('h', 'k', 'm', 'b'))
df[c(1, 4)] <- lapply(df[c(1, 4)], function(x) v1[tolower(x)])
df[is.na(df)] <- 0
df
#   CROPDMGEXP CROPDMG PROPDMG PROPDMGEXP
#1       1000       0      20        100
#2        100      23      41 1000000000
#3       1000      10       5 1000000000
#4          0       2       3       1000
#5          0       5      50          0
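The lookup above turns every code that is not in v1 into 0. The comments below mention that the full data also contains plain numbers such as 1, 3 or 32 that should be kept as is. One way to cover that without spelling out every level is to fall back to as.numeric wherever the lookup fails. This is a sketch with a hypothetical helper name, expconvert_vec, assuming the numeric codes arrive as strings in the same column:

```r
# Hypothetical vectorized variant of expconvert (name chosen for this sketch).
# Assumptions: h/k/m/b (any case) are multipliers, numeric strings are kept
# as is, and everything else (blanks, symbols, NA) becomes 0.
expconvert_vec <- function(a) {
  multipliers <- c(h = 100, k = 1000, m = 1e6, b = 1e9)
  a <- trimws(as.character(a))
  out <- unname(multipliers[tolower(a)])   # NA unless a known letter code
  miss <- is.na(out)
  out[miss] <- suppressWarnings(as.numeric(a[miss]))  # keep numeric codes
  out[is.na(out)] <- 0                     # blanks and anything else -> 0
  out
}

df <- data.frame(
  CROPDMGEXP = c("k", "H", "k", "", ""),
  CROPDMG    = c(0L, 23L, 10L, 2L, 5L),
  PROPDMG    = c(20L, 41L, 5L, 3L, 50L),
  PROPDMGEXP = c("h", "B", "B", "k", ""),
  stringsAsFactors = FALSE
)

df[c(1, 4)] <- lapply(df[c(1, 4)], expconvert_vec)
print(df)

expconvert_vec(c("3", "K", "32", "?", NA))  # returns c(3, 1000, 32, 0, 0)
```

Because the whole column is processed at once, there is no per-element function call, and new numeric levels need no extra coding; only new letter codes would have to be added to multipliers.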

data

df <- structure(list(CROPDMGEXP = c("k", "H", "k", "", ""), CROPDMG = c(0L, 
23L, 10L, 2L, 5L), PROPDMG = c(20L, 41L, 5L, 3L, 50L), PROPDMGEXP = c("h", 
"B", "B", "k", "")), .Names = c("CROPDMGEXP", "CROPDMG", "PROPDMG", 
"PROPDMGEXP"), class = "data.frame", row.names = c(NA, -5L))
akrun
  • I tried using lapply, @akrun; the problem is that it returns 1000 for every value, which is not correct. – Niranjan Agnihotri Jul 06 '17 at 06:13
  • You are right, but there is a gotcha: the data set is very big, so I did not include all the cases. There can also be integers like 1, 3, 32, etc., which we are supposed to take as is. That is why I used the function. It would be a great help if you could help me figure out what's going wrong. – Niranjan Agnihotri Jul 06 '17 at 06:20
  • @NiranjanAgnihotri My second code gives exactly the output you intended. – akrun Jul 06 '17 at 06:22
  • Yes, you are right. It worked for the small data set. But there are around 40 levels in CROPDMGEXP, and the function returns the integers as is; with `setNames` I'll have to encode all the levels. How do I get around that? – Niranjan Agnihotri Jul 06 '17 at 06:25
  • @NiranjanAgnihotri We can only code based on what you showed. I don't know what your expectations are when the data shown have only this many levels. – akrun Jul 06 '17 at 06:26