My previous question, which I thought would extend to my problem, wasn't specific enough, so I am asking again:

My actual data frame has many more columns.

library(tidyverse) 
# not installed in session but needed to reference:
# laeken::gini

df <- data.frame(a1 = c(1:5), 
                 b1 = c(3,1,3,4,6), 
                 c1 = c(10:14), 
                 a2 = c(9:13), 
                 b2 = c(3:7), 
                 c2 = c(15:19))

> df
  a1 b1 c1 a2 b2 c2
1  1  3 10  9  3 15
2  2  1 11 10  4 16
3  3  3 12 11  5 17
4  4  4 13 12  6 18
5  5  6 14 13  7 19

I would like to add a column to df using tidyverse's mutate that holds the output of the function my_gini (shown below):

my_gini <- function(some_vector){
  incs = c(1,2,5,9)
  laeken::gini(inc = incs, weights = some_vector)
}

This function needs to take a vector built from the values of several columns of df, which are listed in my_cols:

my_cols = c("b1","c1", "b2","c2")

I suspect I need to use purrr here, something like:

df %>% 
  mutate(my_g = pmap_dbl(
    select(., my_cols), ~ c(...) %>% 
      {my_gini(.[my_cols])}
    ))

which is supposed to add a column my_g to the df such that the first row would be:

my_gini(c(3,10, 3,15)) # 32.5564

and the second row would be:

my_gini(c(1,11,4,16))  # 29.66243

And so on.

However, it doesn't work. I get an error:

Error: Result 1 is not a length 1 atomic vector

Doing the same action with sum works just fine so I am not sure why it's not working here.

df %>% 
  mutate(my_g = pmap_dbl(
    select(., my_cols), ~ c(...) %>% 
      {sum(.[my_cols])}
    ))

Thank you in advance.

jmb277
  • `my_gini` returns a list, see the difference between `my_gini(as.numeric(df[1,my_cols]))` and `my_gini(as.numeric(df[1,my_cols]))[[1]]`, so your first code works fine just change `{my_gini(.[my_cols])}` to `{my_gini(.[my_cols])[[1]]}` – A. Suliman Aug 02 '19 at 16:12
  • Thank you - that's even cleaner than using `unlist`. – jmb277 Aug 03 '19 at 18:09

1 Answer

Try using pmap instead of pmap_dbl:

df %>% 
  mutate(my_g = unlist(pmap(
    select(., my_cols), ~ c(...) %>% 
      {my_gini(.[my_cols])}
    )))

  a1 b1 c1 a2 b2 c2     my_g
1  1  3 10  9  3 15  32.5564
2  2  1 11 10  4 16 29.66243
3  3  3 12 11  5 17 32.32696
4  4  4 13 12  6 18 33.26741
5  5  6 14 13  7 19  34.8913

pmap_dbl expects each result to be a single double, but your function returns an object of the S3 class gini/indicator. When I run it with pmap_dbl I get this error:

Error: Evaluation error: Result 1 must be a single double, not a vector of class `gini/indicator` and of length 10

This gets into some of the more advanced parts of R, but basically your function returns an object with a custom S3 class that is not native to base R, and such objects will not always play nicely with other functions and packages, as you've discovered.
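To make the point concrete without needing laeken installed, here is a minimal base R sketch. `fake_gini` is a hypothetical stand-in, not the real function: it only mimics the shape of the result (a list with S3 classes attached), and its `value` is just a mean, not a Gini coefficient. `vapply` with `numeric(1)` is used as a rough base R analogue of `pmap_dbl`'s "one double per result" rule.

```r
# Hypothetical stand-in for the object my_gini() returns:
# a list with S3 classes attached. The value is a plain mean,
# NOT a real Gini coefficient.
fake_gini <- function(x) {
  structure(
    list(value = mean(x), valueByStratum = NULL, var = NULL),
    class = c("gini", "indicator")
  )
}

res <- fake_gini(c(3, 10, 3, 15))

class(res)       # the S3 classes attached to the list
is.list(res)     # underneath, it is still a list
is.numeric(res)  # so it is not a numeric vector

# Coercing the whole object fails because it is a list with NULL elements;
# capture the error message instead of stopping.
coerced <- tryCatch(as.double(res), error = function(e) conditionMessage(e))

# Pulling out the single number works either by name or by position.
by_name <- res$value
by_position <- res[[1]]

# vapply() with numeric(1) insists on one double per result,
# so extract the value inside the function.
rows <- list(c(3, 10, 3, 15), c(1, 11, 4, 16))
vals <- vapply(rows, function(w) fake_gini(w)$value, numeric(1))
```

The same idea applies to the real object: hand the mapping function a plain number rather than the whole classed list.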

To see in more detail why you can't coerce it to numeric, look at what your function actually returns. When you coerce each result to a character string, this is what you get:

1  list(value = 32.556404997203, valueByStratum = NULL, varMethod = NULL, var = NULL, varByStratum = NULL, ci = NULL, ciByStratum = NULL, alpha = NULL, years = NULL, strata = NULL)
2 list(value = 29.6624331550802, valueByStratum = NULL, varMethod = NULL, var = NULL, varByStratum = NULL, ci = NULL, ciByStratum = NULL, alpha = NULL, years = NULL, strata = NULL)
3 list(value = 32.3269611074489, valueByStratum = NULL, varMethod = NULL, var = NULL, varByStratum = NULL, ci = NULL, ciByStratum = NULL, alpha = NULL, years = NULL, strata = NULL)
4 list(value = 33.2674137552186, valueByStratum = NULL, varMethod = NULL, var = NULL, varByStratum = NULL, ci = NULL, ciByStratum = NULL, alpha = NULL, years = NULL, strata = NULL)
5 list(value = 34.8913043478261, valueByStratum = NULL, varMethod = NULL, var = NULL, varByStratum = NULL, ci = NULL, ciByStratum = NULL, alpha = NULL, years = NULL, strata = NULL)
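Given that structure, one option is to return only the `value` element from my_gini, so that pmap_dbl receives a single double per row. This is a sketch rather than a definitive fix: it assumes the estimate lives in `value`, as the printed lists above suggest, and it assumes dplyr, purrr and laeken are installed. It also uses `all_of()` for the column selection, which is a change from the original code.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5,
                 b1 = c(3, 1, 3, 4, 6),
                 c1 = 10:14,
                 a2 = 9:13,
                 b2 = 3:7,
                 c2 = 15:19)

my_cols <- c("b1", "c1", "b2", "c2")

my_gini <- function(some_vector) {
  incs <- c(1, 2, 5, 9)
  # Assumption: the estimate is stored in the `value` element
  # of the object returned by laeken::gini().
  laeken::gini(inc = incs, weights = some_vector)$value
}

# Each row of the selected columns is passed to the lambda as `...`;
# c(...) collects those values into one vector for my_gini().
result <- df %>%
  mutate(my_g = pmap_dbl(select(., all_of(my_cols)), ~ my_gini(c(...))))
```

With this change `my_g` should be an ordinary double column, so there is no need for `unlist`.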
Ben G
  • That runs but all rows in the `my_g` column show ``. – jmb277 Aug 02 '19 at 15:39
  • The above comment is based on the RStudio view of the `df`. If I run in the console it _does_ run with the correct output values and I have no idea what that even means. – jmb277 Aug 02 '19 at 15:40
  • Yeah - if I use the `pmap` as you suggest, each row of `my_g` is an s3 object of type "list" - attempting to coerce `as.double` doesn't work. – jmb277 Aug 02 '19 at 15:49