
I am new to the purrr package and am trying to iterate through each group in a dataframe to:

  1. Cap the values of a variable (Sepal.Length) to a value xmax (e.g. 5), which is based on a quantile of the data for that group

  2. Set the x-axis labels to reflect the cap, e.g. 0, 1, 2, 3, 4, >=5
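For a single group, the two steps can be sketched in base R as follows. This is only an illustration: the variable names are made up for the example, and `round(x / binwidth) * binwidth` stands in for `plyr::round_any()`, which computes the same thing with its default rounding function.

```r
binwidth <- 1

# One group's values (setosa is used here purely as an example)
x <- iris$Sepal.Length[iris$Species == "setosa"]

# Step 1: cap the values at the 97.5% quantile, rounded to a multiple of binwidth
xmax <- unname(round(quantile(x, 0.975) / binwidth) * binwidth)
x_capped <- pmin(x, xmax)

# Step 2: breaks 0, 1, ..., xmax, with the last label marking the capped bin
xbreaks <- seq(0, xmax, binwidth)
xlabels <- c(seq(0, xmax - binwidth, binwidth), paste0(">=", xmax))
```

`xbreaks` and `xlabels` have the same length, which is what `scale_x_continuous(breaks = ..., labels = ...)` requires.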

I have one working method, but cannot get the following to work (note that this has been edited with thanks to the comments by @Jimbou). The columns xmax, xbreaks and xlabels are created as intended, but Sepal.Length is added as a new list-column, whereas I would like to update data$Sepal.Length inside the nested data.

binwidth <- 1    
graphs <- as_tibble(iris) %>% 
  nest(-Species) %>%
  mutate(xmax = map(data, ~ plyr::round_any(quantile(.$Sepal.Length, 0.975), binwidth)),
         xbreaks =  map(xmax, ~ seq(0, ., binwidth)),
         xlabels =  map(xmax, ~c(seq(0, (. - binwidth), binwidth), paste0(">=", .))),

        Sepal.Length= map2(data, xmax, ~ ifelse(.x$Sepal.Length >= .y, .y, .x$Sepal.Length)),  
        # this creates a new column, want it instead to update column in data
        # a work-around would be to create a dataframe from the new column
        # but I would like to work out how to update columns ... 

        graphs = map2(data, Species, ~ ggplot(., aes(Sepal.Length))) + 
           geom_histogram() + 
           scales_x_continuous(breaks=xbreaks, labels = xlabels) + 
           ggtitle(.y)
  )

Thanks for your help.

user1420372
  • `Error in round_any.numeric(quantile(.x$Sepal.Length, 0.975), binwidth) : object 'binwidth' not found`, I can't find `binwidth` – A. Suliman Jul 17 '18 at 04:14
  • Apologies for that. Have added `binwidth <-1` to code now – user1420372 Jul 17 '18 at 04:50
  • try `invisible(lapply(graphs$graphs, print))` to stop showing the brackets. Borrowed from [here](https://stackoverflow.com/questions/39401789/producing-ggplots-from-a-loop-and-generating-the-files-without-printing-any-vi) – Roman Jul 17 '18 at 11:55
  • For the first method you can try `as_tibble(iris) %>% nest(-Species) %>% mutate(xmax = map(data, ~ plyr::round_any(quantile(.$Sepal.Length, 0.975), binwidth))) %>% mutate(xbreaks = map(xmax, ~seq(0, ., binwidth))) %>% mutate(labels = map(xmax, ~c(seq(0, (. - binwidth), binwidth), paste0(">=", .)))) %>% mutate(Sepal.Length = map2(data,xmax, ~ ifelse(.x$Sepal.Length >= .y, .y, .x$Sepal.Length)))` – Roman Jul 17 '18 at 12:02
  • Thanks, for method 2 that works. For method 1, the last mutate creates a new column, I am not sure how to get it to update the column in data. Code too long for comment, have added as an edit – user1420372 Jul 18 '18 at 01:16
  • @OP if your second method is a working solution, can you post it as an answer, accept it, and remove the code from your question? (This way it doesn't look "unanswered" from the outside and is more readable, especially if other answers come in) – moodymudskipper Jul 18 '18 at 10:08

1 Answer

This method works, but does not answer the OP's question as to how to update a column in the nested dataframe.

library(tidyverse)  # as_tibble(), nest(), mutate(), map2(), ggplot()

binwidth <- 1
graphs <- as_tibble(iris) %>%
  nest(-Species) %>%
  mutate(graphs = map2(
    data,
    Species,
    function(.x, .y) {
      # Cap at the 97.5% quantile, rounded to the nearest multiple of binwidth
      xmax <- plyr::round_any(quantile(.x$Sepal.Length, 0.975), binwidth)
      xbreaks <- seq(0, xmax, binwidth)
      xlabels <- c(seq(0, xmax - binwidth, binwidth), paste0(">=", xmax))
      # This changes only the function's local copy of the group's data
      .x$Sepal.Length <- ifelse(.x$Sepal.Length >= xmax, xmax, .x$Sepal.Length)
      ggplot(.x, aes(Sepal.Length)) +
        geom_histogram(binwidth = binwidth) +
        scale_x_continuous(breaks = xbreaks, labels = xlabels) +
        ggtitle(.y)
    }
  ))
# Print each plot without echoing the list indices
invisible(lapply(graphs$graphs, print))

Thanks to @Jimbou for the tip on using invisible()
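On the original question of updating a column inside the nested data frame, one possible approach is to overwrite the `data` list-column itself rather than adding a new column next to it. The following is a sketch, not a tested drop-in for the code above. It assumes tidyr 1.0 or later for the `nest(data = -Species)` syntax, and the name `capped` is made up for the example. `round(x / binwidth) * binwidth` is used in place of `plyr::round_any()` so the example depends only on the tidyverse packages.

```r
library(dplyr)
library(tidyr)
library(purrr)

binwidth <- 1

capped <- as_tibble(iris) %>%
  nest(data = -Species) %>%
  mutate(
    # One numeric cap per group (map_dbl returns a plain double vector)
    xmax = map_dbl(
      data,
      ~ unname(round(quantile(.x$Sepal.Length, 0.975) / binwidth) * binwidth)
    ),
    # Overwrite the nested data frames: each one gets its Sepal.Length capped
    data = map2(data, xmax, ~ mutate(.x, Sepal.Length = pmin(Sepal.Length, .y)))
  )
```

Because the result is assigned back to `data`, no separate `Sepal.Length` list-column is created, and later steps that map over `data` see the capped values.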

user1420372