
I have a dataframe like so:

df<- data.frame(date= c(rep("10-29-16", 3), rep("11-14-16", 2),
                      "12-29-16","10-2-17","9-2-17"),
                loc= c(rep("A", 3), rep("B", 2),"A","PlotA","PlotB"), 
                obs_network= c(rep("NA", 3), rep("NA", 2),"NA","PlotA","PlotB"))

For rows where obs_network is NA, I want to assign a name to each unique date and loc combination. Each unique group should get its own number, with the prefix "pseudoplot". The output would look like this:

output<- data.frame(date= c(rep("10-29-16", 3), rep("11-14-16", 2),
                      "12-29-16","10-2-17","9-2-17"),
                loc= c(rep("A", 3), rep("B", 2),"A","PlotA","PlotB"), 
                obs_network= c(rep("pseudoplot_1", 3),rep("pseudoplot_2", 2),"pseudoplot_3","PlotA","PlotB"))

I have tried the following without success and cannot identify my error. With the code below, every level reads "pseudoplot_1". I would greatly appreciate an explanation of why my code does not work, in addition to a solution.

output<-
  df %>%
  group_by(date, loc)%>%
  mutate(obs_network=ifelse(is.na(obs_network), 
                      paste0("pseudoplot", "_", match(loc, unique (loc))), 
                             obs_network))
Danielle

2 Answers

1

Here is one approach. It assumes that 1) date is a Date object, and 2) loc and obs_network are character vectors. I created the sample data below, which meets both conditions.

         date   loc obs_network
1  2016-10-29     A        <NA>
2  2016-10-29     A        <NA>
3  2016-10-29     A        <NA>
4  2016-11-14     B        <NA>
5  2016-11-14     B        <NA>
6  2016-12-29     A        <NA>
7  2017-10-02 PlotA       PlotA
8  2017-09-02 PlotB       PlotB
9  2017-10-10     A        <NA>
10 2017-10-10     B        <NA>

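The approach has two parts. First, I take the differences between consecutive dates with diff(). Second, I pass those differences to cumsum() to build a running group number that goes up each time the date changes. Pasting that group number together with loc gives a unique label for each date and loc combination. Note that the counter runs over all rows, including those where obs_network is not NA, which is why the numbering jumps from 3 to 6 in the output.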

library(dplyr)

mydf %>%
  mutate(obs_network = if_else(is.na(obs_network),
                               # running group number that goes up whenever the date changes
                               paste0("pseudoplot_", cumsum(c(TRUE, abs(diff(date)) > 0)), loc),
                               obs_network))


#         date   loc   obs_network
#1  2016-10-29     A pseudoplot_1A
#2  2016-10-29     A pseudoplot_1A
#3  2016-10-29     A pseudoplot_1A
#4  2016-11-14     B pseudoplot_2B
#5  2016-11-14     B pseudoplot_2B
#6  2016-12-29     A pseudoplot_3A
#7  2017-10-02 PlotA         PlotA
#8  2017-09-02 PlotB         PlotB
#9  2017-10-10     A pseudoplot_6A
#10 2017-10-10     B pseudoplot_6B

DATA

mydf <- structure(list(date = structure(c(17103, 17103, 17103, 17119, 
17119, 17164, 17441, 17411, 17449, 17449), class = "Date"), loc = c("A", 
"A", "A", "B", "B", "A", "PlotA", "PlotB", "A", "B"), obs_network = c(NA, 
NA, NA, NA, NA, NA, "PlotA", "PlotB", NA, NA)), .Names = c("date", 
"loc", "obs_network"), row.names = c(NA, -10L), class = "data.frame")
jazzurro
  • I think there may be a typo in your code where you write `collapese = "" `. Either way thank you very much for your help and suggestion. To help me understand your method better, would you mind explaining what `cumsum(c(T, abs(diff(date) > 0)))` does? – Danielle Dec 04 '17 at 02:24
  • @Danielle Yeah, that was a typo. I am sorry. I also had brackets in the wrong position for cumsum(). `cumsum(c(T, abs(diff(date)) > 0))` basically creates a group variable. `abs(diff(date)) > 0` generates a logical vector: whenever the difference between two consecutive dates is larger than 0, you get TRUE. `c(T, abs(diff(date)) > 0)` is that logical vector with T added at the front so that numbering starts at 1. If F came first, numbering would begin at 0. cumsum() then turns this into the grouping variable: the running total goes up by 1 at every TRUE, that is, every time the date changes. – jazzurro Dec 04 '17 at 02:38
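The steps of the cumsum() expression can be inspected one at a time. The following is a minimal base R sketch using a small made-up vector; the name `dates` and its values are illustrative and are not taken from the data above.

```r
# Illustrative dates; rows with the same date are adjacent
dates <- as.Date(c("2016-10-29", "2016-10-29", "2016-11-14",
                   "2016-11-14", "2016-12-29"))

# Difference in days between each date and the one before it
step <- diff(dates)

# TRUE wherever the date differs from the previous row
changed <- abs(step) > 0

# Prepend TRUE so that the first row starts group 1
starts <- c(TRUE, changed)

# Each TRUE adds 1 to the running total, giving one group id per row
group_id <- cumsum(starts)
print(group_id)
```

Because the counter only compares neighbouring rows, this relies on rows with the same date being adjacent; sorting by date first is a reasonable precaution.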
0

A few notes:

  1. You have included the string "NA" in your data frame, so these values are text (actually factors), not true NA values. I recommend changing your original data frame.

    df <- tibble(date= c(rep("10-29-16", 3), 
                             rep("11-14-16", 2),"12-29-16","10-2-17","9-2-17"),
                loc= c(rep("A", 3), rep("B", 2), "A", "PlotA", "PlotB"), 
                obs_network= c(rep(NA, 6), "PlotA", "PlotB"))
    
  2. There are going to be issues mixing factors (which is what your data frame contained) with character vectors or integers in ifelse. I've changed the dataset to a tibble so that everything is a character, and I am using if_else.

  3. Last, don't use group_by() for this; simply keep everything flat.

    df %>% 
      mutate(obs_network = if_else(is.na(obs_network), 
                           paste0("pseudoplot", "_",  match(paste0(date,loc), unique(paste0(date,loc)))),
                           obs_network))
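
To see how the match()/unique() numbering behaves on its own, here is a small base R sketch. The vectors are illustrative rather than the question's data, and the "|" separator is an added assumption (the code above pastes date and loc with no separator).

```r
# Illustrative vectors; the fifth row repeats the combination from row 3
date <- c("10-29-16", "10-29-16", "11-14-16", "12-29-16", "11-14-16")
loc  <- c("A", "A", "B", "A", "B")

# One key per row; the "|" separator is a precaution so that different
# date/loc pairs are less likely to paste into the same string
key <- paste(date, loc, sep = "|")

# unique(key) lists each combination once, in order of first appearance;
# match() returns the position of each row's key in that list
id <- match(key, unique(key))
print(id)
```

Because match() compares keys rather than neighbouring rows, a combination keeps the same number even when its rows are not adjacent.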
    
B Williams
  • This works wonderfully. Thank you. I did have all my real data converted to characters, so that I already had accounted for. Either way, I did not know you could use the `tibble()` function to convert variables in a df to characters. So that is additionally helpful! Any explanation as to why using `group_by()` doesn't work ? – Danielle Dec 04 '17 at 02:21
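  • @Danielle if you use `group_by`, the expression is evaluated separately within each group. Each date and loc group contains only one unique loc, so `match(loc, unique(loc))` always returns 1, and therefore all of your pseudoplots will be 1. – B Williams Dec 04 '17 at 16:44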
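
The grouping effect can be reproduced without dplyr. This is a rough base R sketch in which ave() stands in for group_by() followed by mutate(); the vectors are illustrative and ave() is only an approximation of what dplyr does internally.

```r
# Illustrative vectors (not the question's full data)
loc  <- c("A", "A", "B", "B", "A")
date <- c("10-29-16", "10-29-16", "11-14-16", "11-14-16", "12-29-16")

# Ungrouped: numbers each distinct loc across the whole vector
ungrouped <- match(loc, unique(loc))

# Per group: ave() applies the function to the row indices of each
# date/loc group, so unique() only ever sees that group's single loc
grouped <- ave(seq_along(loc), date, loc,
               FUN = function(i) match(loc[i], unique(loc[i])))

print(ungrouped)
print(grouped)
```

Within every group the lookup table has a single entry, so the per-group result is 1 for every row, which matches the behaviour reported in the question.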