1

I have an input table with 3 columns (Person_Id, Visit_Id (unique Id for each visit and each person) and Purpose) as shown below. I would like to generate another new column which provides the immediate preceding visit of the person (ex: if person has visited hospital with Visit Id = 2, then I would like to have another column called "Preceding_visit_Id" which will be 1 (ex:2, if visit id = 5, preceding visit id will be 4). Is there a way to do this in a elegant manner using mutate function?

Input Table

enter image description here

Output Table

enter image description here As you can see that 'Preceding_visit_id' column refers the previous visit of the person which is defined using visit_id column

Please note that this is a transformation for one of the columns in a huge program, so anything elegant would be helpful.

Dput command output is here

structure(list(Person_Id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
3, 3, 3), Visit_Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14), Purpose = c("checkup", "checkup", "checkup", "checkup", 
"checkup", "checkup", "checkup", "checkup", "checkup", "checkup", 
"checkup", "checkup", "checkup", "checkup"), Preceding_visit_id = c(NA, 
1, 2, 3, 4, NA, 6, 7, 8, 9, 10, NA, 12, 12)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = 
structure(list(
 cols = list(Person_Id = structure(list(), class = c("collector_double", 
"collector")), Visit_Id = structure(list(), class = c("collector_double", 
"collector")), Purpose = structure(list(), class = 
 c("collector_character", 
"collector")), Preceding_visit_id = structure(list(), class = 
 c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))'''
The Great
  • 7,215
  • 7
  • 40
  • 128
  • 2
    `df %>% group_by(Person_Id) %>% mutate(Preceding_visit_id = lag(Visit_Id, 1))` – JasonAizkalns Mar 26 '19 at 18:35
  • @JasonAizkalns, Can you let me know how can this be done when the visit_id is a sequence number? I have updated the screenshot above for your reference? Can you please help me? – The Great Mar 26 '19 at 18:54
  • 1
    did you try running that code? Where are you getting stuck? Please see [how to make a great R reproducible example](https://stackoverflow.com/q/5963269/2572423) so that others can help. – JasonAizkalns Mar 26 '19 at 19:06
  • I already ran that and found out that the way I provided the sample data was incorrect. For example, The code generated the preceding_visit_id as 5 for person_id = 2 when his first visit was at (visit_id = 6). Though it was grouped, since we have used lag, it picks the last observation (visit_id) of the previous group (person before the current patient) – The Great Mar 26 '19 at 19:26
  • 1
    Pictures of data are impossible to test on. Please update your question sharing data with `dput()` instead. – Gregor Thomas Mar 26 '19 at 20:45
  • Also please clarify what isn't working with Jason's suggestion, it looks perfect. If mutate seems to be ignoring your grouping, then make sure you haven't loaded `plyr` after `dplyr` and ignored the warning that prints. [See this R-FAQ](https://stackoverflow.com/q/26106146/903061). – Gregor Thomas Mar 26 '19 at 20:46
  • Hello! Please find the dput command output updated above – The Great Mar 26 '19 at 21:15
  • This is the order in which I have the packages library(data.table) library(purrr) library(plyr) library(dplyr) library(tibble) # importing necessary packages library(stringr) library(lubridate) library(tidyr) library(tidyverse) library(zoo) – The Great Mar 26 '19 at 21:17
  • Your `person_id` fields don't match in your examples. – william3031 Mar 26 '19 at 22:30
  • Can you please help me with this ? https://stackoverflow.com/questions/56801904/how-to-override-coerce-error-and-create-a-dataframe – The Great Jun 28 '19 at 08:07

1 Answers1

1

The Person_Id fields in your examples don't match.

I'm not sure if this is what you're after, but from your dput() I have created a file that removes the last column:

df_input <- df_output %>% 
  select(-Preceding_visit_id)

Then done this:

df_input %>% 
  group_by(Person_Id) %>% 
  mutate(Preceding_visit_id = lag(Visit_Id))

And the output is this:

# A tibble: 14 x 4
# Groups:   Person_Id [3]
   Person_Id Visit_Id Purpose Preceding_visit_id
       <dbl>    <dbl> <chr>                <dbl>
 1         1        1 checkup                 NA
 2         1        2 checkup                  1
 3         1        3 checkup                  2
 4         1        4 checkup                  3
 5         1        5 checkup                  4
 6         2        6 checkup                 NA
 7         2        7 checkup                  6
 8         2        8 checkup                  7
 9         2        9 checkup                  8
10         2       10 checkup                  9
11         2       11 checkup                 10
12         3       12 checkup                 NA
13         3       13 checkup                 12
14         3       14 checkup                 13
william3031
  • 1,653
  • 1
  • 18
  • 39
  • Hi, I am getting an error message when I use the above suggestion 9in my dataset. The message reads as "Column `preceding_visit_id ` must be length 16 (the group size) or one, not 256612" . Here 256612 is my dataset size and number of unique person Id (groups) are more than 16 as well. Can you Please help as t what is wring here? – The Great Mar 28 '19 at 09:12
  • I'm not sure what the problem is. Is the dataset all in the same format? – william3031 Mar 28 '19 at 11:04
  • Yes, the dataset is in the same format .Can you look into this post? https://stackoverflow.com/questions/55395940/mutate-new-variable-on-one-dataframe-by-deriving-value-from-another-dataframe. – The Great Mar 28 '19 at 11:06
  • Did the `ungroup()` answer in your other post work? It would have been grouped in the input. Sometimes that could be the issue. – william3031 Mar 28 '19 at 11:11
  • Can you please help me understand better? For ex: if my dataframe has categorical data, can it be called as grouped? There have been cases where I followed the same approach for other dataframe for similar 'Code' columns and it worked but for this dataframe it is throwing this error – The Great Mar 28 '19 at 11:15
  • I mean this bit `group_by(Person_Id) %>% `. I wouldn't be able to tell without looking at it. Is there a way of you making the other dataset not confidential? Change some of the data or remove some columns perhaps? – william3031 Mar 28 '19 at 11:18