I want to create a new column in rstudio through mutate function using data from multiple columns

Question

I have a dataset that has over 10 million variables. It has 25 columns for diagnosis code. Each row represents one patient. Based on admission to the hospital Diagnosis code is generated and it can be placed anywhere between columns 1 to 25. I would like to create a new column for lets say disease A that has codes ranging from 480,481,482,483,484,V334,V555. If any of these codes are present in these 25 columns new column should return value of 1 else 0.

Original Table:

DX1	DX5.	DX6.	DX10.
481		482
			V334
	484
		485.
V555			481
F666
	G765		481

I want new column with 0 or 1 variable even if one of the row has values: 480,481,482,483,484,V334,V555. For e.g.

DX1	DX5.	DX6.	DX10.	DX11.	NewCol
481		482			1
			V334	0
	484			1
		485.		1
V555			481	1
F666				0
	G765		481	1

I have tried using:

%>%mutate(NewCol = recode(DX1,
                        "486" = "1",
                        .default = "0"))

but this is only for column DX1 and variable 486. I want to apply for 25 consecutive columns ranging from DX1:DX25 and include all variables 480,481,482,483,484,V334,V555. Thank You!

Please provide a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — william3031, Dec 15 '20 at 23:51
Use `tidyr::pivot_longer()` to reshape your data set such that those 25 columns become one column, than use `dplyr::case_when()` to recode and group the conditions. — Phil, Dec 16 '20 at 00:45
You should provide a small example of how your data looks along with expected output. See [how to give a reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, Dec 16 '20 at 01:39
Instead of posting your code as an answer, please edit your question and delete the answer. — AnilGoyal, Dec 16 '20 at 07:46

score 0 · Answer 1 · answered Dec 20 '20 at 14:56

In the future, please provide a reproducible example. As to your question, I'd follow @Phil's recommendation and pivot to long-form data after adding an index for patient number. Then you just need to group by row and choose the maximum (it'll be either 1 or 0), and re-merge with the original data.

library(tidyverse)

#Read In Data
df <- data.frame(
  patient = c(1:7),
  dx1 = c('481',NA,NA,NA,'V555','F666',NA),
  dx2 = c(NA,NA,NA,NA,NA,NA,NA),
  dx3 = c(NA,NA,NA,NA,NA,NA,NA),
  dx4 = c(NA,NA,NA,NA,NA,NA,NA),
  dx5 = c(NA,NA,'484',NA,NA,NA,NA),
  dx6 = c('482',NA,NA,'485',NA,NA,NA),
  dx7 = c(NA,NA,NA,NA,NA,NA,NA),
  dx8 = c(NA,NA,NA,NA,NA,NA,NA),
  dx9 = c(NA,NA,NA,NA,NA,NA,NA),
  dx10 = c(NA,'V334',NA,NA,'481',NA,'481')
)

# Declare variables to count
valList = c('481','482','483','484','V334','V555')

#Count, Group, and Rejoin
df %>%
  pivot_longer(cols = dx1:dx10,
               names_to = 'columns',
               values_to = 'values') %>%
  mutate(in_valList = ifelse(values %in% valList,1,0)) %>%
  group_by(patient) %>% 
  summarise(NewCol = max(in_valList,na.rm = T)) %>%
  left_join(df,by='patient')

The output ends up looking like this, which you could reorder however you like if you absolutely must have NewCol at the end.

# A tibble: 7 x 12
  patient NewCol dx1   dx2   dx3   dx4   dx5   dx6   dx7   dx8   dx9   dx10 
    <int>  <dbl> <chr> <lgl> <lgl> <lgl> <chr> <chr> <lgl> <lgl> <lgl> <chr>
1       1      1 481   NA    NA    NA    NA    482   NA    NA    NA    NA   
2       2      1 NA    NA    NA    NA    NA    NA    NA    NA    NA    V334 
3       3      1 NA    NA    NA    NA    484   NA    NA    NA    NA    NA   
4       4      0 NA    NA    NA    NA    NA    485   NA    NA    NA    NA   
5       5      1 V555  NA    NA    NA    NA    NA    NA    NA    NA    481  
6       6      0 F666  NA    NA    NA    NA    NA    NA    NA    NA    NA   
7       7      1 NA    NA    NA    NA    NA    NA    NA    NA    NA    481

I want to create a new column in rstudio through mutate function using data from multiple columns

1 Answers1