
I have a dataset called KID with a column STRATA that holds 4-digit numbers (nnnn). Each digit encodes a different characteristic of the hospital (geography, size, type, etc.), depending on its position and value. For example:

KID$STRATA <- c(4231, 2321, 3133, 2112, 3212)

1st Digit = Geographic location: Northeast (1), Midwest (2), South (3), West (4)

2nd Digit = Control: Government (1), Private, not-for-profit (2), Private, investor-owned (3), Private, either not-for-profit or investor-owned (4)

3rd Digit = Location / Teaching: Rural (1), Urban nonteaching (2), Urban teaching (3)

4th Digit = Bedsize: Small (1), Medium (2), Large (3)

Is there a way to split the digits into separate columns (the first digit into one column, the second into another, and likewise for the third and fourth) and name each column after the characteristic it encodes?

Laxmi

3 Answers


Using strsplit, you can split each value into its individual characters and assign them to new columns.

cols <- c('geography', 'Control', 'Location', 'Bedsize')
# split each code into its digits, one row per value; the new columns are character
KID[cols] <- do.call(rbind, strsplit(as.character(KID$STRATA), ''))
KID

#  STRATA geography Control Location Bedsize
#1   4231         4       2        3       1
#2   2321         2       3        2       1
#3   3133         3       1        3       3
#4   2112         2       1        1       2
#5   3212         3       2        1       2

Or, using splitstackshape:

splitstackshape::cSplit(KID, 'STRATA', '', stripWhite = FALSE, drop = FALSE)

Data:

KID <- data.frame(STRATA = c(4231, 2321, 3133, 2112, 3212))
Ronak Shah
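A base-R alternative, not taken from any of the answers here, is to extract the digits arithmetically instead of converting to strings: integer-divide by each digit's place value with %/% and keep the last digit with %% 10. This is a minimal sketch that assumes every STRATA value is a positive whole number with exactly four digits; the column names are illustrative choices, not part of the data.

```r
# Example data from the question
KID <- data.frame(STRATA = c(4231, 2321, 3133, 2112, 3212))

# Place values of the four digits, left to right (thousands .. units)
place <- c(1000, 100, 10, 1)
# Illustrative column names (assumption, not from the dataset)
cols  <- c("Region", "Control", "Location_Teaching", "Bedsize")

# Integer-divide by the place value, then keep only the last digit with %% 10
for (i in seq_along(cols)) {
  KID[[cols[i]]] <- as.integer((KID$STRATA %/% place[i]) %% 10)
}

KID
```

Unlike the string-splitting approaches, this produces integer columns directly, so no type conversion is needed afterwards.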

You do not need regular expressions for this. Use 'tidyr' and 'dplyr'. The 'separate()' function in 'tidyr' can be supplied a vector of character positions to split at.

library(dplyr)
library(tidyr)

KID %>% separate(col = STRATA, sep = 1:4, 
                 into = c("Region", "Control", "Location_Teaching", "Bedsize"))
  Region Control Location_Teaching Bedsize
1      4       2                 3       1
2      2       3                 2       1
3      3       1                 3       3
4      2       1                 1       2
5      3       2                 1       2
Ben Norris

A data.table solution:

library( data.table )
KID <- data.table( STRATA = c(4231, 2321, 3133, 2112, 3212) )
cols = c("Geographic_location", "Control", "Location_Teaching", "Bedsize" )
KID[, (cols) := tstrsplit( STRATA, "" ) ]


#    STRATA Geographic_location Control Location_Teaching Bedsize
# 1:   4231                   4       2                 3       1
# 2:   2321                   2       3                 2       1
# 3:   3133                   3       1                 3       3
# 4:   2112                   2       1                 1       2
# 5:   3212                   3       2                 1       2
Wimpel
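All three answers produce the numeric codes, while the question also asks for columns that reflect the characteristics themselves. A minimal base-R sketch of turning the codes into labelled factors, using the codebook from the question, could look like this. The list name strata_labels and the column names are hypothetical choices, and any digit outside a characteristic's listed range would become NA.

```r
# Example data from the question
KID <- data.frame(STRATA = c(4231, 2321, 3133, 2112, 3212))

# Hypothetical lookup list transcribed from the codebook in the question;
# the position of each label equals its digit code (e.g. Region 3 -> "South")
strata_labels <- list(
  Region = c("Northeast", "Midwest", "South", "West"),
  Control = c("Government", "Private, not-for-profit",
              "Private, investor-owned",
              "Private, either not-for-profit or investor-owned"),
  Location_Teaching = c("Rural", "Urban nonteaching", "Urban teaching"),
  Bedsize = c("Small", "Medium", "Large")
)

# Split each code into its four digit characters (one row per hospital)
digits <- do.call(rbind, strsplit(as.character(KID$STRATA), ""))

# Turn each digit column into a factor: levels are the codes "1".."k",
# labels are the descriptive names; unknown codes become NA
for (i in seq_along(strata_labels)) {
  lab <- strata_labels[[i]]
  KID[[names(strata_labels)[i]]] <- factor(digits[, i],
                                           levels = as.character(seq_along(lab)),
                                           labels = lab)
}

KID
```

Because the factor levels follow the codebook order, tables and plots keep that order (Northeast, Midwest, South, West) rather than sorting alphabetically.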