So I am using tidyr in RStudio and I am trying to separate the data in the 'player' column (attached below) into four individual columns: 'number', 'name', 'position' and 'school'. I tried using the separate() function, but I can't get the number to separate, and I can't use str_sub() because some numbers are two digits. Does anyone know how to split this column into the appropriate four columns?
Please don't show data as images. – Martin Gal Jun 16 '20 at 21:28

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data do not count as "reproducible" since we can't copy/paste the data for testing. – MrFlick Jun 16 '20 at 21:38
1 Answer
A method using a series of `separate()` calls.
```r
# Example data
df <- data.frame(
  player = c('11Vita VeaDT | Washington',
             '16Clelin FerrellEDGE | Clemson',
             "17K'Lavon ChaissonEdge | LSU",
             '15Cody FordOT | Oklahoma',
             '20Derrius GuiceRB',
             '1Joe BurrowQB | LSU'))
```
The steps are:

- separate `school` using `|`
- separate `number` at the boundary between digits and letters
- separate `position` where a lowercase letter is followed by an uppercase letter (the position code comes last, right after the name)
- clean up: trim the white space left around the text
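To check where each of these split points falls, here is a minimal base-R sketch (no tidyr needed) that applies the same three patterns to a single player string. `split_at()` is a hypothetical helper written only for this illustration: it cuts at the first zero-width match that `regexpr(perl = TRUE)` finds, which roughly mirrors what the lookaround patterns do inside `separate()` for this data.

```r
# Hypothetical helper (not from tidyr): cut `x` at the first zero-width
# match of `pattern` and return c(before, after); NA if there is no match
split_at <- function(x, pattern) {
  pos <- as.integer(regexpr(pattern, x, perl = TRUE))  # match start, or -1
  if (pos == -1) return(c(x, NA_character_))
  c(substr(x, 1, pos - 1), substr(x, pos, nchar(x)))
}

x <- "17K'Lavon ChaissonEdge | LSU"

# Step 1: the school is whatever follows "|"
step1 <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
# Step 2: number/name boundary = a digit followed by a letter
step2 <- split_at(step1[1], "(?<=[0-9])(?=[A-Za-z])")
# Step 3: name/position boundary = a lowercase letter followed by a capital
step3 <- split_at(step2[2], "(?<=[a-z])(?=[A-Z])")

c(number = step2[1], name = step3[1], position = step3[2], school = step1[2])
```

The lookbehind/lookahead pair matches an empty position, so no characters are consumed and both halves keep their letters.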
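```r
library(tidyr)
library(dplyr)

df %>%
  separate(player, into = c('player', 'school'), sep = '\\|', fill = 'right') %>%    # school after "|"; NA if none
  separate(player, into = c('number', 'player'), sep = '(?<=[0-9])(?=[A-Za-z])') %>% # digits | letters
  separate(player, into = c('name', 'position'), sep = '(?<=[a-z])(?=[A-Z])') %>%    # lowercase | capital
  mutate(across(where(is.character), trimws))                                         # strip spaces around "|"
```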
```
# Results
  number             name position     school
1     11         Vita Vea       DT Washington
2     16   Clelin Ferrell     EDGE    Clemson
3     17 K'Lavon Chaisson     Edge        LSU
4     15        Cody Ford       OT   Oklahoma
5     20    Derrius Guice       RB       <NA>
6      1       Joe Burrow       QB        LSU
```
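A possible alternative, sketched under the same assumptions as the steps above (a run of digits, then a name ending in a lowercase letter, then a position starting with a capital, then an optional `| school`), is to capture all four pieces with one regular expression in base R via `regexec(perl = TRUE)` and `regmatches()`. The pattern is an illustrative assumption rather than part of the original answer, and like the lookarounds it would mis-split a name with an internal lowercase-to-capital change such as "McDonald".

```r
# Example data from the answer (stringsAsFactors = FALSE for R < 4.0)
df <- data.frame(
  player = c('11Vita VeaDT | Washington',
             '16Clelin FerrellEDGE | Clemson',
             "17K'Lavon ChaissonEdge | LSU",
             '15Cody FordOT | Oklahoma',
             '20Derrius GuiceRB',
             '1Joe BurrowQB | LSU'),
  stringsAsFactors = FALSE)

# Four capture groups:
#   (\\d+)            leading jersey number
#   (.*?[a-z])        shortest name ending in a lowercase letter...
#   ([A-Z][A-Za-z]*)  ...followed by the position, which starts with a capital
#   \\s*\\|?\\s*(.*)  optional " | " and the school (empty string if absent)
pattern <- "^(\\d+)(.*?[a-z])([A-Z][A-Za-z]*)\\s*\\|?\\s*(.*)$"

m <- regmatches(df$player, regexec(pattern, df$player, perl = TRUE))
parts <- do.call(rbind, m)[, -1]            # drop the full-match column
colnames(parts) <- c("number", "name", "position", "school")

out <- as.data.frame(parts, stringsAsFactors = FALSE)
out$school[out$school == ""] <- NA          # no "| school" part -> NA
out$number <- as.integer(out$number)
out
```

The lazy `.*?` in the name group makes the match stop at the first lowercase-to-capital boundary, which is the same split point the third `separate()` call uses.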

Answered by nniloc