
In my data frame, one of the columns has a value that is a combination of [state,country]


I tried this code:

voivodeshipdf <- voivodeshipdf %>% mutate(state =  as.character(unlist(str_split(voivodeship, ','))[1]))

but it assigns the value from the first row to every row.


How do I update my code so that each row is split into its own values?

capiono

2 Answers

One option is separate from tidyr:

library(tidyverse)
voivodeshipdf %>%
   separate(voivodeship, into = c('state', 'newcol'), sep=",", remove = FALSE) %>%
   select(-newcol)

Or extract

voivodeshipdf %>%
     extract(voivodeship, into = 'state', '^([^,]+),.*', remove = FALSE)

or with word

voivodeshipdf %>%
     mutate(state = word(voivodeship, 1, sep=","))

The issue in the OP's code is that unlist flattens the list returned by str_split into a single vector, so [1] selects only the first element of that vector, which is a single string taken from the first row. That one value is then assigned to the whole column through recycling.

Instead, we need to extract the first element from each element of the list returned by str_split, using map or lapply (map is more appropriate in a tidyverse context):

voivodeshipdf %>% 
        mutate(state =  map_chr(str_split(voivodeship, ','), first))
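To make the difference concrete, here is a minimal sketch in base R, where strsplit and vapply stand in for str_split and map_chr so that no packages are needed. The two-row data frame is hypothetical, since the question's real data is not shown.

```r
# Hypothetical sample data; the question's real data frame is not shown
voivodeshipdf <- data.frame(
  voivodeship = c("Greater Poland voivodeship, poland",
                  "Kuyavian-Pomeranian voivodeship, poland"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list holding one character vector per row
parts <- strsplit(voivodeshipdf$voivodeship, ",", fixed = TRUE)

# Flattening the list and then taking [1] gives a single string from row 1,
# which would be recycled over every row if assigned to a column
first_only <- unlist(parts)[1]

# Taking the first piece of each list element gives one value per row
per_row <- vapply(parts, function(p) p[1], character(1))

voivodeshipdf$state <- per_row
```

Here first_only has length one, while per_row has one entry per row, which is what the new column needs.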
akrun

We can try using sub here for a base R option:

voivodeshipdf$state <- sub("^.*, ", "", voivodeshipdf$voivodeship)
voivodeshipdf$voivodeship <- sub(",.*$", "", voivodeshipdf$voivodeship)

Sample script:

voivodeship <- "Greater Poland voivodeship, poland"
sub("^.*, ", "", voivodeship)
sub(",.*$", "", voivodeship)

[1] "poland"
[1] "Greater Poland voivodeship"
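Because sub is vectorized over its input, the same two patterns can be applied to a whole column at once. The sketch below assumes a hypothetical two-row data frame named df and uses the neutral names before_comma and after_comma, which are not from the original answer.

```r
# Hypothetical sample data, used only for illustration
df <- data.frame(
  voivodeship = c("Greater Poland voivodeship, poland",
                  "Silesian voivodeship, poland"),
  stringsAsFactors = FALSE
)

# Drop everything from the first comma onward, keeping the leading part
before_comma <- sub(",.*$", "", df$voivodeship)

# Drop everything up to and including the last ", ", keeping the trailing part
after_comma <- sub("^.*, ", "", df$voivodeship)
```

Each result is a character vector with one element per row, so either can be assigned directly to a column of df.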
Tim Biegeleisen