I am trying to split the column below using the answer from the first question. For now I am creating each new column in the df by using the letter. I would like to use the label before each name as the new column name: in the case below, G, D, W, C and UTIL. Since only spaces separate a category such as G from the names (First Person, etc.), I am scratching my head as to how I could separate the category from the first and last name and put each full name under the appropriate column.

library(stringr)

test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

1 G First Person D Another Last W Fake Name C Test Another UTIL Another Test
2 G Fake Name W Another Fake D Third person UTIL Another Name C Name Another

test$G <- str_split_fixed(test$Lineup, " ", 2)

result:

G
G

Hoped-for result:

G             D             W             C             UTIL
First Person  Another Last  Fake Name     Test Another  Another Test
Fake Name     Third Person  Another Fake  Name Another  Another Name
Mike.J
  • Do capital letters always delimit a new name? – zack Oct 15 '18 at 15:37
  • Yes, I updated the result to reflect that. G, D, W, C and UTIL are the labels before the names. The names also start with a capital letter. I apologize for not writing it that way the first time; I didn't think that was a possible way to split a column. – Mike.J Oct 15 '18 at 15:53

1 Answer

Here's one approach using tidyverse:

# example data
test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", 
                              "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

library(tidyverse)

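# split each lineup into one word per row and record, for each word:
#   id     - the original row it came from
#   is_col - whether the word is a position label (a future column name)
#   group  - a running counter that ties each label to the words after it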
dt_words = test %>%
  mutate(id = row_number()) %>%
  separate_rows(Lineup) %>%
  mutate(is_col = Lineup %in% c(LETTERS, "UTIL"),
         group = cumsum(is_col))

# collapse the non-label words of each group into one value (the full name)
dt_values = dt_words %>%
  filter(is_col == FALSE) %>%
  group_by(group, id) %>%
  summarise(values = paste0(Lineup, collapse = " "))

# keep only the label rows (the future column names),
# join each label to its value by group and id,
# then reshape to one column per label
dt_words %>%
  filter(is_col == TRUE) %>%
  select(-is_col) %>%
  inner_join(dt_values, by=c("group","id")) %>%
  select(-group) %>%
  spread(Lineup, values) %>%
  select(-id)

#               C            D            G         UTIL            W
# 1  Test Another Another Last First Person Another Test    Fake Name
# 2 Name Another  Third person    Fake Name Another Name Another Fake
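The grouping step is the part that does the real work, so here is a minimal base R sketch of it on a single hand-made vector of words. The names `words`, `is_col`, `group` and `names_by_group` are only for illustration; the idea is the same `cumsum()` trick as in `dt_words`.

```r
# One lineup, already split into words (illustrative data only)
words  <- c("G", "First", "Person", "D", "Another", "Last")

# TRUE where the word is a label, i.e. a single capital letter or "UTIL"
is_col <- words %in% c(LETTERS, "UTIL")

# cumsum() increases by one at every label, so a label and the words
# that follow it share the same group number
group  <- cumsum(is_col)
group
# 1 1 1 2 2 2

# paste the non-label words of each group back together
names_by_group <- tapply(words[!is_col], group[!is_col], paste, collapse = " ")
names_by_group
```

Each element of `names_by_group` is the full name that belongs to the label with the same group number.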

Note that this assumes a single capital letter (or "UTIL") always separates your values, and that those labels become the columns of your new dataset.
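To see where that assumption could bite, here is a small sketch with a made-up name that contains a middle initial. The `known` vector is hypothetical: it assumes you can list every label in advance, which may or may not hold for your real data.

```r
# Illustrative words only: "J" is a middle initial, not a position label
words <- c("G", "John", "J", "Smith", "UTIL", "Another", "Test")

# Matching against all capital letters also flags the initial "J"
words %in% c(LETTERS, "UTIL")
# TRUE FALSE TRUE FALSE TRUE FALSE FALSE

# Matching against an explicit (assumed) set of labels does not
known <- c("G", "D", "W", "C", "UTIL")
is_label <- words %in% known
is_label
# TRUE FALSE FALSE FALSE TRUE FALSE FALSE
```

If your names can contain single capital letters, swapping `c(LETTERS, "UTIL")` for a vector like `known` in the `is_col` step is probably the safer choice.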

AntoniosK
  • Will it matter if the names are capitalized? I apologize that I did not capitalize the names in the question; I did not think that was a way of splitting the data. – Mike.J Oct 15 '18 at 15:55
  • It's not about the capital letters of the actual names, but about the single capital letter between the names that you want as a column in your new dataset. If you run my code step by step you'll see how it works and why those letters are important :) – AntoniosK Oct 15 '18 at 15:59
  • Thank you! Can't believe how easy that made it! – Mike.J Oct 15 '18 at 16:10
  • You should be able to pipe everything together into one chain of commands. But this version is easier to debug if you have any issues in the future, as you can check `dt_words` and `dt_values` and see what you've done up to that point. – AntoniosK Oct 15 '18 at 16:13
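For anyone who wants the single chain mentioned in the last comment, it could look something like the sketch below. This is not the code from the answer. It assumes dplyr >= 1.0 and tidyr >= 1.0 (for `.groups` and `pivot_wider()`), uses `pivot_wider()` in place of `spread()`, and relies on a hypothetical `labels` vector of known labels instead of `LETTERS`. `str_squish()` is added here to tidy the double and trailing spaces in the example data.

```r
library(dplyr)
library(tidyr)
library(stringr)

test <- data.frame(
  Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test",
             "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "),
  stringsAsFactors = FALSE
)

# Assumption: every position label is known in advance
labels <- c("G", "D", "W", "C", "UTIL")

wide <- test %>%
  mutate(id = row_number(),
         Lineup = str_squish(Lineup)) %>%    # trim and collapse repeated spaces
  separate_rows(Lineup, sep = " ") %>%       # one word per row
  group_by(id) %>%
  mutate(is_col = Lineup %in% labels,
         group = cumsum(is_col)) %>%         # label and its words share a group
  group_by(id, group) %>%
  summarise(label = first(Lineup),           # first word of a group is the label
            value = paste(Lineup[-1], collapse = " "),
            .groups = "drop") %>%
  select(-group) %>%
  pivot_wider(names_from = label, values_from = value) %>%
  select(-id)

wide
```

Here the columns should come out in the order the labels first appear (G, D, W, C, UTIL) rather than alphabetically. The trade-off is the one noted in the comment above: with a single chain there are no intermediate objects to inspect when something goes wrong.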