split numeric and character by multiple delimiter and then make matrix

Question

I have a dataset as follows:

1       Saturday,SatAug 13, 2016-5:30 PM
2                  54.362·Robert Madley
3         Sunday,SunAug 14, 2016-1:30 PM
4        11.355 sold out·Andre Marriner

What I want to do is split the dataset on "," or "·" and then turn it into a matrix or data frame. In line 4, 11.355 and "sold out" also need to be split apart. So the final dataset should be:

date       date1       time           a        f                s
Saturday   SatAug 13   2016-5:30 PM   54.362   Robert Madley
Sunday     SunAug 14   2016-1:30 PM   11.355   Andre Marriner   sold out

[How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) — Sotos, Dec 14 '17 at 09:28
Is one row of the output always two lines of the input data? — A5C1D2H2I1M1N2O1R2T1, Dec 14 '17 at 09:32
Seems like (i) first line is separated by `","` and (ii) second line is separated by `"·"`. Then (iii) column `a` is split to numeric and non-numeric parts. — Heikki, Dec 14 '17 at 09:43

score 0 · Accepted Answer · answered Dec 14 '17 at 17:41

Assuming an observation is always composed of two data rows in the raw data, here's a solution with dplyr + tidyr:

library(dplyr)
library(tidyr)

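df %>%
  # Pair consecutive rows: rows 1-2 get ID 0, rows 3-4 get ID 1, and so on
  mutate(ID = c(0, rep(1:(n()-1)%/%2))) %>%
  group_by(ID) %>%
  # Label the rows within each pair as V1 (date line) and V2 (second line)
  mutate(ID2 = paste0('V', row_number())) %>%
  # One row per pair; the value column V2 is the text column from read.table()
  spread(ID2, V2) %>%
  # Split the date line on a comma, dropping an optional following space
  separate(V1, c("date", "date1", "time"), sep = ",\\s?") %>%
  # Capture the number, the optional text before "·", and the name after it
  extract(V2, c("a", "s", "f"), regex = "^(\\d+\\.\\d+\\b)(\\b.+)?·(.+)", convert = TRUE)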

Result:

# A tibble: 2 x 7
# Groups:   ID [2]
     ID     date     date1         time      a         s              f
* <dbl>    <chr>     <chr>        <chr>  <dbl>     <chr>          <chr>
1     0 Saturday SatAug 13 2016-5:30 PM 54.362      <NA>  Robert Madley
2     1   Sunday SunAug 14 2016-1:30 PM 11.355  sold out Andre Marriner
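
The `ID` column is what pairs each date line with the line that follows it. As a rough illustration of that step only (the names `n`, `id_original` and `id_simple` are made up for this sketch), the index expression from the answer can be evaluated on its own, next to a shorter form that appears to give the same values:

```r
# Number of raw rows; inside mutate() this is what n() returns.
n <- 4

# Expression used in the answer: integer-divide 1..(n-1) by 2, then prepend 0.
# The rep() call has no extra arguments, so it returns its input unchanged.
id_original <- c(0, rep(1:(n - 1) %/% 2))

# A shorter form: zero-based row position, integer-divided by 2.
id_simple <- (seq_len(n) - 1) %/% 2

print(id_original)
print(id_simple)
```

Both vectors give every two consecutive rows the same group number, which is why the approach relies on each observation occupying exactly two lines.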

Data:

df = read.table(text = "1|Saturday,SatAug 13, 2016-5:30 PM
2|54.362·Robert Madley
3|Sunday,SunAug 14, 2016-1:30 PM
4|11.355 sold out·Andre Marriner", sep = "|", row.names = 1,
                stringsAsFactors = FALSE)
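
For readers without dplyr or tidyr, the same steps can be sketched in base R. This is only an illustration, not part of the answer above: the variable names (`raw`, `first`, `second`, `parts`, `left`, `out`) are made up here, and it assumes, as the answer does, that every observation spans exactly two consecutive lines.

```r
# Raw lines as in the question; "\u00b7" is the middle dot used as a delimiter.
raw <- c("Saturday,SatAug 13, 2016-5:30 PM",
         "54.362\u00b7Robert Madley",
         "Sunday,SunAug 14, 2016-1:30 PM",
         "11.355 sold out\u00b7Andre Marriner")

# Assumption: odd lines hold the date, even lines hold the number and name.
first  <- raw[c(TRUE, FALSE)]
second <- raw[c(FALSE, TRUE)]

# Split the date lines on a comma plus any following spaces.
dates <- do.call(rbind, strsplit(first, ",\\s*"))

# Split the second lines on the middle dot.
parts <- strsplit(second, "\u00b7", fixed = TRUE)
left  <- vapply(parts, `[`, character(1), 1)
f     <- vapply(parts, `[`, character(1), 2)

# The leading number goes to `a`; whatever text remains goes to `s`.
a <- as.numeric(sub("^([0-9]+\\.[0-9]+).*$", "\\1", left))
s <- trimws(sub("^[0-9]+\\.[0-9]+", "", left))
s[s == ""] <- NA

out <- data.frame(date = dates[, 1], date1 = dates[, 2], time = dates[, 3],
                  a = a, s = s, f = f, stringsAsFactors = FALSE)
print(out)
```

Here `trimws()` removes the space that separates the number from the trailing text, so `s` holds the text without leading whitespace.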

Thank you very much. It works. I'm gonna practice "mutate" function. — wan, Dec 15 '17 at 02:52
@wan Glad that it works. If you think that this answers your question, don't forget to accept it by clicking on the grey check mark under the downvote button. :) — acylam, Dec 15 '17 at 03:15
