I have a dataset as follows:

1       Saturday,SatAug 13, 2016-5:30 PM
2                  54.362·Robert Madley
3         Sunday,SunAug 14, 2016-1:30 PM
4        11.355 sold out·Andre Marriner

What I want to do is split the dataset on "," or "·" and then turn it into a matrix or data frame. In row 4, "11.355" and "sold out" also need to be split apart. So the final dataset should be:

date       date1       time           a        f                s
Saturday   SatAug 13   2016-5:30 PM   54.362   Robert Madley
Sunday     SunAug 14   2016-1:30 PM   11.355   Andre Marriner   sold out
Sotos
wan

1 Answer

Assuming an observation is always composed of two data rows in the raw data, here's a solution with dplyr + tidyr:

library(dplyr)
library(tidyr)

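df %>%
  # Pair up consecutive rows: rows 1-2 get ID 0, rows 3-4 get ID 1, and so on
  mutate(ID = (row_number() - 1) %/% 2) %>%
  group_by(ID) %>%
  # Label the two rows of each pair as V1 (date row) and V2 (details row)
  mutate(ID2 = paste0('V', row_number())) %>%
  # Reshape to one row per observation with columns V1 and V2
  spread(ID2, V2) %>%
  # Split the date row on commas (with an optional trailing space)
  separate(V1, c("date", "date1", "time"), sep = ",\\s?") %>%
  # Pull out the number, the optional status text, and the name after "·"
  extract(V2, c("a", "s", "f"), regex = "^(\\d+\\.\\d+\\b)(\\b.+)?·(.+)", convert = TRUE)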

Result:

# A tibble: 2 x 7
# Groups:   ID [2]
     ID     date     date1         time      a         s              f
* <dbl>    <chr>     <chr>        <chr>  <dbl>     <chr>          <chr>
1     0 Saturday SatAug 13 2016-5:30 PM 54.362      <NA>  Robert Madley
2     1   Sunday SunAug 14 2016-1:30 PM 11.355  sold out Andre Marriner

Data:

df = read.table(text = "1|Saturday,SatAug 13, 2016-5:30 PM
2|54.362·Robert Madley
3|Sunday,SunAug 14, 2016-1:30 PM
4|11.355 sold out·Andre Marriner", sep = "|", row.names = 1,
                stringsAsFactors = FALSE)
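
If you'd rather not depend on dplyr/tidyr, the same idea (two raw rows per observation, then split each row) can be sketched in base R. This is only a rough alternative, not a replacement for the pipeline above: the names raw, first, second, left and result are made up for illustration, and it assumes every second row starts with a number and contains exactly one "·".

```r
# Raw rows as in the question; each observation spans two consecutive rows.
raw <- c("Saturday,SatAug 13, 2016-5:30 PM",
         "54.362·Robert Madley",
         "Sunday,SunAug 14, 2016-1:30 PM",
         "11.355 sold out·Andre Marriner")

# Odd rows hold the date fields, even rows hold number / status / name.
first  <- raw[c(TRUE, FALSE)]
second <- raw[c(FALSE, TRUE)]

# Split the date rows on a comma followed by optional whitespace.
date_parts <- do.call(rbind, strsplit(first, ",\\s*"))

# Split the detail rows on the middle dot (fixed string, no regex needed).
parts <- strsplit(second, "·", fixed = TRUE)
left  <- vapply(parts, `[`, character(1), 1)  # number plus optional status
name  <- vapply(parts, `[`, character(1), 2)  # text after the middle dot

# The leading number, and whatever text follows it (empty means no status).
a <- as.numeric(sub("^([0-9.]+).*$", "\\1", left))
s <- trimws(sub("^[0-9.]+", "", left))
s[s == ""] <- NA

result <- data.frame(
  date  = date_parts[, 1],
  date1 = date_parts[, 2],
  time  = date_parts[, 3],
  a     = a,
  s     = s,
  f     = name,
  stringsAsFactors = FALSE
)
```

Unlike the extract() regex, this version trims the space in front of the status text, so s holds "sold out" rather than " sold out". If the raw data can have observations that don't span exactly two rows, neither approach will work as written.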
acylam
  • Thank you very much. It works. I'm going to practice the "mutate" function. – wan Dec 15 '17 at 02:52
  • @wan Glad that it works. If you think that this answers your question, don't forget to accept it by clicking on the grey check mark under the downvote button. :) – acylam Dec 15 '17 at 03:15