
I have a problem with the separate function and column references. I'm cleaning some data, and the column names and word counts differ a lot depending on the source. Here is an example:

library(tidyr)
# First find the maximum number of whitespace-separated words in each column
col.words <- apply(example, 2, function(x) max(sapply(strsplit(x, "\\s+"), length)))
# Columns to separate (those that contain more than 1 word)
cols <- col.words[col.words > 1]
# Use separate() to split the column into multiple columns
example %>% separate(col=X1, into = paste0("N", 1:cols[1]), sep = "\\s+")

   N1 N2       N3       N4       N5  X2            X3    X4     X5
5   1 82     DOLL Benedikt     <NA> GER 0 0 0 23:27.4   0.0 60 160
6   2 96      BOE Johannes Thingnes NOR 0 0 0 23:28.1  +0.7 54 154
7   3  4 FOURCADE   Martin     <NA> FRA 1 1 2 23:50.5 +23.1 48 148
8   4 77   BAILEY   Lowell     <NA> USA 0 0 0 23:56.9 +29.5 43 143
9   5 81  MORAVEC   Ondrej     <NA> CZE 0 1 1 23:58.1 +30.7 40 140
10  6 40     ANEV Krasimir     <NA> BUL 0 0 0 24:00.9 +33.5 38 138

The problem is that the columns to separate change depending on the data source. I would like to use something like:

example %>% separate(col= names(cols)[1], into = paste0("N", 1:cols[1]), sep = "\\s+")

That way I could loop over the columns even when the column names or word counts change. Example data below.

#DATA
> dput(example)
structure(list(X1 = c("1 82 DOLL Benedikt", "2 96 BOE Johannes Thingnes", 
"3 4 FOURCADE Martin", "4 77 BAILEY Lowell", "5 81 MORAVEC Ondrej", 
"6 40 ANEV Krasimir"), X2 = c("GER", "NOR", "FRA", "USA", "CZE", 
"BUL"), X3 = c("0 0 0 23:27.4", "0 0 0 23:28.1", "1 1 2 23:50.5", 
"0 0 0 23:56.9", "0 1 1 23:58.1", "0 0 0 24:00.9"), X4 = c("0.0", 
"+0.7", "+23.1", "+29.5", "+30.7", "+33.5"), X5 = c("60 160", 
"54 154", "48 148", "43 143", "40 140", "38 138")), .Names = c("X1", 
"X2", "X3", "X4", "X5"), row.names = 5:10, class = "data.frame")
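
For reference, here is a minimal sketch of one way the dynamic reference might be written. It assumes a tidyr version that supports tidy evaluation (0.7.0 or later) together with the rlang package, and it is not taken from the linked duplicate. The shortened data frame and the "X1_1"-style output names are my own choices for illustration; fill = "right" pads rows that have fewer words with NA.

```r
library(tidyr)
library(rlang)

# Cut-down copy of the example data (first three rows, three columns)
example <- data.frame(
  X1 = c("1 82 DOLL Benedikt", "2 96 BOE Johannes Thingnes", "3 4 FOURCADE Martin"),
  X2 = c("GER", "NOR", "FRA"),
  X5 = c("60 160", "54 154", "48 148"),
  stringsAsFactors = FALSE
)

# Maximum number of whitespace-separated words per column
col.words <- sapply(example, function(x) max(lengths(strsplit(x, "\\s+"))))
# Keep only the columns that hold more than one word
cols <- col.words[col.words > 1]

# Split each multi-word column in turn, referring to it by its name string
result <- example
for (nm in names(cols)) {
  result <- separate(
    result,
    col   = !!sym(nm),                             # turn the string into a column symbol
    into  = paste0(nm, "_", seq_len(cols[[nm]])),  # hypothetical naming scheme
    sep   = "\\s+",
    fill  = "right"                                # pad short rows with NA on the right
  )
}
```

Prefixing the new column names with the source column name avoids the clashes that paste0("N", ...) would cause once more than one column is split.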
Hakki
  • Can you also show your expected output please? – Sotos Aug 10 '17 at 11:43
  • Sure. "example %>% separate(col=X1, into = paste0("N", 1:cols[1]), sep = "\\s+")" produces the correct output, but I haven't found a way to reference columns dynamically. – Hakki Aug 10 '17 at 11:46
  • Have a look at the dupe target – Sotos Aug 10 '17 at 11:47
  • cSplit(example, names(example)[1], " ") looks like it does the trick. The tidyr answer seems a bit too complicated. Thanks. Should I delete this post or keep it alive? – Hakki Aug 10 '17 at 11:56
  • No, keep it. It now acts as a reference to redirect people to the appropriate answer(s). – Sotos Aug 10 '17 at 12:00
