Multiple Separators for the same file input R

Question

I've looked for answers, but have only found ones referring to C or C#. I realise that much of R is written in C, but my knowledge of C is non-existent. I am also relatively new to R. I am using the current RStudio.

This is similar to what I want, I think: "Read the data efficiently with multiple separating lines in R".

I have a CSV file, but one variable is a string with values separated by _ and -. I would like to know if there is a package or extra code which does the following as part of the read.* call.

"1","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,218,4,93,1377907200000
"2","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,390,5,157,1377993600000
"3","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,376,5,193,1.37808e+12
"4","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",1,35,1,15,1377907200000
"5","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",12,11258,117,2843,1377993600000
"6","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",5,4659,56,1826,1.37808e+12
"7","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",7,7296,136,2684,1377907200000
"8","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,4533,35,1632,1377907200000
"9","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,421,6,161,1377993600000
"10","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,57,2,23,1.37808e+12

Example row:

Name    Name1   *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000

So it's easy enough to read in

results <- read.delim("~/results", header = FALSE)

but then I still have the string *XYZ_Name3_KB_MobApp_M-18-25_AU_PI

Desired output (separated by _ and by -):

Name    Name1   *XYZ   Name3  KB   MobApp   M 18 25  AU  PI ANDROID 2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000

but not split up the time string.

---- Thanks @Henrik and @AnandaMahto for the code and package. ----

library(splitstackshape)

# split concatenated column by `_`
df4 <- concat.split(data = df3, split.col = "V3", sep = "_", drop = TRUE)

# split the remaining concatenated part by `-`
df5 <- concat.split(data = df4, split.col = "V3_5", sep = "-", drop = TRUE)

I have the option of exporting to CSV again, opening it in Excel and using Text to Columns twice, but as I'm on Excel 2010 the number of rows is limited. – CArnold, Nov 19 '13 at 15:16
Have a look at `str_split` or `stringr::str_split_fixed` and see if that helps. — TheComeOnMan, Nov 19 '13 at 15:20
Ah, so simple. Do you think I should do it in multiple steps then, rather than on import? – CArnold, Nov 19 '13 at 15:38
you can specify more than one split character in strsplit using regex and | operator, e.g strsplit("*XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID",split="\\_|\\-") — ndr, Nov 19 '13 at 15:57
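
A minimal base R sketch of the regex idea from the last comment, using the example string from the question. The variable names `x` and `parts` are just for illustration. Because only this one string (or column) is passed to strsplit, the timestamp columns are never split.

```r
# One string from the question; only this value is split.
x <- "*XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID"

# A character class matches either separator ("-" placed last is literal).
parts <- strsplit(x, split = "[_-]")[[1]]
print(parts)

# The alternation form from the comment should give the same pieces.
print(identical(parts, strsplit(x, split = "\\_|\\-")[[1]]))
```

The character class is a little easier to read than the escaped alternation, but either form should do here.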

score 5 · Answer 1 · answered Nov 19 '13 at 15:38

I find the functions in package splitstackshape convenient in cases like this.

library(splitstackshape)

# split concatenated column by `_`
results2 <- concat.split(data = results, split.col = "V3", sep = "_", drop = TRUE)

# split the remaining concatenated part by `-`
results3 <- concat.split(data = results2, split.col = "V3_5", sep = "-", drop = TRUE)
results3

Henrik

I'm getting an "Error in FUN(NA_integer_[[1L]], ...) : argument must be coercible to non-negative integer", but thanks for the package. I'll have a look into making it work. – CArnold Nov 19 '13 at 15:56
OK. Possibly there are some characteristics of your original data which are not represented in the small sample in your question (which works fine, for me). Cheers. – Henrik Nov 19 '13 at 16:01
@ChristianArnold, as the package's author, I'd be interested in seeing some actual data that creates this error and the steps to reproduce it. Feel free to do so by [creating an issue at the package's Github issue tracker](https://github.com/mrdwab/splitstackshape/issues?state=open). Thanks! – A5C1D2H2I1M1N2O1R2T1 Nov 19 '13 at 16:04
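
One quick way to check whether the real data are less regular than the small sample is to count the pieces each row produces. The sketch below uses two strings from the CSV sample in the question; whether uneven counts are what triggered the error above is not established in this thread, so treat it only as a diagnostic. The names `v` and `n_pieces` are illustrative.

```r
# Two of the strings from the CSV sample in the question.
v <- c(
  "*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID",
  "*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD"
)

# Number of pieces per row after splitting on "_" or "-".
n_pieces <- lengths(strsplit(v, "[_-]"))
print(n_pieces)

# More than one distinct count means the rows do not all have the same shape.
print(table(n_pieces))
```

On a full data set, `which(n_pieces != n_pieces[1])` would point at the rows that differ from the first one.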

score 3 · Answer 2 · answered Nov 19 '13 at 15:50

library(stringr)

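# read the tab-delimited file without a header row
results <- read.delim("~/results", header = FALSE)
# split V3 on "_" or "-" into 9 pieces and append them as new columns
results <- cbind(results, str_split_fixed(results$V3, "[_-]", 9))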

(this assumes you're OK with having the original column still in place)
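
str_split_fixed needs the number of pieces up front (9 here). If that number is not known, or differs between rows, something along these lines in base R pads every row to the widest one. `split_padded` is a hypothetical helper written for this sketch, not a stringr function, and unlike str_split_fixed it fills the gaps with NA.

```r
# Hypothetical helper (not part of stringr): split on a regex and pad
# short rows with NA so that every row has the same number of columns.
split_padded <- function(x, pattern) {
  pieces <- strsplit(x, pattern)
  width <- max(lengths(pieces))
  padded <- lapply(pieces, function(p) {
    c(p, rep(NA_character_, width - length(p)))
  })
  do.call(rbind, padded)
}

# Two strings from the CSV sample in the question (10 and 11 pieces).
v <- c(
  "*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID",
  "*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD"
)
m <- split_padded(v, "[_-]")
print(dim(m))
print(m)
```

The result is a character matrix, so it can be passed to cbind in the same way as the str_split_fixed output.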

hrbrmstr

zx8754 · Accepted Answer · 2017-11-29T10:51:24.290

Try this:

# dummy data
df <- read.table(text="
Name    Name1   *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000
Name    Name2   *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000
", as.is = TRUE)

# replace "_" with "-" so that only one separator remains
df_V3 <- gsub(pattern="_", replacement="-", df$V3, fixed = TRUE)

# strsplit, make dataframe
df_V3 <- do.call(rbind.data.frame, strsplit(df_V3, split = "-"))

# output, merge columns
output <- cbind(df[, c(1:2)],
                df_V3,
                df[, c(4:ncol(df))])

Building on the comments below, here is another related option, but one which uses read.table instead of strsplit.

splitCol <- "V3"
temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
names(temp) <- paste(splitCol, seq_along(temp), sep = "_")
cbind(df[setdiff(names(df), splitCol)], temp)
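
One caveat with the read.table route, sketched below under the assumption that the concatenated field is the fourth column (V4) of the CSV sample in the question: the strings there do not all split into the same number of pieces, and read.table sizes the table from the first five lines of input unless col.names is longer. Passing col.names as wide as the widest row, together with fill = TRUE, is one way around that. The names `android`, `ipad`, `flat` and `width` are illustrative.

```r
# Rows ending in KB_ANDROID give 10 pieces, rows ending in KB_IOS_IPAD give 11.
android <- "*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID"
ipad    <- "*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD"
v <- c(rep(android, 6), ipad)   # the wider row comes after line 5

# Reduce to a single separator, as in the answer above.
flat <- gsub("-", "_", v, fixed = TRUE)

# Widest row, so that col.names tells read.table how many columns to make.
width <- max(lengths(strsplit(flat, "_", fixed = TRUE)))

temp <- read.table(
  text = flat, sep = "_", fill = TRUE,
  col.names = paste("V4", seq_len(width), sep = "_"),
  stringsAsFactors = FALSE
)
print(dim(temp))
print(temp[c(1, 7), ])
```

If every row has the same number of pieces, as in the two-row dummy data, none of this is needed and the shorter version above is enough.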

edited Nov 29 '17 at 10:51

answered Nov 19 '13 at 15:43

zx8754

@zx8754, two ideas: (1) If you're going to use the `strsplit` approach, use a regular expression and skip the `gsub` step, and maybe just use `do.call(rbind, ...)` since (I *think*) `rbind.data.frame` is slower (and it gives you funky names). (2) If you're going to use the `gsub` approach, forget about `strsplit` and use `read.table(text = df_V3, sep = "-")`. – A5C1D2H2I1M1N2O1R2T1 Nov 19 '13 at 15:57
But +1 for an answer that should at least point the OP in the right direction ;-) – A5C1D2H2I1M1N2O1R2T1 Nov 19 '13 at 15:59
I would upvote if I had enough reputation points. But sadly not yet. – CArnold Nov 19 '13 at 16:05
@ChristianArnold, Edit your question with some reproducible data and some examples of what you've tried, and people are sure to give you more up-votes on your question, which in turn will let you vote on answers ;-) – A5C1D2H2I1M1N2O1R2T1 Nov 19 '13 at 16:07
@AnandaMahto agree, code is *a bit* messy, intention was to direct the OP in the right direction, feel free to edit. – zx8754 Nov 19 '13 at 16:08
@AnandaMahto, I've put in 10 lines of data, but will that help? Also, I retried with your package and it works good now! :) – CArnold Nov 19 '13 at 16:57
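
For reference, the first suggestion in the comments (a regular expression in strsplit, then plain do.call(rbind, ...)) might look roughly like this. It is a sketch that assumes every row splits into the same number of pieces; the small data frame is rebuilt by hand from the dummy data in the answer, keeping only the first three columns.

```r
# First three columns of the dummy data from the answer, built by hand.
df <- data.frame(
  V1 = c("Name", "Name"),
  V2 = c("Name1", "Name2"),
  V3 = c("*XYZ_Name3_KB_MobApp_M-18-25_AU_PI",
         "*CCC_Name3_KB_MobApp_M-18-25_AU_PI"),
  stringsAsFactors = FALSE
)

# Split on either separator in one step; rbind stacks the pieces into a
# character matrix (this assumes all rows have the same number of pieces).
pieces <- do.call(rbind, strsplit(df$V3, "[_-]"))
colnames(pieces) <- paste("V3", seq_len(ncol(pieces)), sep = "_")

# Drop the original column and append the new ones.
output <- cbind(
  df[setdiff(names(df), "V3")],
  as.data.frame(pieces, stringsAsFactors = FALSE)
)
print(output)
```

Naming the matrix columns before the cbind avoids the awkward default names that rbind.data.frame produces.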
