I am splitting strings on commas, but I want to ignore commas that fall inside quotation marks. Here is an example:

library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
                            "DUSA,DUSA Pharmaceuticals Inc,Q"))

#1   USATW,"USA Technologies, Inc Warrants",Q
#2   DUSA,DUSA Pharmaceuticals Inc,Q

setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
                    tstrsplit(str, ",", fixed=TRUE)]


#   Symbol    Security Name               Market Category
#1  USATW    "USA Technologies            Inc Warrants"
#2  DUSA      DUSA Pharmaceuticals Inc    Q

The first string should be:

#1  USATW    "USA Technologies, Inc Warrants"  Q

There are similar posts but in other programming languages.
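
To make the problem concrete, here is a minimal base R sketch (the variable names `x`, `pieces` and `n_fields` are only for illustration) that counts the fields a fixed comma split produces for each of the two sample strings. The quoted comma in the first string is treated as a delimiter, so that row ends up with one field more than the other:

```r
# Same two sample strings as in the question
x <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
       "DUSA,DUSA Pharmaceuticals Inc,Q")

# A fixed split treats every comma as a delimiter, quoted or not
pieces <- strsplit(x, ",", fixed = TRUE)

# Number of fields per string: the quoted comma adds an extra field to the first one
n_fields <- lengths(pieces)
print(n_fields)
```

Because the rows do not split into the same number of fields, they cannot be lined up into the three intended columns.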

Soheil

2 Answers

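Try read.table, which by default treats a quoted field as a single value, so commas inside the quotes are not used as separators. No packages are needed.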

read.table(text = as.character(dataset$str), sep = ",", as.is = TRUE,   
  col.names = c("Symbol", "Security Name", "Market Category"), check.names = FALSE)

giving:

  Symbol                  Security Name Market Category
1  USATW USA Technologies, Inc Warrants               Q
2   DUSA       DUSA Pharmaceuticals Inc               Q
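
A possible variation on the call above, offered as a sketch rather than a required change: read.table's default `quote` argument also treats the apostrophe as a quote character, and its type conversion can turn a column such as one containing only "T" or "F" into logicals. Assuming every column should stay a plain string and only double quotes delimit fields, both can be pinned down explicitly (`x` and `parsed` are illustrative names):

```r
# Same two sample strings as in the question
x <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
       "DUSA,DUSA Pharmaceuticals Inc,Q")

parsed <- read.table(
  text = x,
  sep = ",",
  quote = "\"",                  # only double quotes delimit a field
  colClasses = "character",      # keep every column as a string
  col.names = c("Symbol", "Security Name", "Market Category"),
  check.names = FALSE            # keep the spaces in the column names
)

print(parsed)
```

As in the original call, the surrounding quotes are dropped from the parsed values.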
G. Grothendieck
  • Right, but I think the OP wants to preserve the escaped quotes for some reason – rawr Apr 21 '16 at 16:01
  • you can also use `fread` if you sprinkle in line breaks: `fread(paste(dataset$str, collapse = '\n'), header = F)` – eddi Apr 21 '16 at 16:17

This regex splits only on commas that are outside quotes, and it keeps the quotes:

library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
                            "DUSA,DUSA Pharmaceuticals Inc,Q"))

setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
                 tstrsplit(str, '(,)(?=(?:[^"]|"[^"]*")*$)', perl = TRUE)]

#                                         str Symbol                    Security Name Market Category
# 1: USATW,"USA Technologies, Inc Warrants",Q  USATW "USA Technologies, Inc Warrants"               Q
# 2:          DUSA,DUSA Pharmaceuticals Inc,Q   DUSA         DUSA Pharmaceuticals Inc               Q
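
For anyone who wants to see how the pattern works without data.table, here is a base R sketch of the same idea (`pattern`, `fields` and `result` are illustrative names, and the capturing group around the comma is dropped because it is not needed for splitting). It assumes the quotes in each string are balanced:

```r
# Same two sample strings as in the question
x <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
       "DUSA,DUSA Pharmaceuticals Inc,Q")

# Split on a comma only if everything after it, up to the end of the string,
# is made of non-quote characters and complete "..." pairs:
#   (?= ... $)          lookahead that must reach the end of the string
#   (?:[^"]|"[^"]*")*   non-quote characters or whole quoted sections
# A comma inside quotes is followed by an unmatched quote, so the lookahead fails.
pattern <- ',(?=(?:[^"]|"[^"]*")*$)'

fields <- strsplit(x, pattern, perl = TRUE)

# Bind the per-string fields into a character matrix with one row per string
result <- do.call(rbind, fields)
colnames(result) <- c("Symbol", "Security Name", "Market Category")

print(result)
```

The quotes stay in the second field of the first row. If they are not wanted after all, something like `gsub('^"|"$', '', result)` would strip them.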
rawr