I have a data frame in R and would like to add quotation marks at specific places in one of its columns. One row of this data frame looks like this:

> df
    V1       V2       V3   V4 V5 V6         V7         V8 V9
1 chr9 17025523 17026706 SOX2  .  - ncbiRefSeq transcript  .
                                                        V10
1 gene_id SOX2; transcript_id NM_205188.2;  gene_name SOX2;

I'm interested in the last column (df$V10):

> df$V10
gene_id SOX10; transcript_id NM_205188.2;  gene_name SOX10;

I would like to add quotation marks around each word that immediately precedes a ";". The output would be:

> new_df$V10
gene_id "SOX10"; transcript_id "NM_205188.2";  gene_name "SOX10";

Thanks !

Natha
  • Is the vector in question of type character, or are these R objects? Also, your example does not look like a vector in R (where do the semicolons come from?). Could you give a **runnable** [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610)? That way you can help others to help you! – dario Feb 19 '20 at 14:22

2 Answers


You can use a regular expression to replace each word preceding a ; with the word in quotes.

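library(stringr)  # provides str_replace_all()

s = 'gene_id SOX10; transcript_id NM_205188.2;  gene_name SOX10;'
# Capture each run of non-blank characters directly before a ";" and wrap it in quotes
str_replace_all(s, '([^[:blank:]]+);', '"\\1";')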
# "gene_id \"SOX10\"; transcript_id \"NM_205188.2\";  gene_name \"SOX10\";"
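
To apply this to the whole column rather than to a single string, the same pattern can be passed to base R's gsub, which is vectorised over its input. The following is only a minimal sketch: it assumes df$V10 is a character column, and the one-row df (with most columns omitted) is rebuilt from the question purely for illustration.

```r
# Hypothetical one-row data frame rebuilt from the question's example;
# only two of the original columns are kept to stay short
df <- data.frame(
  V1  = "chr9",
  V10 = "gene_id SOX10; transcript_id NM_205188.2;  gene_name SOX10;",
  stringsAsFactors = FALSE
)

# Copy the data frame, then quote every run of non-blank characters
# that directly precedes a ";" in column V10
new_df <- df
new_df$V10 <- gsub("([^[:blank:]]+);", "\"\\1\";", new_df$V10)
```

If V10 were a factor, it would need converting with as.character() first, since assigning the gsub result back would otherwise change the column type.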
Kent Johnson

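Not sure if this is the thing you need, but base R's gsub can do it. Note that \\w+ does not match NM_205188.2 because of the dot, so the pattern captures any run of characters other than whitespace and ";" instead:

r <- gsub("([^[:space:];]+);","\"\\1\";",v)

such that

> r
[1] "gene_id \"SOX10\"; transcript_id \"NM_205188.2\"; gene_name \"SOX10\";"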

DATA

v <- 'gene_id SOX10; transcript_id NM_205188.2; gene_name SOX10;'
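
The backslashes in printed results like these are only how R displays embedded double quotes; they are not stored in the string. A small check, as a sketch: the variable name quoted is just illustrative, and the pattern is one possible choice that captures any run of characters other than whitespace and ";".

```r
v <- 'gene_id SOX10; transcript_id NM_205188.2; gene_name SOX10;'

# Wrap each value that directly precedes a ";" in double quotes
quoted <- gsub("([^[:space:];]+);", "\"\\1\";", v)

print(quoted)       # escaped display: the quotes appear as \"
writeLines(quoted)  # raw characters: gene_id "SOX10"; transcript_id "NM_205188.2"; gene_name "SOX10";

# No literal backslash is present in the result
grepl("\\", quoted, fixed = TRUE)  # FALSE
```

So writing the column out with write.table(..., quote = FALSE) or writeLines would produce plain double quotes in the file.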
ThomasIsCoding