I have an object lncRNA_lengths like this:

> lncRNA_lengths
# A tibble: 1,071 x 3
  tx_name                   Length Type  
  <chr>                      <int> <chr> 
1 align_id:155048|asmbl_67     205 lncRNA
2 align_id:155049|asmbl_68     228 lncRNA
3 align_id:155143|asmbl_162    524 lncRNA
4 align_id:155148|asmbl_167    344 lncRNA
5 align_id:155226|asmbl_245    386 lncRNA
6 align_id:155265|asmbl_284    825 lncRNA
7 align_id:155270|asmbl_289    292 lncRNA
8 align_id:155331|asmbl_350    216 lncRNA
9 align_id:155332|asmbl_351   1152 lncRNA
10 align_id:155344|asmbl_363    243 lncRNA
# ... with 1,061 more rows

And I want to separate the tx_name column on the "|" symbol. I tried this:

lncRNA_lengths %>% 
  separate(tx_name, c("ID", "asmbl", sep = "\\|"))

But I get this output:

# A tibble: 1,071 x 5
   ID    asmbl `\\|`  Length Type  
   <chr> <chr> <chr>   <int> <chr> 
 1 align id    155048    205 lncRNA
 2 align id    155049    228 lncRNA
 3 align id    155143    524 lncRNA
 4 align id    155148    344 lncRNA
 5 align id    155226    386 lncRNA
 6 align id    155265    825 lncRNA
 7 align id    155270    292 lncRNA
 8 align id    155331    216 lncRNA
 9 align id    155332   1152 lncRNA
10 align id    155344    243 lncRNA
# ... with 1,061 more rows
Warning message:
Expected 3 pieces. Additional pieces discarded in 1071 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...]. 

Three columns are created instead of two, and I don't understand the warning message...

Jon
  • I think you mean `separate(tx_name, c("ID", "asmbl"), sep = "\\|")`. Note that `sep =` does not go inside `c()`; it is an argument to `separate()`. It's easier to help you if you include your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Dec 10 '18 at 20:09
  • Of course! Didn't notice... Thanks! – Jon Dec 10 '18 at 20:14

1 Answer

This should do it. First, make some fake data:

library(tidyr)

df <- data.frame(tx_name = "align_id:155048|asmbl_67", length = 205, type = "lncRNA")

Then separate the column and create the new ones:

df <- separate(df, col = tx_name, sep = "\\|", into = c("ID", "asmbl"))

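Basically, you didn't close the vector passed to into: the closing parenthesis of c() came after sep, so sep = "\\|" became a third element of into instead of an argument to separate().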
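To see where the third column and the warning come from, here is a minimal base R sketch of the mechanism. It does not call separate() itself; it assumes that separate() falls back to its documented default separator, the regular expression "[^[:alnum:]]+", when sep is not supplied. The names x, into_bad, pieces_default and pieces_pipe are made up for illustration.

```r
# Base R only; tidyr is not needed to see the mechanism.
x <- "align_id:155048|asmbl_67"

# 1. With the parenthesis misplaced, sep = "\\|" is just a third,
#    named element of the vector handed to `into`.
into_bad <- c("ID", "asmbl", sep = "\\|")
length(into_bad)   # three names, hence three new columns
names(into_bad)    # only the last element carries the name "sep"

# 2. Assumption: separate() then splits on its default separator,
#    "[^[:alnum:]]+", which also matches ":" and "_", not only "|".
pieces_default <- strsplit(x, "[^[:alnum:]]+")[[1]]
pieces_default     # five pieces: more than the three names available

# 3. Splitting on the literal pipe gives the two intended pieces.
pieces_pipe <- strsplit(x, "\\|")[[1]]
pieces_pipe
```

Under that assumption, each row yields five pieces but only three column names, which would explain the "Additional pieces discarded" warning and the align / id / 155048 values in the question's output.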

Derek Corcoran