I have a tab-separated text file with 3 columns:
1st column: a gene ID
2nd column: a value
3rd column: a comma-separated list of genes associated with the gene in the 1st column (the number of genes varies across lines)

TMCS09g1008699 6.4 TMCS09g1008677,TMCS09g1008681,TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686,TMCS09g1008680,TMCS09g1008675,TMCS09g1008690

etc.

What I want is this:

TMCS09g1008699 6.4 TMCS09g1008677
TMCS09g1008699 6.4 TMCS09g1008681
TMCS09g1008699 6.4 TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686
TMCS09g1008690 5.3 TMCS09g1008680
TMCS09g1008690 5.3 TMCS09g1008675
TMCS09g1008690 5.3 TMCS09g1008690

Could someone help me?

mightaskalot
  • I would read the strings from the file into some data structure, like a dictionary, and then print the contents of the dictionary out again in a slightly different way. I think Python, Ruby or JavaScript would be good languages for a task like this, but that's a matter of taste to some degree. In any case, as it is worded now, your question is a little too broad in my opinion. You are basically asking people to write code for you for free. I'd recommend trying to come up with a solution yourself first, showing what you did, and then asking a more specific question. – anothernode Mar 18 '18 at 21:21

2 Answers

$ awk 'BEGIN{FS=OFS="\t"} 
            {n=split($3,f3,","); 
             for(i=1;i<=n;i++) 
               print $1,$2,f3[i]}' file
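
To try the one-liner above end to end, here is a minimal sketch that builds the two sample rows from the question as a real tab-separated file and runs the same awk program on it. The temporary file and the `sample` and `result` variable names are assumptions for illustration only; replace them with your own file.

```shell
# Create a temporary tab-separated sample file (hypothetical stand-in for "file").
sample=$(mktemp)
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681,TMCS09g1008685\n' > "$sample"
printf 'TMCS09g1008690\t5.3\tTMCS09g1008686,TMCS09g1008680,TMCS09g1008675,TMCS09g1008690\n' >> "$sample"

# FS = OFS = "\t": read and write tab-separated fields.
# split($3, f3, ",") fills the array f3 with the comma-separated genes and
# returns how many there are; the loop prints one output line per gene.
result=$(awk 'BEGIN { FS = OFS = "\t" }
{
    n = split($3, f3, ",")
    for (i = 1; i <= n; i++)
        print $1, $2, f3[i]
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because split returns the element count, the loop copes with a different number of genes on every line without any extra bookkeeping.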
karakfa

Here is an R solution using packages from the tidyverse:

library(tidyverse)
df %>%
    mutate(V3 = str_split(V3, ",")) %>%   # V3 becomes a list column
    unnest(V3)                            # one row per list entry
#              V1  V2             V3
#1 TMCS09g1008699 6.4 TMCS09g1008677
#2 TMCS09g1008699 6.4 TMCS09g1008681
#3 TMCS09g1008699 6.4 TMCS09g1008685
#4 TMCS09g1008690 5.3 TMCS09g1008686
#5 TMCS09g1008690 5.3 TMCS09g1008680
#6 TMCS09g1008690 5.3 TMCS09g1008675
#7 TMCS09g1008690 5.3 TMCS09g1008690

Explanation: str_split splits column 3 on ","; unnest then expands the resulting list entries into one row each.


Sample data

df <- read.table(text =
    "TMCS09g1008699 6.4 'TMCS09g1008677,TMCS09g1008681,TMCS09g1008685'
TMCS09g1008690 5.3 'TMCS09g1008686,TMCS09g1008680,TMCS09g1008675,TMCS09g1008690'", header = FALSE, stringsAsFactors = FALSE)
Maurits Evers
  • Thank you very much. If I understood correctly from https://stackoverflow.com/questions/27125672/what-does-function-mean-in-r, %>% is used by the tidyverse (imported from dplyr) and just improves the readability of the code. Is that correct? – mightaskalot Mar 19 '18 at 15:41
  • Yes, `%>%` is the pipe operator from `magrittr` (imported through `tidyverse`), which you can use to pipe a value forward into an expression/function to chain commands. See [here](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html) for details. – Maurits Evers Mar 19 '18 at 20:21