How do you match elements from two files and then print the common elements?

file1:

col0 col1 col2 col3 
a     b    someinfo1 metainfo1
c     d    someinfo2 metainfo2
e     f   80    someinfo3 metainfo3

file2:

col0 col1
a    b   

desired output

col0 col1 col2 col3
a    b    someinfo1,someinfo2 metainfo1,metainfo2

a and b are integers, and in some cases they fall within, or map to, two or more someinfo entries. When they do, I would like to write those entries together, comma-separated like a tuple (that's my guess at a sensible format).

Edit: coordinatefile

  1   4831213 4857551 +

file2

1    4831213 4857551 +   ENSMUSG00000025903  Lypla1
1    4831213 4857551 +   ENSMUSG00000033813  Tcea1

desired output

1    4831213 4857551 +     ENSMUSG00000033813,ENSMUSG00000025903  Tcea1,Lypla
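
For the exact-match case in the edit above, one possible approach is to group the rows of file2 by their first four fields and comma-join whatever follows. The sketch below uses only the standard library; the function name `join_and_collapse` and the parameter `n_key` are made up for illustration, and the data is inlined so it runs as-is. Note that values are joined in file order, so the result lists Lypla1 before Tcea1, which differs from the order shown in the desired output.

```python
from collections import defaultdict

# Inline copies of the two files from the edit above (whitespace-separated, no header).
COORDINATE_FILE = "1   4831213 4857551 +\n"
FILE2 = (
    "1    4831213 4857551 +   ENSMUSG00000025903  Lypla1\n"
    "1    4831213 4857551 +   ENSMUSG00000033813  Tcea1\n"
)


def join_and_collapse(coord_lines, annot_lines, n_key=4):
    """Inner-join on the first n_key fields and comma-join the remaining columns.

    Assumption: the first n_key whitespace-separated fields identify a record.
    """
    groups = defaultdict(list)
    for line in annot_lines:
        fields = line.split()
        if len(fields) > n_key:
            groups[tuple(fields[:n_key])].append(fields[n_key:])

    out = []
    for line in coord_lines:
        key = tuple(line.split()[:n_key])
        if key in groups:  # keep only coordinates that have at least one match
            # zip(*rows) turns the matched rows into columns, one per extra field
            collapsed = [",".join(column) for column in zip(*groups[key])]
            out.append("\t".join(key + tuple(collapsed)))
    return out


rows = join_and_collapse(COORDINATE_FILE.splitlines(), FILE2.splitlines())
print("\n".join(rows))
```

To run this on real files, the two `splitlines()` calls could be replaced by open file handles, since the function only iterates over lines.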
elena
  • Provide example data files as you really mean them, with literal values. As written with a, b, c, and d, it is not clear why you think someinfo2 should be in the results. – Bennett Brown May 14 '17 at 03:01
  • For an SO question, include something you've tried. Also include the incorrect output or the error traceback. You did a nice job including data files and the desired output. – Bennett Brown May 14 '17 at 03:27
  • I just perform a string match, `if col0==col0 and col1==col1`, and write the other elements to a file. No errors so far; I was wondering how I can combine elements in the case I explained earlier. – novicebioinforesearcher May 14 '17 at 03:32
  • This is not an isolated question. Break the problem down into smaller pieces: read the file, perform the JOIN (that's what this operation is called in SQL=Structured Query Language), and then output to a datafile. You have choices for reading the file: I'd recommend `csv` if you're a computer science student and recommend `pandas` if you're a scientist learning informatics skills. If you read it with `pandas`, you'll end up with dataframe objects, and then http://stackoverflow.com/questions/31209908/python-merge-2-lists-sql-join would be relevant. – Bennett Brown May 14 '17 at 03:34
  • Ah, indeed, pandas is what I used: `df.loc[(df["col0"] == row["col0"]) & (df["col1"] <= row["col1"]) & (df["col2"] >= row["col2"])]` – novicebioinforesearcher May 14 '17 at 03:37
  • Very good. I'm focusing on showing you how to ask an SO question... Include that line in your question, along with minimally executable code. That is, the code in your question should `import pandas`, read the file, and create the dataframe `df`. You can make the question more minimal by omitting the column with Lypla1 and still solve the part you're stuck on, I think. – Bennett Brown May 14 '17 at 03:39
  • Sure, thanks, I will keep that in mind for future questions, but could you please help me with the problem I mentioned? – novicebioinforesearcher May 14 '17 at 03:42
  • http://stackoverflow.com/questions/43870402/sql-style-inner-join-in-python – Serge May 14 '17 at 03:45
  • `df.loc[value]` selects the row(s) labelled by `value` (use `df[value]` for a column). http://pandas.pydata.org/pandas-docs/stable/indexing.html . From the code you've provided, it isn't clear how your variables `row` and `df` are related. – Bennett Brown May 14 '17 at 03:51
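
The `df.loc[...]` filter quoted in the comments can be combined with a comma join roughly as sketched here. Since the thread never says how `row` and `df` are related, this assumes `df` holds the annotation file and `row` iterates over a second frame (called `queries` here, a made-up name) holding the coordinate file. The column names `gene_id` and `gene_name` are also assumptions, and the third annotation row is invented purely to show a non-match.

```python
import pandas as pd

# Assumed layout: `df` is the annotation file, `queries` is the coordinate file.
df = pd.DataFrame({
    "col0": [1, 1, 2],
    "col1": [4831213, 4831213, 100],
    "col2": [4857551, 4857551, 200],
    # The last entry in each list below is a made-up row that should NOT match.
    "gene_id": ["ENSMUSG00000025903", "ENSMUSG00000033813", "HYPOTHETICAL_ID"],
    "gene_name": ["Lypla1", "Tcea1", "hypothetical_gene"],
})
queries = pd.DataFrame({"col0": [1], "col1": [4831213], "col2": [4857551]})

records = []
for _, row in queries.iterrows():
    # Same condition as in the comment: same col0, and the annotation
    # interval [col1, col2] contains the query interval.
    mask = (
        (df["col0"] == row["col0"])
        & (df["col1"] <= row["col1"])
        & (df["col2"] >= row["col2"])
    )
    hits = df.loc[mask]
    if hits.empty:
        continue  # no annotation covers this query; skip it
    records.append({
        "col0": row["col0"],
        "col1": row["col1"],
        "col2": row["col2"],
        # Collapse all matching annotations into comma-separated strings
        "gene_id": ",".join(hits["gene_id"]),
        "gene_name": ",".join(hits["gene_name"]),
    })

result = pd.DataFrame(records)
print(result.to_string(index=False))
```

Looping with `iterrows` is simple but slow on large files; it is used here only because it mirrors the `row`/`df` expression from the comment.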

0 Answers