
Essentially, I want to write out the lines of a document whose IDs match the list of ids referenced in the code.

nodeIDs.txt:

... has 417 entries such as:

10000
10023
1017
1019
1021
1026
1027
1029
...

Adherens junction.txt:

... has 73 lines such as:

4301: AFDN; afadin, adherens junction formation factor 
1496: CTNNA2; catenin alpha 2 
283106: CSNK2A3; casein kinase 2 alpha 3 
2241: FER; FER tyrosine kinase 
60: ACTB; actin beta 
1956: EGFR; epidermal growth factor receptor 
56288: PARD3; par-3 family cell polarity regulator 
10458: BAIAP2; BAI1 associated protein 2 
51176: LEF1; lymphoid enhancer binding factor 1 

I'm trying to get the program to go through the file line by line, check each line against the ids list, and, if the beginning characters of the line match any id in the list, write that line to a new document. I was researching data sets, but I was unsure whether these would work here.

My code so far:

ids = []
with open('nodeIDs.txt', 'r') as n:
    for line in n:
        ids.append(line)
n.close()

# Import data from the pathway file and turn into a list
g = []
with open('Adherens junction.txt', 'r') as a:
    for line in a:
        g.append(line)
a.close()

aj = open('Adherens.txt', 'a')
for line in a:
    if ids[i] in line:
    aj.write(line)
aj.close()

Can you help me get this working?

Stephen Rauch
Quintakov
  • This question would be greatly improved with a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. Specifically, data that works, and doesn't just illustrate the format, and the expected output, from the supplied data. – Stephen Rauch Mar 05 '17 at 02:52

1 Answer


Here is some code which I think does what you are after.

Code:

# read ids file into a set
with open('file1', 'r') as f:
    # create a set comprehension
    ids = {line.strip() for line in f}

# read the pathway file and turn into a list
with open('file2', 'r') as f:
    # create a list comprehension
    pathways = [line for line in f]

# output matching lines
with open('file3', 'a') as f:

    # loop through each of the pathways
    for pathway in pathways:

        # get the number in front of the ':'
        start_of_line = pathway.split(':', 1)[0]

        # if this is in 'ids' output the line
        if start_of_line.strip() in ids:
            f.write(pathway)

Results:

2241: FER; FER tyrosine kinase 
56288: PARD3; par-3 family cell polarity regulator 

file1:

10000
56288
2241

file2:

4301: AFDN; afadin, adherens junction formation factor 
1496: CTNNA2; catenin alpha 2 
283106: CSNK2A3; casein kinase 2 alpha 3 
2241: FER; FER tyrosine kinase 
60: ACTB; actin beta 
1956: EGFR; epidermal growth factor receptor 
56288: PARD3; par-3 family cell polarity regulator 
10458: BAIAP2; BAI1 associated protein 2 
51176: LEF1; lymphoid enhancer binding factor 1 
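
As a small illustration of why the code compares the text before the ':' against the set instead of using a substring test such as `some_id in line`, here is a sketch using two rows from file2. The id '106' and the helper name `line_id` are hypothetical, chosen only to show a false positive.

```python
# Two rows copied from file2; '106' is a made-up id used for illustration only.
lines = [
    "283106: CSNK2A3; casein kinase 2 alpha 3 \n",
    "2241: FER; FER tyrosine kinase \n",
]
ids = {"106", "2241"}


def line_id(line):
    """Return the text before the first ':' with surrounding whitespace removed."""
    return line.split(":", 1)[0].strip()


# Substring test: '106' is found inside '283106', so both rows match.
substring_hits = [line for line in lines if any(i in line for i in ids)]

# Prefix test: only the row whose whole id is in the set matches.
prefix_hits = [line for line in lines if line_id(line) in ids]

print(len(substring_hits))  # 2
print(len(prefix_hits))     # 1
```

The set lookup also avoids scanning every id for every line, which matters little at 417 ids but keeps the loop simple.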

What is a set comprehension?

This:

# create a set comprehension
ids = {line.strip() for line in f}

is the same as:

# create a set
ids = set()
for line in f:
    ids.add(line.strip())
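
The list comprehension used for `pathways` works the same way. A minimal sketch, assuming an in-memory `io.StringIO` object as a stand-in for the open file `f` (the two sample rows are shortened from file2):

```python
import io

# In-memory stand-in for the open file object `f`
f = io.StringIO("4301: AFDN\n1496: CTNNA2\n")

# create a list comprehension
pathways_comprehension = [line for line in f]

f.seek(0)  # rewind so the same data can be read a second time

# create a list with an explicit loop
pathways_loop = []
for line in f:
    pathways_loop.append(line)

print(pathways_comprehension == pathways_loop)  # True
```

Each element keeps its trailing newline, which is why the matching lines can be passed to `f.write()` unchanged.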
Stephen Rauch
  • This worked perfectly -- thanks for the formatting too! Could you explain a bit further what is happening in the "line for line" and "pathway in pathways" sections of your code? – Quintakov Mar 05 '17 at 18:00
  • `for line in lines` is a standard Python `for` loop over an iterable. Many objects (e.g. `list`) implement an `__iter__` method, which returns an iterator whose `__next__` method supplies the items one by one and allows this very nifty syntax. So it does basically as it reads: it runs the body of the loop for each line in lines, one at a time. Ain't Python fun? You might also be unfamiliar with comprehensions. I updated the post to note the two comprehensions. See: http://stackoverflow.com/questions/1747817/create-a-dictionary-with-list-comprehension-in-python – Stephen Rauch Mar 05 '17 at 18:10
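
For readers curious about what a `for` loop does under the hood, here is a rough sketch that spells out the iteration by hand with the built-in `iter()` and `next()` functions. The list contents and the name `collected` are placeholders for illustration.

```python
lines = ["first\n", "second\n"]

# Roughly what `for line in lines:` does behind the scenes
it = iter(lines)  # asks the list for an iterator (calls lines.__iter__())
collected = []
while True:
    try:
        line = next(it)  # fetches the next item (calls it.__next__())
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    collected.append(line)

print(collected == lines)  # True
```

An open file object can be looped over in the same way, yielding one line per step.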