Iteratively copy specific rows from CSV file to new file

Question

I have a large tab-delimited CSV file with the following format:

#mirbase_acc    mirna_name  gene_id gene_symbol transcript_id   ext_transcript_id   mirna_alignment gene_alignment  mirna_start mirna_end   gene_start  gene_end    genome_coordinates  conservation    align_score seed_cat    energy  mirsvr_score

What I would like to be able to do is iterate through rows and select items based on data (strings) in the "gene_id" field, then copy those rows to a new file.

I am a Python noob, and thought it would be a good way to get my feet wet, but it is harder than it looks! I have been trying to use the csv package to manipulate the files, reading and writing basic stuff using DictReader and DictWriter. If anyone can help me come up with a template for the iterative searching aspect, I would be greatly indebted. So far I have:

import csv

f = open("C:\Documents and Settings\Administrator\Desktop\miRNA Scripting\mirna_predictions_short.txt", "r")
reader = csv.DictReader(f, delimiter='\t')
writer = open("output.txt",'wb')
writer = csv.writer(writer, delimiter='\t')

Then the iterative bit, bleurgh:

for row in reader:
    if reader.gene_id == str(CG11710):
        writer.writerow

This obviously doesn't work. Any ideas on better ways to structure this?

Welcome to SO! One thing that is helpful to us is when something doesn't work, it's good to describe what the code does and how it is different than what you wanted. "This obviously doesn't work" is a little too vague. What doesn't work? Is there an error printed? If so, what is it? — mgilson, Jul 20 '12 at 17:58
Basically, I have no idea what I am doing with that part of the script. That was one attempt I tried, but it didn't go anywhere. In that case "Traceback (most recent call last): File "best3.py", line 10, in <module> if reader.gene_id == str(CG11710): AttributeError: DictReader instance has no attribute 'gene_id'" — Aidan K, Jul 20 '12 at 18:01
I'm not terribly familiar with the `csv` module, but perhaps you want `if row['gene_id'] == str(CG11710): writer.writerow(row)` — mgilson, Jul 20 '12 at 18:05
@mgilson: I don't think `CG11710` is a variable name. He probably wants `if row['gene_id'] == 'CG11710': writer.writerow(row)`. To the OP, I suggest reading or re-reading the official Python tutorial. — Steven Rumbalski, Jul 20 '12 at 18:07
@StevenRumbalski -- You're probably right, but it is a syntactically valid name for a variable, so I left it. I figured that could be sorted out easily enough. (Otherwise, we'd get more comments about a `NameError` which would be easy enough to correct). — mgilson, Jul 20 '12 at 18:09
Alright, thanks everyone for your efforts. I'll hit up the tutorial again to see what I'm missing. — Aidan K, Jul 20 '12 at 18:09
@Aidan K. I revise my advice. If you've already read the tutorial, your best course would probably be writing more scripts chock full of errors. Writing code is an excellent way of finding out what you missed (in addition to posting questions here). — Steven Rumbalski, Jul 20 '12 at 18:12

score 4 · Accepted Answer · answered Jul 20 '12 at 18:05

You're almost there! The code is nearly correct :)

Accessing dicts goes like this:

some_dict['some_key']

Instead of:

some_object.some_attribute

Creating a string isn't done with str(...) but with quotes, like 'CG11710'
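Both points can be tried out on a made-up row. This is only a sketch: the dictionary below is hypothetical and merely shaped like what csv.DictReader yields for each line, not data from the actual file.

```python
# Hypothetical row, shaped like what csv.DictReader yields for each line.
row = {"gene_id": "CG11710", "gene_symbol": "example_symbol"}

# Dictionary values are looked up by key with square brackets.
gene_id = row["gene_id"]

# Attribute access does not work on a plain dict and raises AttributeError.
try:
    row.gene_id
except AttributeError as exc:
    attribute_error = str(exc)

# A string literal is written in quotes; a bare CG11710 would be read
# as a variable name instead.
matches = gene_id == "CG11710"
```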

In your case:

for row in reader:
    if row['gene_id'] == 'CG11710':
        writer.writerow(row)

Wolph


Error code: Traceback (most recent call last): File "best3.py", line 11, in <module> writer.writerow(row) _csv.Error: sequence expected – Aidan K Jul 20 '12 at 18:10
@AidanK, it seems you are not using the dictwriter, you can find an example on how to use it over here: http://stackoverflow.com/a/2982117/54017 – Wolph Jul 20 '12 at 18:21
@AidanK: Because you used a `DictReader`, you need to pair it with a [`DictWriter`](http://docs.python.org/library/csv.html#csv.DictWriter). The row returned by a `DictReader` is a dictionary. The regular writer object cannot handle the dictionary. – Steven Rumbalski Jul 20 '12 at 18:23
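
Putting the comments above together, a minimal sketch of the whole script might look like the following. It assumes Python 3 (hence text mode with newline="" rather than 'wb'), and the function name copy_matching_rows, the temporary file names, and the three-column sample data are made up for illustration; the real file has many more columns.

```python
import csv
import os
import tempfile


def copy_matching_rows(in_path, out_path, gene_id):
    """Copy rows whose gene_id column equals gene_id; return the number written."""
    written = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        # Reuse the input header so the output keeps the same columns.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        for row in reader:
            if row["gene_id"] == gene_id:
                writer.writerow(row)
                written += 1
    return written


# Tiny made-up input with only three of the real columns.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "sample_input.txt")
out_path = os.path.join(workdir, "output.txt")
with open(in_path, "w", newline="") as f:
    f.write("#mirbase_acc\tmirna_name\tgene_id\n")
    f.write("acc1\tmirna-a\tCG11710\n")
    f.write("acc2\tmirna-b\tCG00000\n")
    f.write("acc3\tmirna-c\tCG11710\n")

count = copy_matching_rows(in_path, out_path, "CG11710")
```

Because the DictWriter is given the reader's fieldnames, each dictionary row can be passed to writerow directly, which avoids the "sequence expected" error.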

score 0 · Answer 2 · answered Jul 20 '12 at 18:06

Dictionaries in Python are addressed like dictionary['key']. So for you it'd be row['gene_id'] (each row, not the reader itself, is the dictionary). Also, strings are written in quotes, like "text", not as str(text). str(text) tries to convert whatever is stored in the variable text to a string, which I don't think is what you want. Finally, writer.writerow is a function, and functions take arguments, so you need to call it as writer.writerow(row).
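
The difference between naming writer.writerow and calling it can be seen in a small sketch. It writes to an in-memory buffer instead of a file, and the row values are hypothetical.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
row = ["acc1", "mirna-a", "CG11710"]  # hypothetical values

# Naming the method without parentheses only refers to it; nothing is written.
writer.writerow
after_reference = buffer.getvalue()

# Calling it with the row as an argument actually writes the line.
writer.writerow(row)
after_call = buffer.getvalue()

# str() converts an existing value to a string; it does not turn a bare
# name such as CG11710 into text.
converted = str(11710)
```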
