I have a very large TSV file and need to delete several columns. I found the csv module and an answer to a somewhat similar question (see the script below), but I need to delete large ranges of columns and don't want to type the index of every single column to delete. Specifically, from a file with 689513 columns, I'd like to remove columns 628715 through 650181 and also columns 653321 through 689513. (If removing both ranges is too hard, I can settle for removing only the last one, i.e. 653321 through 689513, or equivalently 653321 to the last column.) Sorry for the basic question; I'm new to scripting and getting lost, and the csv module documentation doesn't go into detail on deleting column ranges. I tried doing this in R, but the first header cell is blank (see the sample below the code). My file is tab-delimited, but I gather that can be handled by setting the delimiter to \t. Any help is greatly appreciated! (Note: unfortunately I need to keep the colons in my column names, i.e. 2L:1274 is the complete name of a single column.)
import csv
with open("source", "rb") as source:
    rdr = csv.reader(source)
    with open("result", "wb") as result:
        wtr = csv.writer(result)
        for r in rdr:
            wtr.writerow((r[0], r[1], r[3], r[4]))
                                2L:1274  2L:2425  2L:2853  3L:4  3L:5  3L:7
indivBCsusceptiblePL7A10_TATAGT NA       NA       NA       NA    NA    NA
indivBCsusceptiblePL7A11_CCTGAA NA       5        NA       NA    NA    NA
indivBCsusceptiblePL7A12_CAATAT NA       NA       6        7     8     9
indivBCsusceptiblePL7A1_CCGAAT  NA       NA       NA       NA    NA    NA
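
For reference, here is a minimal sketch of the kind of thing I am after, not a tested solution for the real file. It assumes Python 3, that every row (including the header with its blank first cell) has the same number of tab-separated fields, and that the column numbers above are 1-based, inclusive, and count the row-label column as column 1. The function names `keep_indices` and `drop_columns` and the file names are made up for illustration.

```python
import csv


def keep_indices(n_cols, drop_ranges):
    """Return the 0-based indices of the columns to keep.

    drop_ranges holds (first, last) pairs of 1-based, inclusive column
    numbers (an assumption about how the columns are counted).
    """
    drop = set()
    for first, last in drop_ranges:
        # 1-based inclusive first..last maps to 0-based first-1..last-1.
        drop.update(range(first - 1, last))
    return [i for i in range(n_cols) if i not in drop]


def drop_columns(src_path, dst_path, drop_ranges):
    """Copy a tab-delimited file, leaving out the given column ranges."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        keep = None
        for row in reader:
            if keep is None:
                # Work out the kept indices once, from the first row's width.
                keep = keep_indices(len(row), drop_ranges)
            writer.writerow([row[i] for i in keep])


# Hypothetical usage with the ranges from the question:
# drop_columns("source.tsv", "result.tsv",
#              [(628715, 650181), (653321, 689513)])
```

The rows are streamed one at a time, so the whole file never has to fit in memory, and the colons in the column names need no special handling because only the tab is treated as a delimiter.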