I have several tab-delimited files that I want to read into dicts using csv.DictReader. Each file contains several comment lines starting with '#' or '\t' before the actual data begins. The number of comment lines varies between files. I've been trying the methods outlined in this post but cannot get them to work.
Here is my current code:
def load_database_snps(inputFile):
    '''This function takes a txt tab delimited input file (in house database) and returns a list of dictionaries for each variant'''
    idStore = [] #empty list for storing variant records
    with open(inputFile, 'r+') as varin:
        idStoreDictgroup = csv.DictReader((row for row in varin if row.startswith('hr', 1, 2)),delimiter='\t') #create a generator; dictionary per snp (row) in the file
        idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
        print(type(idStoreDictgroup))
        for d in idStoreDictgroup: #iterate over dictionaries in varin_dictgroup
            print(d)
            idStore.append(d) #attach to var_list
    return idStore
Here is an example of an input file:
## SM=Sample,AD=Total Allele Depth, DP=Total Depth
## het;;; and homo;;; are breakdowns of variant read counts per sample - chr1:10002921 T>G AD=34 het:4;11;7;12 (sum=34)
Hetereozygous Homozygous
Chr Start End ref |A| |C| |G| |T| HetCount |A| |C| |G| |T| HomCount TotalCount SampleCount
chr1 10001102 10001102 T 0 0 SM=1;AD=22;DP=38 0 1 0 0 0 0 0 1 138 het:22; homo:-
chr1 10002921 10002921 T 0 0 SM=4;AD=34;DP=63 0 4 0 0 0 0 0 4 138 het:4;11;7;12; homo:-
The lines I want to read in all begin with 'Chr' or 'chr'. I think it's not working because reformatting the field names means iterating over the generator, which exhausts it before the rows can be read into the dictionaries.
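As a rough sketch of the intended filter, and a check on the exhaustion theory: accessing `fieldnames` pulls only the first row from the underlying iterator, so the remaining rows should still be available afterwards. The helper name `load_chr_rows` and the shortened inline sample are assumptions for illustration; the tab positions in the real file may differ.

```python
import csv
import io

def load_chr_rows(lines):
    """Return one dict per data row, keeping only lines that begin with 'chr' or 'Chr'."""
    wanted = (line for line in lines if line.lower().startswith('chr'))
    reader = csv.DictReader(wanted, delimiter='\t')
    # Reading fieldnames consumes only the header line, not the data rows
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return list(reader)

# Hypothetical, shortened stand-in for the real file
sample = io.StringIO(
    "## SM=Sample,AD=Total Allele Depth, DP=Total Depth\n"
    "\tHetereozygous\tHomozygous\n"
    "Chr\tStart\tEnd \tref\n"
    "chr1\t10001102\t10001102\tT\n"
    "chr1\t10002921\t10002921\tT\n"
)
rows = load_chr_rows(sample)
print(len(rows), rows[0]['Start'], rows[1]['End'])
```

One thing to watch for with the real header: the column names |A|, |C|, |G| and |T| each appear twice, and DictReader keeps only the last value for a repeated key, so those columns may need to be renamed before the rows are read.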
The error message I get is:
Traceback (most recent call last):
  File "snp_freq_V1-1_export.py", line 99, in <module>
    snp_check_wrapper(inputargs.snpstocheck, inputargs.snp_database_location)
  File "snp_freq_V1-1_export.py", line 92, in snp_check_wrapper
    snpDatabase = load_database_snps(databaseInputFile) #store database variants in snp_database (a dictionary)
  File "snp_freq_V1-1_export.py", line 53, in load_database_snps
    idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
TypeError: 'NoneType' object is not iterable
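One possible reading of this traceback, offered as a sketch rather than a confirmed diagnosis: `str.startswith(prefix, start, end)` tests the slice `row[start:end]`, so with `start=1, end=2` only a single character is examined and the two-character prefix 'hr' cannot match. The generator would then yield nothing, and `DictReader.fieldnames` would stay `None`. The sample row below is a made-up stand-in for a line of the real file.

```python
import csv

sample_row = "chr1\t10001102\t10001102\tT\n"  # hypothetical row, not taken from the real file

# startswith(prefix, start, end) tests sample_row[start:end]; sample_row[1:2] is just "h"
matches_with_end = sample_row.startswith('hr', 1, 2)
matches_without_end = sample_row.startswith('hr', 1)

# A row source that yields nothing leaves DictReader without a header row
empty_reader = csv.DictReader(
    (row for row in [sample_row] if row.startswith('hr', 1, 2)),
    delimiter='\t',
)
fieldnames_when_empty = empty_reader.fieldnames

print(matches_with_end, matches_without_end, fieldnames_when_empty)
```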
I have also tried the inverse of my current code, explicitly excluding rows that start with '#' or '\t', but that didn't work either and just gives me a blank dictionary.
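For comparison, the exclusion approach can be sketched as follows. It assumes comment lines start with '#' or a tab and that blank lines should be skipped as well; `skip_comments` is a hypothetical helper name and the sample text is invented, so this does not explain why the original attempt returned an empty result.

```python
import csv
import io

def skip_comments(lines):
    """Yield lines that are not blank and do not start with '#' or a tab."""
    for line in lines:
        # str.startswith accepts a tuple of prefixes
        if line.strip() and not line.startswith(('#', '\t')):
            yield line

# Hypothetical, shortened stand-in for the real file
sample = io.StringIO(
    "## het;;; and homo;;; are breakdowns of variant read counts per sample\n"
    "\tHetereozygous\tHomozygous\n"
    "\n"
    "Chr\tStart\tEnd\tref\n"
    "chr1\t10001102\t10001102\tT\n"
)
reader = csv.DictReader(skip_comments(sample), delimiter='\t')
records = list(reader)
print(reader.fieldnames, records)
```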