In my data:
myData='''pos\tidx1\tval1\tidx2\tval2
11\t4\tC\t6\tA
15\t4\tA\t6\tT
23\t4\tT\t6\tT
28\t4\tA\t3\tG
34\t4\tG\t3\tC
41\t4\tC\t4\tT
51\t4\tC\t4\tC'''
I read this data with csv.DictReader, which uses the header row as keys.
import csv
import io
import itertools
input_file = csv.DictReader(io.StringIO(myData), delimiter='\t')
# which produces an iterator
# Now I want to group these rows by idx2, so that each idx2 value becomes the main key
# and, within each group, the values that share a field name are merged into a list.
# The groupby method gives me:
file_blocks = itertools.groupby(input_file, key=lambda x: x['idx2'])
# which I can print as:
for index, blocks in file_blocks:
    print(index, list(blocks))
6 [{'val2': 'A', 'val1': 'C', 'idx1': '4', 'pos': '11', 'idx2': '6'}, {'val2': 'T', 'val1': 'A', 'idx1': '4', 'pos': '15', 'idx2': '6'}, {'val2': 'T', 'val1': 'T', 'idx1': '4', 'pos': '23', 'idx2': '6'}]
3 [{'val2': 'G', 'val1': 'A', 'idx1': '4', 'pos': '28', 'idx2': '3'}, {'val2': 'C', 'val1': 'G', 'idx1': '4', 'pos': '34', 'idx2': '3'}]
4 [{'val2': 'T', 'val1': 'C', 'idx1': '4', 'pos': '41', 'idx2': '4'}, {'val2': 'C', 'val1': 'C', 'idx1': '4', 'pos': '51', 'idx2': '4'}]
But since the groupby output is exhausted after a single pass, I can't print it or use it more than once while debugging.
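The exhaustion described above can be sidestepped by copying each group into a list while iterating. The following is only a minimal sketch, assuming the groups are small enough to hold in memory; the names `my_data`, `reader`, and `groups` are illustrative and not part of the original code.

```python
import csv
import io
import itertools

my_data = '''pos\tidx1\tval1\tidx2\tval2
11\t4\tC\t6\tA
15\t4\tA\t6\tT
23\t4\tT\t6\tT
28\t4\tA\t3\tG
34\t4\tG\t3\tC
41\t4\tC\t4\tT
51\t4\tC\t4\tC'''

reader = csv.DictReader(io.StringIO(my_data), delimiter='\t')

# Copy every group into a list as soon as groupby yields it; the result is a
# plain list of (key, rows) pairs that can be traversed any number of times.
groups = [
    (key, list(rows))
    for key, rows in itertools.groupby(reader, key=lambda row: row['idx2'])
]

# First pass: inspect the groups.
for key, rows in groups:
    print(key, rows)

# Second pass: the data is still there.
for key, rows in groups:
    print(key, len(rows))
```

Note that groupby only groups consecutive rows with the same key, which works here because equal idx2 values are adjacent in the input.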
So, problem #1: how do I convert it into a non-iterator data type?
Problem #2: how can I process this groupby object further so that, within each group/block, the values that share a key are merged into a list?
Something like an OrderedDict or defaultdict, where the order in which the data was read is preserved:
{'6': defaultdict(<class 'list'>, {'pos': [11, 15, 23], 'idx1': [4, 4, 4], 'val1': ['C', 'A', 'T'], 'idx2': [6, 6, 6], 'val2': ['A', 'T', 'T']})}
{'3': .....
{'4': .....
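One possible way to get output of this shape is sketched below. It assumes Python 3.7+ (where a plain dict keeps insertion order) and assumes that pos, idx1, and idx2 should become integers, as the desired output suggests; `INT_FIELDS`, `merged`, and `columns` are hypothetical names introduced only for this illustration.

```python
import csv
import io
import itertools
from collections import defaultdict

my_data = '''pos\tidx1\tval1\tidx2\tval2
11\t4\tC\t6\tA
15\t4\tA\t6\tT
23\t4\tT\t6\tT
28\t4\tA\t3\tG
34\t4\tG\t3\tC
41\t4\tC\t4\tT
51\t4\tC\t4\tC'''

# Assumption: these columns hold integers (csv itself always returns strings).
INT_FIELDS = {'pos', 'idx1', 'idx2'}

reader = csv.DictReader(io.StringIO(my_data), delimiter='\t')

merged = {}  # idx2 value -> {field name: [values in reading order]}
for key, rows in itertools.groupby(reader, key=lambda row: row['idx2']):
    columns = defaultdict(list)
    # Consume the group right away, appending each field's value to its list.
    for row in rows:
        for field, value in row.items():
            columns[field].append(int(value) if field in INT_FIELDS else value)
    merged[key] = columns

print(merged['6']['pos'])   # [11, 15, 23]
print(merged['6']['val1'])  # ['C', 'A', 'T']
```

Because groupby starts a new group whenever the key changes, this sketch would overwrite an earlier entry if the same idx2 value reappeared later in the file; with the sample data that does not happen.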
Some of the fixes I tried:
I thought I could instead build a key: [values] mapping for each unique key before grouping:
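update_dict = {}
for lines in input_file:
    print(type(lines))
    for k, v in lines:  # fails: iterating a dict yields only its keys, not (key, value) pairs
        update_dict['idx2'] = lines[k,v]  # lines[k, v] looks up the tuple (k, v), which is not a key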
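For comparison, here is a hedged sketch of what a working version of this "prepare the lists before grouping" idea might look like. It skips groupby entirely and relies on dict.setdefault, so rows with the same idx2 are merged even when they are not adjacent; the variable names other than update_dict are illustrative.

```python
import csv
import io

my_data = '''pos\tidx1\tval1\tidx2\tval2
11\t4\tC\t6\tA
15\t4\tA\t6\tT
23\t4\tT\t6\tT
28\t4\tA\t3\tG
34\t4\tG\t3\tC
41\t4\tC\t4\tT
51\t4\tC\t4\tC'''

reader = csv.DictReader(io.StringIO(my_data), delimiter='\t')

update_dict = {}
for row in reader:
    # Fetch (or create) the inner dict for this row's idx2 value.
    group = update_dict.setdefault(row['idx2'], {})
    # .items() yields (field, value) pairs; append each value to its field's list.
    for field, value in row.items():
        group.setdefault(field, []).append(value)

print(update_dict['3'])
```

The values remain strings here, since csv.DictReader does not convert types.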
The other thing I tried was to see if I could merge the data inside the grouped object:
new_groupBy = {}
for index, blocks in file_blocks:
    print(index, list(blocks))  # list(blocks) consumes the group, so the loop below sees nothing
    for x in blocks:
        for k, v in x:
            pass  # do something for new_groupBy