
I have two lists:

a_list contains distinct ids, each wrapped in a tuple:

a_list = [('1'), ('2'), ('3'), ('4')...etc]

and check_list, which contains lists within lists of tuples:

check_list = [[[('1'), ('2')]], [[('2'), ('3')], [('3'),('4')]], [[('1'),('3')]]...etc]

I've got a problem that is a little too complex for my Python skill level. I'm trying to build a matrix to write to a CSV file, with the following structure:

  1 2 3 4
1 1 1 0 0
2 0 1 2 1
3 1 0 1 0

Each value is the number of times the id from a_list (the column) appears in the corresponding top-level element of check_list (the row).

I've looked at numpy but have no experience with it. I've played around with this but I can't seem to understand it well enough to apply it to my problem. My knowledge of arrays is pretty limited, too. Thanks in advance.

user47467
  • Do you actually have tuples in your sublists because as posted the parens are doing nothing? – Padraic Cunningham Feb 16 '16 at 12:24
  • Do you want to tell us anything with the parentheses around the strings? Are these actually supposed to be tuples? What's the problem, counting or writing to a file? – timgeb Feb 16 '16 at 12:25
  • Yes, the parentheses are supposed to be tuples and are supposed to be there. The problem is the counting. – user47467 Feb 16 '16 at 12:29
  • Then edit your question. As it stands, `a_list` is just a list of strings because `('1') == '1'`. – timgeb Feb 16 '16 at 12:30
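
The point made in the comments can be checked directly. The short sketch below (the variable names `plain` and `single` are only illustrative) shows that parentheses alone just group an expression, while a trailing comma is what creates a one-element tuple:

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
plain = ('1')
single = ('1',)

print(type(plain).__name__)   # str
print(type(single).__name__)  # tuple
print(plain == '1')           # True
print(single == '1')          # False
```

This is why the answer writes the ids as ('1',), ('2',) and so on.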

1 Answer

You can count the ids in each group with a Counter. Sorting the keys of a_list and storing them in an OrderedDict fixes the column order, and dict.viewkeys (Python 2; dict.keys on Python 3) gives a set-like view for finding which keys are missing from a group.

from __future__ import print_function
from collections import Counter, OrderedDict
from itertools import chain

check_list = [[[('1',), ('2',)]], [[('2',), ('3',)], [('3',), ('4',)]], [[('1',), ('3',)]]]

a_list = [('1',), ('2',), ('3',), ('4',)]

cn = OrderedDict(sorted(Counter(a_list).items()))

print(" ".join([str(t[0]) for t in cn]))
for chk in check_list:
    _cn = Counter(chain(*chk))
    # viewkeys() is Python 2 only; on Python 3 use cn.keys()
    diff = cn.viewkeys() - _cn
    for k in cn:
        if k not in diff:
            print(_cn[k], end=" ")
        else:
            print(0, end=" ")
    print()

Output:

1 2 3 4
1 1 0 0 
0 1 2 1 
1 0 1 0 
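
For readers on Python 3, where viewkeys() no longer exists, the same counting can be sketched a little more compactly. This is not the code above but a variation on it: it relies on the fact that a Counter returns 0 for a key it has never seen, so the explicit diff step can be dropped. The helper name `count_rows` is made up for this illustration.

```python
from collections import Counter
from itertools import chain

check_list = [[[('1',), ('2',)]], [[('2',), ('3',)], [('3',), ('4',)]], [[('1',), ('3',)]]]
a_list = [('1',), ('2',), ('3',), ('4',)]


def count_rows(ids, groups):
    """Return one row of counts per group, with columns in the order of ids."""
    rows = []
    for group in groups:
        # Flatten the inner lists of this group and count each tuple.
        counts = Counter(chain.from_iterable(group))
        # Looking up a missing key in a Counter yields 0, so no diff is needed.
        rows.append([counts[key] for key in ids])
    return rows


columns = sorted(a_list)
rows = count_rows(columns, check_list)

print(" ".join(key[0] for key in columns))
for row in rows:
    print(" ".join(str(n) for n in row))
```

Sorting a_list once plays the role of the OrderedDict: the column order is fixed by the list passed in as ids.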

If you don't care about the column order, you can drop the sorted/OrderedDict logic:

from __future__ import print_function
from collections import Counter
from itertools import chain

check_list = [[[('1',), ('2',)]], [[('2',), ('3',)], [('3',), ('4',)]], [[('1',), ('3',)]]]

a_list = [('1',), ('2',), ('3',), ('4',)]

cn = Counter(a_list)

print(" ".join([str(t[0]) for t in cn]))
for chk in check_list:
    _cn = Counter(chain(*chk))
    # viewkeys() is Python 2 only; on Python 3 use cn.keys()
    diff = cn.viewkeys() - _cn
    for k in cn:
        if k not in diff:
            print(_cn[k], end=" ")
        else:
            print(0, end=" ")
    print()

The columns then come out in an arbitrary order, for example:

3 4 1 2
0 0 1 1 
2 1 0 1 
1 0 1 0 

To write to a CSV file:

from collections import Counter, OrderedDict
from itertools import chain
from csv import writer

check_list = [[[('1',), ('2',)]], [[('2',), ('3',)], [('3',), ('4',)]], [[('1',), ('3',)]]]

a_list = [('1',), ('2',), ('3',), ('4',)]

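# On Python 3, use open("out.csv", "w", newline="") and cn.keys() instead.
with open("out.csv","w") as out:
    wr = writer(out,delimiter=" ")
    # Sorted keys in an OrderedDict fix the column order.
    cn = OrderedDict(sorted(Counter(a_list).items()))
    # Header row: flatten the 1-tuples into plain ids.
    wr.writerow(list(chain(*cn)))
    for chk in check_list:
        _cn = Counter(chain(*chk))
        # Keys of a_list that do not occur in this group get a 0.
        diff = cn.viewkeys() - _cn
        wr.writerow([_cn[k] if k not in diff else 0 for k in cn])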

out.csv:

1 2 3 4
1 1 0 0
0 1 2 1
1 0 1 0
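
A Python 3 take on the CSV step might look like the sketch below. It is a variation, not the code above: the helper `write_matrix` is a made-up name, it accepts any writable text file object, and it writes to an in-memory io.StringIO here only so the result can be inspected. For a real file, the csv module documentation recommends opening it with newline="", as in open("out.csv", "w", newline="").

```python
import csv
import io
from collections import Counter
from itertools import chain

check_list = [[[('1',), ('2',)]], [[('2',), ('3',)], [('3',), ('4',)]], [[('1',), ('3',)]]]
a_list = [('1',), ('2',), ('3',), ('4',)]


def write_matrix(out, ids, groups):
    """Write a header of ids, then one row of counts per group, to out."""
    wr = csv.writer(out, delimiter=" ")
    columns = sorted(ids)
    # Header row: unwrap each 1-tuple into its plain id.
    wr.writerow([key[0] for key in columns])
    for group in groups:
        counts = Counter(chain.from_iterable(group))
        # Missing keys count as 0 in a Counter.
        wr.writerow([counts[key] for key in columns])


# In-memory buffer standing in for a file opened with newline="".
buf = io.StringIO()
write_matrix(buf, a_list, check_list)
lines = buf.getvalue().splitlines()
print("\n".join(lines))
```

Passing the file object in, rather than opening it inside the function, keeps the counting logic easy to try out without touching the disk.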
Padraic Cunningham