
I'm trying to fetch data from a CSV file. This is how my CSV file looks:

a,0,b,2,c,6,G,4,l,6,mi,2,m,0,s,4
a,2,b,2,c,0,G,4,l,6,mi,4,m,0,s,6
a,4,b,2,c,6,G,6,l,2,mi,4,m,0,s,0
a,2,b,0,c,2,G,6,l,4,mi,4,m,0,s,6
a,2,b,2,c,6,G,4,l,0,mi,6,m,0,s,4
a,2,b,6,c,0,G,6,l,0,mi,4,m,2,s,4
a,0,b,6,c,4,G,2,l,0,mi,6,m,4,s,2
a,6,b,6,c,4,G,0,l,0,mi,2,m,4,s,2

So, for example, in line[0], depending on the numeric values at positions 1, 3, 5, 7, 9, 11, 13, 15, I need to get the labels at positions 0, 2, 4, 6, 8, 10, 12, 14.
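Because every label sits at an even position and its value at the odd position right after it, Python's extended slicing can pair them up. A minimal Python 3 sketch (the thread's own code is Python 2) using the first sample line; `row`, `labels`, `values` and `pairs` are illustrative names, not taken from the question:

```python
# First sample line, split into fields the same way csv.reader would split it
row = "a,0,b,2,c,6,G,4,l,6,mi,2,m,0,s,4".split(",")

labels = row[0::2]  # positions 0, 2, 4, ..., 14 -> the letters
values = row[1::2]  # positions 1, 3, 5, ..., 15 -> the numbers (still strings)

pairs = list(zip(labels, values))
print(pairs)
# [('a', '0'), ('b', '2'), ('c', '6'), ('G', '4'), ('l', '6'), ('mi', '2'), ('m', '0'), ('s', '4')]
```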

A more detailed example: from line 1, I need to get

a,m = 0
b,mi = 2
c,l = 6
G,s = 4
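Grouping the labels of one line by their shared value reproduces the list above. A minimal Python 3 sketch, assuming values are compared as the exact strings csv.reader returns; `groups` is an illustrative name:

```python
from collections import defaultdict

row = "a,0,b,2,c,6,G,4,l,6,mi,2,m,0,s,4".split(",")

# Map each value to the labels that carry it, keeping column order
groups = defaultdict(list)
for label, value in zip(row[0::2], row[1::2]):
    groups[value].append(label)

# Print one "labels = value" line per value, sorted numerically
for value in sorted(groups, key=int):
    print(f"{','.join(groups[value])} = {value}")
# a,m = 0
# b,mi = 2
# G,s = 4
# c,l = 6
```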

Finally, I have to find which pairs have the highest count, i.e. essentially a summation for each combination across all lines.

In order to do this:

import csv

# Sanitize filelist to keep only *.csv files
def sanitize_filelist(filelist):

    sanitized_filelist = []

    # Keep only the CSV files (case-insensitive extension check)
    for filename in filelist:
        if filename.lower().endswith('.csv'):
            sanitized_filelist.append(filename)
    return sanitized_filelist


def parse_files(dataset_path,file):
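    threads = [0,2,4,6,10,12,14]     # columns holding the labels (a, b, c, ...)
    coreid  = [1,3,5,7,9,11,13,15]   # columns holding the numeric values
    cores = [0,2,4,6]                # possible numeric values
    thread_data = [[],[],[],[],[],[],[]]
    #core = [[],[],[],[],[],[],[]]
    threadcorecount = [[0 for a in range(0,4)] for b in range(0,8)]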
    dataset = csv.reader(open(dataset_path, 'rb'), delimiter=',')
    for line in dataset:
        #print line
        for thread in range(len(threads)):
            thread_data[thread] = line[threads[thread]]
        for core in range(len(threads)):
            if line[coreid[core]] == cores[0]:
                sub = core - 1
                print thread_data[sub],cores[0]

I wrote this snippet, which is still a test version, but I am not able to get the values and print them. There is no error, so I don't understand what the mistake is.
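A likely reason nothing prints: csv.reader returns every field as a string, so `line[coreid[core]] == cores[0]` compares something like `'0'` with the integer `0`, which in Python is simply False rather than an error. Separately, `threads` skips position 8 (the `l` column), so `thread_data` no longer lines up with `coreid`; the loop over `range(len(threads))` visits only 7 of the 8 value columns; and `sub = core - 1` picks the label one column too early. A small Python 3 sketch of the comparison issue, using the first sample line; `sample` and `zero_labels` are illustrative names:

```python
import csv
import io

# The first sample line, read through csv.reader like the question's code
sample = io.StringIO("a,0,b,2,c,6,G,4,l,6,mi,2,m,0,s,4\n")
line = next(csv.reader(sample))

print(repr(line[1]))      # '0'  -> csv fields are strings
print(line[1] == 0)       # False -> str vs int never matches, no error raised
print(int(line[1]) == 0)  # True  -> convert before comparing

# Labels whose value is 0 (label at position i, value at position i + 1)
zero_labels = [line[i] for i in range(0, len(line), 2) if int(line[i + 1]) == 0]
print(zero_labels)        # ['a', 'm']
```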

furins
pistal
  • Hi, can you better explain "which two have the highest combination. so essentially a summation for each." I don't get what you expect from this last part: do you want to retrieve the line or the group of letters with the highest sum? – furins Mar 12 '13 at 15:52
  • @furins: thank you. Sure. As you can see, in each line the letters are always the same, but the numeric value beside each one differs. I need to find which pairs of letters share the same value most often. So, if `a,b = 0` occurs 100 times and `a,c = 2` occurs 99 times, it should print `a,b = 100` and `a,c = 99`. – pistal Mar 12 '13 at 15:57
  • sure, I'm working on it... it requires some time :) – furins Mar 12 '13 at 16:20

1 Answer


If I've understood all your requests, the following code should do the trick: use the results variable if you want to access the values in each line (or save the counter variable somewhere), and sorted_results to get how many times each combination occurs.


Here is the code:

import csv
from collections import Counter
import operator

def parse_files(dataset_path,f):  # please avoid shadowing built-in names like file
    threads = range(0,16,2)
    dataset = csv.reader(open(dataset_path,'rb'), delimiter=',')
    results = []
    for line in dataset:
        counter = {str(x):[] for x in range(0,8,2)}
        # map(lambda x:counter[line[x+1]].append(line[x]), threads)
        # map(lambda ...) is just a more pythonic way to write the following two lines
        for index in threads:
            counter[line[index+1]].append(line[index])
        # now, for the first sample line, counter is something like
        # {'0': ['a', 'm'], '2': ['b', 'mi'], '4': ['G', 's'], '6': ['c', 'l']}

        results.extend([','.join(v)+'='+k for k,v in counter.items()])
        # in results, I'm appending something like this:
        # ['c,l=6', 'a,m=0', 'b,mi=2', 'G,s=4']

    sorted_results = sorted(dict(Counter(results)).iteritems(), key=operator.itemgetter(1), reverse=True)
    print '\n'.join(['The couple %s appears %d times'%el for el in sorted_results])

    # >>> The couple a,b=2 appears 2 times
    # >>> The couple c,m=4 appears 2 times
    # >>> The couple G,s=4 appears 2 times
    # >>> The couple c,mi=6 appears 1 times
    # >>> The couple a,m=2 appears 1 times
    # >>> ...
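The `sorted(dict(Counter(results)).iteritems(), key=operator.itemgetter(1), reverse=True)` line ranks (combination, count) tuples from most to least frequent. `Counter.most_common()`, available since Python 2.7, returns the same kind of ranked list directly; in Python 3.7+ entries with equal counts keep the order in which they were first encountered. A Python 3 sketch with a hypothetical `results` list in the same "labels=value" shape the answer builds:

```python
import operator
from collections import Counter

# Hypothetical combination strings, for illustration only
results = ["a,b=2", "c,m=4", "a,b=2", "G,s=4", "c,m=4", "a,m=2"]

counts = Counter(results)

# The answer's approach (Python 3 spelling: items() instead of iteritems())
by_sort = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)

# The same ranking straight from Counter
by_most_common = counts.most_common()

for pair, n in by_most_common:
    print("The couple %s appears %d times" % (pair, n))
```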
Community
furins
  • I used it as a short way to define a new function. I've rewritten the code without it – furins Mar 12 '13 at 16:52
  • @furins I would disagree about `lambda` being more pythonic. Concise - yes, functional - sure, obfuscated - you bet, but it does make even a more experienced programmer slow down a bit when trying to debug. Usually, it deserves a comment. – ferrix Mar 12 '13 at 17:06
  • @ferrix it allows me to use map. Using map is pythonic I think, but I agree with your concerns about lambda. I've already replaced it with more "prolix" version – furins Mar 12 '13 at 17:36
  • I think `map` is functional as well. I first met such constructs in functional languages. How would you write that as a list comprehension? That is the most pythonic way that comes to my mind. – ferrix Mar 12 '13 at 17:45
  • `counter[line[index+1]].append(line[index])` raises `KeyError: '-1'`. This is the error I get. – pistal Mar 12 '13 at 18:05
  • @ferrix "pythonic" is somehow a vague definition and usually means idiomatic (I think my code is idiomatic), short (definitely) and understandable (well, not so much...). Since it's only partially pythonic I give up, call it as you wish :) – furins Mar 12 '13 at 18:18
  • It does not give the combinations.. :( – pistal Mar 12 '13 at 18:19
  • @user2015933: the code works for me. Have you changed the `counter[line[index+1]].append(line[index])` part somehow? – furins Mar 12 '13 at 18:22
  • `[('=4', 6344), ('=6', 6344), ('=0', 6344), ('=2', 6344)]` that's how it gives out. – pistal Mar 12 '13 at 18:23
  • No. I've not modified `results.extend([','.join(v)+'='+k for k,v in counter.items()])` `# in results, I'm appending something like this:` `# {'c,l=6', 'a,m=0', 'b,mi=2', 'G,s=4'}` `sorted_results = sorted(dict(Counter(results)).iteritems(), key=operator.itemgetter(1), reverse=True)` `print sorted_results ,f` – pistal Mar 12 '13 at 18:28
  • OK, I see what the problem is: there are some `-1` values in between, and those are causing the issues. – pistal Mar 12 '13 at 18:41
  • my code expects only 0,2,4,6 as possible numeric values (I assumed that from your `cores` variable). If you need to accommodate more possible values (even unknown ones), we can change the code. – furins Mar 12 '13 at 18:45
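Following up on the last comment: one way to accept any value, including `-1` or values not known in advance, is to build the groups with `collections.defaultdict(list)` instead of a dict pre-filled with '0', '2', '4', '6'. Because a group exists only for values actually present in a line, this also avoids empty groups, which the fixed dict turns into entries such as `=4` whenever no label in a line has that value. A Python 3 sketch; `count_pairs`, `parse_file` and the sample rows are hypothetical, and single-label groups (a value carried by only one label) are counted too, which may or may not be what you want:

```python
import csv
from collections import Counter, defaultdict


def count_pairs(rows):
    """Count how often each group of labels shares the same value.

    Each row alternates label, value, label, value, ...; every distinct
    value string (including '-1') gets its own group.
    """
    results = Counter()
    for row in rows:
        groups = defaultdict(list)  # value -> labels carrying that value
        for label, value in zip(row[0::2], row[1::2]):
            groups[value].append(label)
        for value, labels in groups.items():
            results[",".join(labels) + "=" + value] += 1
    return results.most_common()


def parse_file(dataset_path):
    # Python 3: open CSV files in text mode with newline='' (not 'rb')
    with open(dataset_path, newline="") as f:
        return count_pairs(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical rows containing a -1 value
    rows = [
        ["a", "0", "b", "2", "c", "-1", "G", "4"],
        ["a", "2", "b", "2", "c", "-1", "G", "4"],
    ]
    for pair, n in count_pairs(rows):
        print("The couple %s appears %d times" % (pair, n))
```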