
I'm not sure whether my problem is tricky. My requirement is this: I have three columns of data in a txt file, as below:

col1,col2,col3
11,0.95,21
11,0.75,22
11,0.85,23
11,0.65,24
12,0.63,22
12,0.75,24
12,0.45,25
...

col1 can be viewed as dict keys, each repeating at most 5 times. col3 can be viewed as nested dict keys with the values in col2, i.e. each key in col1 has at most 5 (col3: col2) pairs.

I would like to sort each nested dictionary by col2 in descending order and replace the col2 values with their rank (1 for the highest). In other words, I don't care about the actual values in col2; I only care about the ranking of the col3 entries within each col1 value:

col1,col2,col3
11,1,21
11,2,23
11,3,22
11,4,24
12,1,24
12,2,22
12,3,25
...

I tried turning the data into nested dictionaries like:

{col1:{col3:col2}}
{11:{21:0.95,22:0.75,23:0.85,24:0.65},12:{22:0.63,24:0.75,25:0.45}}

I have searched around and found some solutions for sorting a nested dict, but none of them let me replace the values with their rankings. Can someone please help?

Moinuddin Quadri

2 Answers


Your input format isn't defined here, so I assumed a list of lists like this:

[['col1', 'col2', 'col3'],
 ['11', '0.95', '21'],
 ['11', '0.75', '22'],
 ['11', '0.85', '23'],
 ['11', '0.65', '24'],
 ['12', '0.63', '22'],
 ['12', '0.75', '24'],
 ['12', '0.45', '25']]

Then you can build the nested dictionary like this:

result = {}
for row in input_list[1:]:  # skip the header row
    key, value, subkey = row
    # collect the col3 -> col2 pairs under each col1 key
    result.setdefault(key, {})[subkey] = value

Result

{'11': {'21': '0.95', '22': '0.75', '23': '0.85', '24': '0.65'},
 '12': {'22': '0.63', '24': '0.75', '25': '0.45'}}
Rahul K P
  • This approach does not allow the col2 values to be replaced by their rankings, but the dictionary creation is nice. – Ma0 Aug 29 '16 at 07:58
  • @Ev.Kounis Actually, I assigned the keys and values the wrong way round. Now fixed. – Rahul K P Aug 29 '16 at 08:01
  • Python dictionaries do not have a `has_key` method – juanpa.arrivillaga Aug 29 '16 at 08:01
  • @juanpa.arrivillaga They do have one. Read this: http://www.tutorialspoint.com/python/dictionary_has_key.htm – Rahul K P Aug 29 '16 at 08:02
  • Sorry, that only applies to Python 3. – juanpa.arrivillaga Aug 29 '16 at 08:03
  • on the subject of rankings again, assuming you have all the col2 values for every unique col1 item in a list called let's say `x`, you can create a mapping like: `ranks = dict(zip(sorted(x), [f+1 for f in range(len(x))]))` and instead of getting `i[1]` in your code, get `ranks[i[1]]` – Ma0 Aug 29 '16 at 08:07
  • @RahulKP [`has_key` is python2 only](http://stackoverflow.com/questions/1323410/has-key-or-in) and you definitely should use the `in` statement. – Sevanteri Aug 29 '16 at 08:09
  • @RahulKP Also, you should be using `setdefault` or better yet, use `defaultdict` from `collections`. That is the standard way of building up a dictionary of containers. It is also faster. – juanpa.arrivillaga Aug 29 '16 at 08:12
  • Thanks Rahul and other people's comments! It is a nice way of creating dictionary! Like @Ev.Kounis mentioned this approach doesn't allow for col2 values to be replaced by their rankings though. – Zhicong Pan Aug 30 '16 at 15:23
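
Pulling the comments above together, here is a minimal sketch (not the answerer's code) that builds the nested dictionary with `collections.defaultdict` and then replaces each col2 value with its rank. The names `rows`, `nested`, and `ranked` are made up for illustration, and the sketch assumes rank 1 means the highest col2 value, as in the question's expected output.

```python
from collections import defaultdict

# Sample rows from the question as (col1, col2, col3) strings, header omitted.
rows = [
    ("11", "0.95", "21"),
    ("11", "0.75", "22"),
    ("11", "0.85", "23"),
    ("11", "0.65", "24"),
    ("12", "0.63", "22"),
    ("12", "0.75", "24"),
    ("12", "0.45", "25"),
]

# Build {col1: {col3: col2}}; defaultdict removes the need for an "in" check.
nested = defaultdict(dict)
for col1, col2, col3 in rows:
    nested[col1][col3] = float(col2)  # compare scores as numbers, not strings

# Replace each score with its rank inside its group (1 = highest score).
ranked = {}
for col1, scores in nested.items():
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranked[col1] = {col3: rank for rank, col3 in enumerate(ordered, 1)}

print(ranked)
```

Converting col2 to `float` matters here: sorting the raw strings would compare them character by character rather than numerically.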

Well, here is a way to do it in basic Python:

In [90]: col1
Out[90]: [11, 11, 11, 11, 12, 12, 12]

In [91]: col2
Out[91]: [0.95, 0.75, 0.85, 0.65, 0.63, 0.75, 0.45]

In [92]: col3
Out[92]: [21, 22, 23, 24, 22, 24, 25]

Let's create data consisting of the items from each column:

In [163]: data = [*zip(col1, col2, col3)]

In [164]: data
Out[164]: 
[(11, 0.95, 21),
 (11, 0.75, 22),
 (11, 0.85, 23),
 (11, 0.65, 24),
 (12, 0.63, 22),
 (12, 0.75, 24),
 (12, 0.45, 25)]

Let's use the itertools module to group them by col1:

In [174]: import itertools

In [175]: groups = itertools.groupby(data, key=lambda x: x[0])

Now, groups is an iterator. If we want to see what it contains,
we need to iterate over it:

for a, b in groups:
    print(a, list(b))

and we get:

11 [(11, 0.95, 21), (11, 0.75, 22), (11, 0.85, 23), (11, 0.65, 24)]
12 [(12, 0.63, 22), (12, 0.75, 24), (12, 0.45, 25)]
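
One caveat worth keeping in mind: `itertools.groupby` only groups consecutive items with equal keys, so the grouping above works because the sample data is already ordered by col1. The sketch below uses a made-up, unordered variant of the data (`unsorted_data` is hypothetical) to show the difference, and sorting by the key first as a possible remedy.

```python
import itertools
from operator import itemgetter

# Hypothetical unordered input: key 11 appears in two separate runs.
unsorted_data = [(11, 0.95, 21), (12, 0.63, 22), (11, 0.75, 22)]

# Without sorting, groupby starts a new group every time the key changes.
keys_unsorted = [k for k, _ in itertools.groupby(unsorted_data, key=itemgetter(0))]

# Sorting by the grouping key first yields one group per distinct key.
sorted_data = sorted(unsorted_data, key=itemgetter(0))
keys_sorted = [k for k, _ in itertools.groupby(sorted_data, key=itemgetter(0))]

print(keys_unsorted)  # [11, 12, 11]
print(keys_sorted)    # [11, 12]
```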

But we exhausted the iterator. So let's create it again, and now
that we know what it contains, we can perform the desired sorting:

In [177]: groups = itertools.groupby(data, key=lambda x: x[0])

In [178]: groups2 = [sorted(list(b), reverse=True) for a, b in groups]

In [179]: groups2
Out[179]: 
[[(11, 0.95, 21), (11, 0.85, 23), (11, 0.75, 22), (11, 0.65, 24)],
 [(12, 0.75, 24), (12, 0.63, 22), (12, 0.45, 25)]]

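OK, one more step, this time written in an editor rather than at the prompt: replace each col2 value with its rank inside the group.

for idx, group in enumerate(groups2):
    # swap the col2 score for its 1-based position in the sorted group
    groups2[idx] = [(x, rank, z) for rank, (x, _, z) in enumerate(group, 1)]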

for g in groups2:
    for item in g:
        print(item)

And we get:

(11, 1, 21)
(11, 2, 23)
(11, 3, 22)
(11, 4, 24)
(12, 1, 24)
(12, 2, 22)
(12, 3, 25)
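
For completeness, the steps above can be strung together starting from the comma-separated text itself. This is only a sketch under some assumptions: the file has a single header row, the values parse as `int`/`float`, and `text` stands in for the real file contents (with an actual file, the open file object could be passed to `csv.reader` instead of `io.StringIO`).

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Stand-in for the contents of the txt file from the question.
text = """col1,col2,col3
11,0.95,21
11,0.75,22
11,0.85,23
11,0.65,24
12,0.63,22
12,0.75,24
12,0.45,25
"""

reader = csv.reader(io.StringIO(text))
next(reader)  # skip the header row
rows = [(int(a), float(b), int(c)) for a, b, c in reader]

# Order by col1 ascending, then by col2 descending within each col1 value.
rows.sort(key=lambda r: (r[0], -r[1]))

# Walk each col1 group and emit (col1, rank, col3), with rank starting at 1.
ranked = []
for _, group in groupby(rows, key=itemgetter(0)):
    for rank, (col1, _score, col3) in enumerate(group, 1):
        ranked.append((col1, rank, col3))

for row in ranked:
    print(*row, sep=",")
```

Sorting once by `(col1, -col2)` also guarantees that `groupby` sees each col1 value as a single consecutive run, even if the file is not ordered.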
Israel Unterman
  • Thanks a lot @Israel Unterman. I was worried that I didn't clearly define my problem (and possibly yes as obviously some people were confused by me - apologize for that). But your method has solved my problem exactly. I am able to resolve my issue using your method now! – Zhicong Pan Aug 30 '16 at 15:20