I'm having trouble sorting a list of strings that contain negative and/or decimal numbers embedded in alphanumeric fields. This is what I have so far:

import re

format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib", 
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
#key = [2,1,3]
key = 2
produce_groups = False

if isinstance(key, int):
    key = [key]

convert = lambda text: float(text) if text.isdigit() else text
alphanum_key = lambda k: [convert(c) for c in re.split('([-.\d]+)', k)]
split_list = lambda name: tuple(alphanum_key(re.findall(selector,name)[0][i]) for i in key)
format_ids.sort(key=split_list)

print "\n".join(format_ids)

I'm expecting the following output (sorting by the 3rd key):

synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib

But I'm getting the following (all the negative numbers are listed last):

synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib

Now, for the decimals in the 2nd key (setting key = 1), I get:

synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib

Expecting:

synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib

Any suggestions are greatly appreciated.

Edit: I ended up using the simpler method described by @StephenRauch:

import re
def sort_names(format_ids, selector, key=1):

    if isinstance(key, int):
        key = [key]

    SELECTOR_RE = re.compile(selector)

    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x

    def sort_keys(key):
        def split_fid(x):
            x = SELECTOR_RE.split(x)
            return tuple([convert(x[i]) for i in key])
        return split_fid

    format_ids.sort(key=sort_keys(key))

format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib",
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
key = [2,1,3]

sort_names(format_ids,selector,key)
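
For anyone comparing the two kinds of split, here is a small sketch of why the compiled pattern's split() matters. It reuses the selector from above on a single name; the variable names are only illustrative.

```python
import re

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
name = "synopsys_SS_1.2v_-40c_SS.lib"

# str.split() treats selector as a literal separator. That literal text never
# occurs in the name, so the result is a one-element list, and indexing it
# with 1, 2 or 3 raises IndexError.
print(name.split(selector))
# ['synopsys_SS_1.2v_-40c_SS.lib']

# The compiled pattern's split() returns the captured groups, with the
# (empty) text before and after the match at either end.
print(re.compile(selector).split(name))
# ['', 'SS', '1.2v', '-40c', 'SS', '']
```

With the regex split, indices 1 to 4 line up with the four capture groups, which is why key = [2,1,3] picks voltage, corner, and temperature.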
Luca

2 Answers


A big part of your problem is that str.isdigit() only treats actual digits as digits, not dashes or periods. In your code, "-40".isdigit() and "1.4".isdigit() are both False, so those values stay as text rather than being converted to floats.
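
A quick way to see this, as a minimal sketch. The helper name to_number is made up for illustration and is not from the question.

```python
# str.isdigit() is True only when every character is a digit,
# so a sign or a decimal point makes it return False.
print("40".isdigit())    # True
print("-40".isdigit())   # False
print("1.4".isdigit())   # False


def to_number(text):
    """Hypothetical helper: return a float if text parses as one, else text."""
    try:
        return float(text)
    except ValueError:
        return text


print(to_number("-40"))  # -40.0
print(to_number("1.4"))  # 1.4
print(to_number("SS"))   # SS
```

Trying float() and catching ValueError handles signs and decimal points without having to describe every valid number format by hand.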

Tané Tachyon

You need to test for numbers a bit differently. Also, re.split() returns a leading '' when the string starts with a match, which was throwing off the convert routine.
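
To see the leading empty string, here is a small sketch that uses the same pattern as the question on a few sample fields:

```python
import re

# Because the pattern has a capturing group, re.split() keeps the matched
# number in the result. When the string starts with a match, the first
# element is the empty string that precedes it.
pattern = r'([-.\d]+)'

print(re.split(pattern, '-40c'))   # ['', '-40', 'c']
print(re.split(pattern, '1.2v'))   # ['', '1.2', 'v']
print(re.split(pattern, 'SS'))     # ['SS']

# Dropping the empty strings leaves the number first.
parts = [p for p in re.split(pattern, '-40c') if p != '']
print(parts)                       # ['-40', 'c']
```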

Fixed Code:

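key = [2,1,3]

def convert(x):
    # Convert to float when possible; otherwise keep the original string.
    try:
        return float(x)
    except ValueError:
        return x

# Split a captured field into numeric and text pieces, converting as we go.
alphanum_keys = lambda k: (convert(c) for c in re.split(r'([-.\d]+)', k))
# Drop the empty strings from re.split() and keep the first remaining piece.
alphanum_key = lambda k: [i for i in alphanum_keys(k) if i != ''][0]
split_list = lambda name: [
    alphanum_key(re.findall(selector, name)[0][i]) for i in key]
format_ids.sort(key=split_list)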

Alternate (simpler) solution:

But all of those lambdas and regexes are way more complicated than you need for this problem. How about just:

def sort_key(keys):

    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x

    def f(x):
        x = x.split('_')
        return tuple([convert(x[i]) for i in keys])
    return f

format_ids.sort(key=sort_key([3, 2, 4]))

How?

sort_key() returns a function f(). This is a function of one parameter that is passed to sort() as the key, and sort() calls it on each element to determine the sort order. The function f() uses the value of keys that was passed to sort_key(), because that is the value available at the time f() is defined. This is called a closure.
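
The same idea in a smaller sketch. The names by_field and key_func and the sample list are made up for illustration.

```python
def by_field(index):
    """Hypothetical example: build a key function that remembers index."""
    def key_func(name):
        # index is not a parameter of key_func(); it is captured from the
        # enclosing call to by_field().
        return name.split('_')[index]
    return key_func


names = ["b_2", "a_3", "c_1"]

# sorted() calls the returned key_func once for each element.
print(sorted(names, key=by_field(0)))  # ['a_3', 'b_2', 'c_1']
print(sorted(names, key=by_field(1)))  # ['c_1', 'b_2', 'a_3']
```

Each call to by_field() produces a separate function with its own remembered index, so the same factory can serve different sort orders.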

Results:

synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
Stephen Rauch
  • @Steven\ Rauch, thanks for the alternate solution, I was trying to implement it but it's not working for me. In my case, I have to split by a user defined regex, so I replaced x.split('_') with x.split(selector). Given all this, where is f(x) called by? – Luca Jul 03 '17 at 18:26
  • @Kidneys, I added an explanation of closures. Let me know if that doesn't answer the *where is it being called?* Also be sure to pass `selector` to `sort_keys` if it will be used in `f()` – Stephen Rauch Jul 03 '17 at 18:35
  • @Steven\ Rauch, I completely missed that one, thanks for clarifying. I just updated the code above and getting 'list index out of range error'. Where am I going wrong? – Luca Jul 04 '17 at 17:22
  • @Kidneys, You are not doing a Regex Split, you are still doing a simple string split. – Stephen Rauch Jul 04 '17 at 17:38
  • @StevenRauch, Just made a minor change to your edit and all is good now, Thanks!! – Luca Jul 04 '17 at 18:09
  • My apologies for misspelling your first name throughout! I'm so used to writing Steven, since that is my son's name. I do have one more question related to this, but I'm not sure if I should open a new thread or continue here. I'll ask here and if it requires a new thread, I will open it... If I were to add another option to the script which allows the user to specify the sort order given a {key:order} paired dictionary. Example, fo = { 0: ['FF', 'TT', 'SS'], 3: ['SS', 'TT', 'FF'] }, how would you go about it? – Luca Jul 04 '17 at 23:21
  • @Kidneys, so one thing to keep in mind is this is not a discussion forum. We don't have threads. The desire is to have stand alone Questions and Answers. The hope is that over time, these questions and answers can be useful to more than just the asker. Since this question has been answered, it is best to formulate a new question. If you like, leave a note here to make it more likely I see the new question. Good Luck. – Stephen Rauch Jul 05 '17 at 00:11
  • "@Stephen", here's the [link](https://stackoverflow.com/questions/44918876/python-sort-by-custom-keyorder-pair) – Luca Jul 05 '17 at 06:51