1

I have the following problem :

I have a list of tuple representing packages and their version (some packages don't have a specified version so no problem with that) like so :

    ('lib32c-dev', '', '', '')
    ('libc6-i386', '2.4', '', '')
    ('lib32c-dev', '', '', '')
    ('libc6-i386', '1.06', '', '')
    ('libc6-i386', '2.4', '', '')
    ('lib32c-dev', '', '', '')
    ('libc6-i386', '2.16', '', '')
    ('libc6-dev', '', '', '')
    ('', '', 'libc-dev', '')
    ('libc6-dev', '', '', '')
    ('', '', 'libc-dev', '')
    ('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
    ('libc6-dev-x32', '', '', '')
    ('libc6-x32', '2.16', '', '')
    ('libncursesw5-dev', '5.9+20150516-2ubuntu1', '', '')
    ('libc6-dev-x32', '', '', '')
    ('libc6-x32', '2.16', '', '')
    ('libc6-dev-x32', '', '', '')
    ('libc6-x32', '2.16', '', '')
    ('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
    ('libncursesw5-dev', '5.9+20150516-2ubuntu1', '', '')

As you can see, some packages are listed in tuples more than once but with a different version.

What I need is to parse the list of tuple in order to have for each package the most recent version before transforming the list into a dictionary.

PS : The position of the package name and it's version are not fixed. But we can say that the version is always after the package name so can we say that the version will always be at position 1 and 3 ?

Thank you for your help.

Marc
  • 49
  • 10
  • 2
    You can iterate over the list, and put package in dict if and only if its newer version is not already there. – GingerPlusPlus Mar 23 '16 at 19:08
  • Thank you for your comment. But the problem is that I am unable to create a code that can fetch if the version is newer or not... – Marc Mar 23 '16 at 19:09
  • Can you show us a piece of code of you have tried ? – Ilyas Mar 23 '16 at 19:10
  • The only way I can think of is treating the tuple as a list and parse it visually but I can't do that because the position of the package name and it's version (if available) is not fixed – Marc Mar 23 '16 at 19:12

6 Answers6

0

you actually should transform it into a dictionary first

data = {}
for value in my_list:
    data2 = iter(value)
    #find the first non-empty entry in our subtuple, that is our package name
    name = next(d for d in data2 if d)
    version = next(data2,'') # the version is whatever immediatly follows the package name
    data.setdefault(name,[]).append(version)

that will get you 90% of the way there although this counts on the package name being the first element ... which apparently does not always hold true ...

here is one way you could get the version number from a string

def float_version_from_string(version_string):
    try:
       return float(re.findall("\d.?\d*",version_string)[0])
    except (IndexError,ValueError):
       return -1    
Joran Beasley
  • 110,522
  • 12
  • 160
  • 179
  • I have done that but the problem of dict is the following : if you insert a key that already exist, it will modify the old value and put the new one. Moreover, the position of the package and it's version are not stable as you can see in my example. – Marc Mar 23 '16 at 19:16
  • you clearly have not done *this* ... since this will not do what you describe ... but it will break since you dont know where the name or version will be in the tuple – Joran Beasley Mar 23 '16 at 19:18
  • @Marc added some logic to try and (find/guess) the "name" it assumes the name will be the first non-empty value in the tuple, and the version immediatly follows it – Joran Beasley Mar 23 '16 at 19:20
  • Yes I did use a dictionnary in my code as following : dictionnaire_package = dict(Tuple[i:i+2] for i in xrange(0, len(Tuple), 2)) But as I said before in the comments, if the key was already present in the dictionary it's value will be automatically replaced. By executing my code, I get 2 keys and 2 values from each tuple and then I did the cleaning which works fine but the fact of having a replaced version without being able to control the replacement is really the big problem – Marc Mar 23 '16 at 19:22
  • what is that ? that is not part of my answer ... yes you tried __*a*__ dictionary answer ... but that is not at all what my solution is doing ... in fact i have no idea what that code you pasted is doing or what you are expecting it to do ... __and it does not do the same thing as my code at all__! – Joran Beasley Mar 23 '16 at 19:23
  • just quit and try *my code* before arguing with me ... I promise im not an idiot and that this code will __not__ overwrite the keys if they exist... *or dont try it* at this point ive spent all the time im willing to spend on this question – Joran Beasley Mar 23 '16 at 19:29
  • I like the generator approach to retrieving the library and the version (definitely inspired my answer), but determining which version is the newest from a string is still a non-trivial problem. For example, most of the answers here treat `2.4` as newer than `2.16`, making them incomplete if not incorrect. – Jared Goguen Mar 23 '16 at 21:08
  • how exactly do you think 2.4 is not newer than 2.16? (newer == more recent == bigger version number) – Joran Beasley Mar 24 '16 at 00:33
  • Because 16 is greater than 4? Version 2.1.6 is certainly not newer than 2.4. This can be verified [here](http://changelogs.ubuntu.com/changelogs/pool/main/e/eglibc/eglibc_2.19-0ubuntu6.7/changelog). – Jared Goguen Mar 24 '16 at 18:33
  • Would you mind evaluating my answer? It should address the three main aspects of this question: (i) finding the library and version within each tuple, (ii) comparing two versions for "newness", and (iii) filtering the unique entries to a dictionary. – Jared Goguen Mar 25 '16 at 07:30
0

The tricky part is finding a comparison function that reliably determines which version is newer. For example, we want to consider 2.16 as newer than 2.4, but a naive string comparison is insufficient. More so, a float comparison is not only insufficient, it will raise a ValueError when a version cannot be casted to a float.

The type of sorting is desired could be referred to as a "natural sorting" or a "human sorting" and there are some solutions in this question.

An implementation that can be used to compare exactly two values (rather than sorting a list) might look something like:

import re

def tryint(s):
    try:
        return int(s)
    except:
        return s

def getchunks(s):
    return [tryint(c) for c in re.split('([0-9]+)', s)]

def compare_strings(s1, s2):
    return getchunks(s1) > getchunks(s2)

# 2.4 < 2.16
# 2.4 < 2.4.1
# a_1 < a_2
# and so on...

This can be used in a relatively simple algorithm, using defaultdict to track which libraries have been seen. This assumes that the list of tuples is contained in lib_tuples.

from collections import defaultdict

lib_ver_dict = defaultdict(str)

for lib_tuple in lib_tuples:
    generator = (string for string in lib_tuple if string)
    lib, ver = next(generator), next(generator, '')

    if compare_strings(ver, lib_ver_dict[lib]):
        lib_ver_dict[lib] = ver

The end result is:

'lib32c-dev': ''
'libc6-x32': '2.16'
'libc6-i386': '2.16'
'libncurses5-dev': '5.9+20150516-2ubuntu1'
'libc6-dev': ''
'libc-dev': ''
'libncursesw5-dev': '5.9+20150516-2ubuntu1'
'libc6-dev-x32': ''

Note that compare_strings does not comply with decimal ordering (e.g. 2.001 == 2.1); implementing that detail makes the code far more messy (and is probably irrelevant). Also, if you don't want a case-sensitive comparison, you can update the tryint function to use s.lower() in the final line.

Edit: Your solution should work, but I'd generally recommend not altering a dictionary while iterating over it. Also, zipping keys and values appears to be reliable, but it is easier to call items. Finally, the line del list_versions[:] is nonsensical; it creates a brand new list just to delete it. You could rewrite your function in a more concise way as:

from functools import cmp_to_key

def compare_packages(package_dictionary):
    new_dictionary = {}
    for package, versions in package_dictionary.items():
        version = max(versions, key=cmp_to_key(apt_pkg.version_compare))
        new_dictionary[package] = version or 'Not Specified'
    return new_dictionary
Community
  • 1
  • 1
Jared Goguen
  • 8,772
  • 2
  • 18
  • 36
  • Thank you for your help, I'll try to implement what you have created in my project, I'll keep you posted. – Marc Mar 24 '16 at 18:40
-1

This is just a dummy implementation written on the fly. It was not tested and it should work only if the first element of the tuple is the package name, and second element is it's version. This may not give you the exact solution, but it should definetely help you approach your problem.

my_list_of_tuples = [...]  # some list
my_new_list = []
for tuple in my_list_of_tuples:
    version = float(tuple[1])
    package_name = tuple[0]
    for tuple in my_new_list:
        if tuple[0] == package_name and float(tuple[1]) > version:
            my_new_list.append(tuple)
Laszlowaty
  • 1,295
  • 2
  • 11
  • 19
  • "PS : The position of the package name and it's version are not fixed. But we can say that the version is always after the package name so can we say that the version will always be at position 1 and 3" – Jared Goguen Mar 24 '16 at 18:50
  • Also, this just appends (package, version) tuples that have not been seen or are newer than a previously seen version. It does not take care to remove the old listing. Further, it assumes that versions can be cast to floats, which is not the case as evidence by the sample data. – Jared Goguen Mar 24 '16 at 18:52
-1

You can iterate over the list, and put package in dict if and only if its newer version is not already there:

def version_as_list(s):
    """Converts string symoblizing version to list of integers
    for comparsion purposes."""
    return [int(i) for i in s.split('.')]

data = {}
for name, version, _, _:
    if vesion_as_list(data.get(name, '')) < version_as_list(version):
        data[name] = version
GingerPlusPlus
  • 5,336
  • 1
  • 29
  • 52
  • would you please explain why have done a split on a '.' ? – Marc Mar 23 '16 at 19:32
  • Sorry, I've got an error in the code, see edit. Strings are compared character after character, so `'10.0' < '2.7'`, but `[2, 7] < [10, 0]`, and `[] < any_nonempty_list`. – GingerPlusPlus Mar 23 '16 at 19:42
  • 1
    This doesn't work for versions with [^0-9.] characters or when the name and version aren't the first two entries. – Jared Goguen Mar 24 '16 at 18:43
-1

Using a lot of Python built-in/library code. Seems long solution but really isn't - it's because of the documentation I put inbetween. The code is only 7 lines.

import re, itertools

pkgs = [('libc', '', '', ''), ... ]  # your list of tuples

# a function to extract a version number from a string
rxVSN = re.compile('^(?P<vsn>\d+(\.\d+)?)')
def version(s):
    mo = rxVSN.match(s)
    return float(mo.group('vsn')) if mo is not None else 0.0

# step one: sort the list of tuples by package name and reverse version
# uses built-in sorted() function
#     https://docs.python.org/2/library/functions.html#sorted
pkgs = sorted( pkgs, key = lambda tup: (tup[0], -version(tup[1])) )

# Now we can use the itertools.groupby() function to group the 
# tuples by package name. Then we take the first element of each
# group because that is the one with the highest version number
# (because that's how we sorted them ...)
#    https://docs.python.org/2/library/itertools.html#itertools.groupby
for (pkg, versions) in itertools.groupby( pkgs, key=lambda tup: tup[0]):
    print pkg,": ", next(versions)

Outputs:

 :  ('', '', 'libc-dev', '')
lib32c-dev :  ('lib32c-dev', '', '', '')
libc6-dev :  ('libc6-dev', '', '', '')
libc6-dev-x32 :  ('libc6-dev-x32', '', '', '')
libc6-i386 :  ('libc6-i386', '2.4', '', '')
libc6-x32 :  ('libc6-x32', '2.16', '', '')
libncurses5-dev :  ('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
libncursesw5-dev :  ('libncursesw5-dev', '5.9+20150516-2ubuntu1', '', '')
emvee
  • 4,371
  • 23
  • 23
  • Shouldn't `libc6-i386 : 2.4` be `libc6-i386 : 2.16`? – Jared Goguen Mar 23 '16 at 21:13
  • I thought 2.4 was newer than 2.16. If you check the list both 2.4 and 2.16 appear. There may be _inconsistencies_ in the list but that's a different problem ... – emvee Mar 24 '16 at 05:59
  • Not according to the [change log](http://changelogs.ubuntu.com/changelogs/pool/main/e/eglibc/eglibc_2.19-0ubuntu6.7/changelog). – Jared Goguen Mar 24 '16 at 18:33
  • Also, this won't work on sub-versioned libraries (e.g. 2.4.1), which may or may not be relevant. – Jared Goguen Mar 24 '16 at 18:48
  • Well, again, that is a _different_ problem. And all that's needed is a change in the `version()` function. If anyone can come up with a sensible function that allows random package version numbers to be transformed into a `comparable` type I'd be happy to borrow that. – emvee Mar 25 '16 at 06:32
  • About the change log: I can't help that the list of tuples presented in the question (which is what I fed into the algorithm) is as it is. – emvee Mar 25 '16 at 06:38
  • Well, comparing versions is definitely part of this problem. The main focus of my answer is providing a reliable function for comparing versions; the application of `groupby` or `defaultdict` is then just an implementation detail :) – Jared Goguen Mar 25 '16 at 07:07
-3

I have found the desired solution. I used :

    apt_pkg.version_compare(a,b).

Thank you all.

Function :

    def comparePackages(package_dictionary):
     #loop in keys and values of package_dictionary
        for package_name, list_versions in zip(package_dictionary.keys(), package_dictionary.values()) :
            #loop on each sublist
            for position in xrange(len(list_versions)) :
                a = str(list_versions[position])
                b = str(list_versions[position-1])
                #the only way it worked was by using a and b
                vc = apt_pkg.version_compare(a,b)
                if vc > 0:
                    #a>b
                    max_version = a
                elif vc == 0:
                    #a==b
                    max_version = a         
                elif vc < 0:
                    #a<b
                    max_version = b

            del list_versions[:]
            if(max_version is '') :
                max_version = 'Not Specified'

            package_dictionary[package_name] = max_version

output :

    lib32c-dev : Not Specified
    libc6-x32 : 2.16
    libc6-i386 : 2.16
    libncurses5-dev : 5.9+20150516-2ubuntu1
    libc6-dev : Not Specified
    libc-dev : Not Specified
    libncursesw5-dev : 5.9+20150516-2ubuntu1
    libc6-dev-x32 : Not Specified
Marc
  • 49
  • 10