
I'm new to Python and have a question about vectorizing some code:

def makeNames2(nList):
  for nLi in nList:
    nLIdx=[i for i,j in enumerate(nList) if j==nLi]
    if nLIdx.__len__()>1:
        for i,j in enumerate(nLIdx):
            if i>0: nList[j]=nList[j]+str(i)
  return nList

which does the following:

>>> nLTest=['asda','asda','test','ada','test','yuil','test']
>>> print(makeNames2(nLTest))
['asda', 'asda1', 'test', 'ada', 'test1', 'yuil', 'test2']

The code works fine, but I was wondering if there is a way to vectorize the for loops?

EDIT

Thanks, everyone, for all three answers. This is exactly what I was interested in, and I would have liked to select all of them. I can only select one, but all of them work.

uday
  • Could you explain what you mean by vectorize? – Noel Evans Feb 19 '14 at 14:26
  • possible duplicate of [how do I parallelize a simple python loop?](http://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop) – Jayanth Koushik Feb 19 '14 at 14:27
  • If you mean vectorize as in SSE3 stuff -- not with python out of the box. For some problems, you might be able to do it using 3rd party packages, but even then, it's hard to say when things are actually being vectorized vs. just pushed into a different language (e.g. C) in the implementation. You *can* parallelize it using `multiprocessing` (or sometimes `threading` depending on the problem and python implementation) – mgilson Feb 19 '14 at 14:27
  • Vectorize means run in parallel, right? – Jayanth Koushik Feb 19 '14 at 14:27
  • Also, `if nLIdx.__len__()>1` can be written as `if len(nLIdx)>1` or just `if nLIdx` – Noel Evans Feb 19 '14 at 14:28
  • vectorize means several things, which is why it'd be helpful if the OP clarifies – Useless Feb 19 '14 at 14:28
  • @NoelEvans `if len(nLIdx) > 1` (greater than 1, meaning at least 2 elements) is not equivalent to `if nLIdx`. – sebastian Feb 19 '14 at 14:31
  • I meant avoiding a loop – uday Feb 19 '14 at 14:31
  • in Matlab, usually I can do stuff like `nList[nLIdx] = ...`, but in python it gives me an error that I can't use a list to subset (per se) a list or replace to a subset of a list – uday Feb 19 '14 at 14:33
  • Your algorithm doesn't lend itself well to vectorization anyway, because the value of element `n` depends on the values of all elements `0:n-1`. A parallel solution would end up doing lots of duplicate calculations that are avoided in the answers below. numpy will let you do that sort of thing for some sorts of lists (not for string lists, I don't think). – Corley Brigman Feb 19 '14 at 14:49
  • I agree, and parallel code would be overkill. I wanted to reduce two loops to one loop at least, and I am glad I asked this question: I got to learn how to implement the equivalent of Matlab's `find` function in Python (i.e. using `dict`). – uday Feb 19 '14 at 14:52
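
As a rough sketch of the `dict`-based equivalent of Matlab's `find` mentioned in the comment above: a single pass can record, for each distinct name, the positions where it occurs. The helper name `find_positions` is made up for illustration and does not come from the thread.

```python
def find_positions(names):
    """Map each distinct name to the list of indices where it occurs."""
    positions = {}
    for index, name in enumerate(names):
        # setdefault creates the empty list the first time a name is seen
        positions.setdefault(name, []).append(index)
    return positions


names = ['asda', 'asda', 'test', 'ada', 'test', 'yuil', 'test']
positions = find_positions(names)
print(positions['test'])  # [2, 4, 6]
```

Each lookup such as `positions['test']` then replaces the inner list comprehension in the original `makeNames2`, which rescanned the whole list for every element.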

3 Answers

nLTest, items = ['asda','asda','test','ada','test','yuil','test'], {}
for idx, item in enumerate(nLTest):
    nLTest[idx] += str(items.setdefault(item, 0) or "")
    items[item] += 1
print(nLTest)

Output

['asda', 'asda1', 'test', 'ada', 'test1', 'yuil', 'test2']
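
The loop body above is compact, so here is a hedged step-by-step sketch of what the `setdefault(...) or ""` expression appears to be doing for a single name. The variable names `first`, `second`, `suffix_first` and `suffix_second` are only for illustration.

```python
items = {}

# First sighting: the key is missing, so setdefault stores 0 and returns it.
first = items.setdefault('test', 0)
suffix_first = str(first or "")    # 0 is falsy, so this is str("") == ""
items['test'] += 1

# Second sighting: the key exists, so setdefault returns the stored 1.
second = items.setdefault('test', 0)
suffix_second = str(second or "")  # 1 is truthy, so this is "1"

print(repr(suffix_first), repr(suffix_second))  # '' '1'
```

So the first occurrence gets an empty suffix and later occurrences get their running count.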
thefourtheye

You could simplify it a bit:

def makenames(lst):
    seen = {}
    for index, name in enumerate(lst):
        if name in seen:
            seen[name] += 1
            lst[index] = "{0}{1}".format(name, seen[name])
        else:
            seen[name] = 0
    return lst

This removes one of the for loops, so the function runs in O(n) overall (dictionary access is O(1) on average).

Note that this modifies the list in-place; you may wish to have a new output list to append to instead. You could also simplify this slightly using defaultdict or Counter from the collections module.
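
A minimal sketch of that variant, assuming a new list is wanted and using `Counter` for the bookkeeping. The function name `make_names_copy` is hypothetical and not part of the answer above.

```python
from collections import Counter


def make_names_copy(names):
    """Return a new list in which repeated names get a numeric suffix."""
    seen = Counter()  # missing keys read as 0
    result = []
    for name in names:
        count = seen[name]
        # first occurrence stays unchanged; later ones get 1, 2, ...
        result.append(name if count == 0 else "{0}{1}".format(name, count))
        seen[name] += 1
    return result


original = ['asda', 'asda', 'test', 'ada', 'test', 'yuil', 'test']
renamed = make_names_copy(original)
print(renamed)   # ['asda', 'asda1', 'test', 'ada', 'test1', 'yuil', 'test2']
print(original)  # the input list is left as it was
```

Because missing keys in a `Counter` read as 0, the explicit `if name in seen` branch is no longer needed.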

jonrsharpe

This is arguably more readable and avoids the O(n^2) behaviour of the original. It also does not modify the list in place.

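from collections import defaultdict

def makeNames3(nList):
    counter = defaultdict(int)  # names not seen yet start at 0
    def posfix(x):
        # suffix for this occurrence: "" the first time, then "1", "2", ...
        n = counter[x]
        counter[x] += 1
        return str(n) if n > 0 else ""
    return [x + posfix(x) for x in nList]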
loopbackbee