I've implemented an algorithm that returns a list of similar records from bibliographic data.

def Find_top(rank,structure, cp):
    global contadorGrupos
    global contadorValor
    datos=[]
    for clave in rank:
        aux=[]
        vectorName=[]
        if clave["Analizado"]==0:
            for x in structure:
                if clave["Name"]!=x["Name"]:
                    if x["Analizado"]==0:
                        jac=jaccard_similarity(clave["Name"],x["Name"])
                        if jac > 0.4:
                            jar=jellyfish.jaro_winkler(unicode(clave["Name"], 'utf-8'),unicode(x["Name"], 'utf-8'))
                            valor=(jac+jar)/2
                            if valor > 0.5:
                                if cp=="Authors":
                                    if valor> 0.8 and Comparador(clave["Name"], x["Name"])==1:
                                        if Verificar_Key(clave["Afiliation"], x["Afiliation"])>0.7:
                                            aux.append(x)
                                            vectorName.append(x["Name"])
                                            x["Analizado"]=1
                                else:
                                    if cp == "Afiliation":
                                        if valor >= 0.983 :
                                            aux.append(x)
                                            vectorName.append(x["Name"])
                                            x["Analizado"]=1
                                    else:
                                        if valor >= 0.93:
                                            aux.append(x)
                                            vectorName.append(x["Name"])
                                            x["Analizado"]=1
        clave["Analizado"]=1
        aux = aux+BuscarParecidos(vectorName,structure, cp)
        BuscarT(vectorName, rank)
        if len(aux)!=0:
            contadorGrupos = contadorGrupos + 1
            aux.append(clave)
            datos.append(aux)
        contadorValor = contadorValor + 1
    return datos

Where BuscarParecidos is:

def BuscarParecidos(vector, lista , cp):
    vectorAuxiliar=[]
    nombres=[]
    for ax in vector:
        for i in lista:
            if i["Analizado"]==0:
                if not i["Name"] in vector:
                    vx=jaccard_similarity(ax,i["Name"])
                    vy=jellyfish.jaro_winkler(unicode(ax, 'utf-8'),unicode(i["Name"], 'utf-8'))
                    vt=(vx+vy)/2
                    if cp=="Authors":
                        if vt > 0.8 and Comparador(ax,i["Name"])==1:
                            i["Analizado"]=1
                            vectorAuxiliar.append(i)
                            nombres.append(i["Name"])
                    else:
                        if cp == "Afiliation":
                            if vt >= 0.983:
                                i["Analizado"]=1
                                vectorAuxiliar.append(i)
                                nombres.append(i["Name"])
                        else:
                            if vt >= 0.93:
                                i["Analizado"]=1
                                vectorAuxiliar.append(i)
                                nombres.append(i["Name"])
    vector = vector + nombres
    return vectorAuxiliar

jaccard_similarity is my own algorithm and jaro_winkler is implemented by the jellyfish library.
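
For readers unfamiliar with the measure, here is a minimal sketch of a token-based Jaccard similarity. It is not the implementation used in the question, which is not shown. The name `jaccard_similarity_sketch`, the whitespace tokenization, and the lowercasing are all assumptions made for illustration.

```python
def jaccard_similarity_sketch(a, b):
    """Jaccard similarity of the whitespace-separated token sets of two strings.

    Hypothetical helper: the question's own jaccard_similarity is not shown,
    so the tokenization used here is only an assumption.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        # Convention chosen here: two empty strings count as identical.
        return 1.0
    # |A intersect B| / |A union B|, forced to float so Python 2 does not truncate.
    return len(tokens_a & tokens_b) / float(len(tokens_a | tokens_b))
```

With this definition, two names containing the same words in a different order score 1.0, which is one reason to combine it with an order-sensitive measure such as Jaro-Winkler.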

The problem is that with Python 3.4 the algorithm runs fine and finishes in about 40 seconds on approximately 3310 records, but with Python 2.7 it takes a little over 4 minutes. I don't understand why this happens.

  • Possible suspects are `unicode - utf8` handling, and `generators` that are native to Python 3 (range is a generator), but I saw nothing obvious upon a quick scan of your code. – Reblochon Masque Apr 11 '16 at 15:46
  • Possibly related: http://stackoverflow.com/questions/10901161/python-2-7-or-python-3-for-speed – callyalater Apr 11 '16 at 16:02
  • Also, [here](https://speakerdeck.com/pyconslides/python-3-dot-3-trust-me-its-better-than-python-2-dot-7-by-dr-brett-cannon) is an analysis as to why Python 3.3 is *"better"* than 2.7. – callyalater Apr 11 '16 at 16:04
  • Maybe run a profiler? http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script – Matt Hall Apr 11 '16 at 16:06
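
The profiler suggestion could be tried with something like the sketch below, which uses the standard-library `cProfile` and `pstats` modules and runs unchanged on both Python 2.7 and Python 3. The helper name `profile_call` is hypothetical. Running it once under each interpreter and comparing the two reports should show which calls (for example the `unicode(...)` conversions or `jaccard_similarity`) account for the extra time.

```python
import cProfile
import pstats

try:
    from StringIO import StringIO  # Python 2: accepts the byte strings pstats writes
except ImportError:
    from io import StringIO  # Python 3


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text).

    Hypothetical helper for illustration; the report lists the 15 entries
    with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(15)
    return result, stream.getvalue()
```

Assuming the question's functions and data are already loaded, usage would look like `datos, report = profile_call(Find_top, rank, structure, "Authors")` followed by `print(report)`.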

1 Answer

Yes, I have run into this too. I would say IPython is much better for algorithmic work.

You can find most of the answer here: https://www.quora.com/What-is-the-difference-between-IPython-and-Python

Best of luck.

Remember that Python 2.7 is like a grandfather: it will be slower, but more robust and reliable.