I am trying to split strings into parts using Python 3's regex module, re.

My input:

{'factorial.2.0.0.zip', 'Microsoft ASP.NET Web API 2.2 Client Libraries 5.2.3.zip', 'Newtonsoft.Json.9.0.1.zip'}

I want to get only the name and only the version of each package, like this:

  • factorial.2.0.0.zip
    • factorial
    • 2.0.0
  • Microsoft ASP.NET Web API 2.2 Client Libraries 5.2.3.zip
    • Microsoft ASP.NET Web API 2.2 Client Libraries
    • 5.2.3

etc. This is my code:

if diff is not None:
    for values in diff.values():
        for value in values:
            temp = ''
            temp1 = ''
            temp = re.findall('[aA-zZ]+[0-9]*', value) #name pack
            temp1 = re.findall('\d+', value) #version
            print(temp)
            print(temp1)

Actual (wrong) output:

 temp:
 ['Microsoft', 'ASP', 'NET', 'Web', 'API', 'Client', 'Libraries', 'zip']
 ['Newtonsoft', 'Json', 'zip']
 ['factorial', 'zip']

temp1:
['2', '0', '0']
['2', '2', '5', '2', '3']
['9', '0', '1']

Expected output:

temp:
['Microsoft', 'ASP', 'NET', 'Web', 'API', 'Client', 'Libraries']
['Newtonsoft', 'Json']
['factorial']

temp1:
['2', '0', '0']
['5', '2', '3']
['9', '0', '1']

How can I fix this so that "zip" and the extra numbers are excluded from the results? Maybe there is another way to solve my problem.

Cœur
teror4uks
  • I'd strongly recommend to get rid of meaningless identifiers such as temp, whatever you change else. – guidot Dec 25 '16 at 16:57

1 Answer

Something like this?

import re

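# A set, so the iteration (and output) order is arbitrary.
a = {'factorial.2.0.0.zip', 'Newtonsoft.Json.9.0.1.zip',
     'Microsoft ASP.NET Web API 2.2 Client Libraries 5.2.3.zip',
     'namepack010.0.0.153.212583'}

for b in a:
    # name, separator (dot or space), x.y.z version, then ".zip" or a fourth version part
    c = re.findall(r'(.*?)[. ](\d+\.\d+\.\d+)(\.zip|\.\d+)$', b)[0]
    if c[2] == '.zip':
        print(c[0], '||', c[1])
    else:
        print(c[0], '||', c[1] + c[2])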

Output:

Newtonsoft.Json || 9.0.1
namepack010 || 0.0.153.212583
Microsoft ASP.NET Web API 2.2 Client Libraries || 5.2.3
factorial || 2.0.0
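
If you would rather get the two parts back as a tuple, the same idea can be wrapped in a small function. This is only a sketch: the names PACKAGE_RE and split_package are made up for illustration, and it assumes the version is always the last dotted run of numbers, optionally followed by ".zip".

```python
import re

# Hypothetical pattern: lazy name, a dot or space as separator, then a
# dotted version of any length, optionally followed by ".zip".
PACKAGE_RE = re.compile(r'(?P<name>.+?)[. ](?P<version>\d+(?:\.\d+)+)(?:\.zip)?')


def split_package(filename):
    """Return (name, version), or None if the filename does not fit the pattern."""
    match = PACKAGE_RE.fullmatch(filename)
    if match is None:
        return None
    return match.group('name'), match.group('version')


packages = [
    'factorial.2.0.0.zip',
    'Newtonsoft.Json.9.0.1.zip',
    'Microsoft ASP.NET Web API 2.2 Client Libraries 5.2.3.zip',
    'namepack010.0.0.153.212583',
]

for filename in packages:
    print(split_package(filename))
```

Because fullmatch has to consume the whole string, the "2.2" in the middle of the Microsoft name cannot be taken as the version, and a name that does not fit returns None instead of raising an IndexError.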

Don't use [aA-zZ] to match all letters. The range A-z also covers the characters that sit between Z and a in ASCII ([, \, ], ^, _ and `), so it matches those special characters too. Use [a-zA-Z] instead.

Check this for more understanding: Why is this regex allowing a caret?
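
A quick way to see the difference for yourself (a minimal check; the variable names are only for illustration):

```python
import re

# The six characters that sit between 'Z' and 'a' in ASCII.
between = "[\\]^_`"

# The range A-z inside [aA-zZ] covers these characters as well as the letters.
loose = re.findall(r"[aA-zZ]", between)

# [a-zA-Z] contains only the two letter ranges.
strict = re.findall(r"[a-zA-Z]", between)

print(loose)   # all six characters match
print(strict)  # nothing matches
```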

Community
Mohammad Yusuf
  • Thanks, this really helped, but I found a package name that this regex does not handle: `namepack010.0.0.153.212583`. Your regex returns `('namepack010.0.0.153.', '12583')`, but the correct result is `('namepack010' , '0.0.153.212583')`. Could you help me again? – teror4uks Dec 25 '16 at 19:38
  • my solution: `print(re.findall('(.*?\d*\s*)\.*(\d*[^a-zA-Z]*).zip', b)[0])` – teror4uks Dec 25 '16 at 20:29
  • @teror4uks Modified. Check now. – Mohammad Yusuf Dec 26 '16 at 01:53