
I have the following output from print(var):

test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan

I'm trying to find the right split call to produce the following:

test.qa.home-page.website.com
test.qa.home-page.website.net
test.qa.home-page.website.com
test.qa.home-page.website.net

...as well as remove duplicates:

test.qa.home-page.website.com
test.qa.home-page.website.net

The numeric values after "com-" or "net-" are random, so what I'm struggling with is how to rsplit on "-" followed by any number, something like rsplit("-" + [CHECK_FOR_ANY_NUMBER])[0]. Any suggestions would be great, thanks in advance!
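One way to express "split off a hyphen followed by any number" is a regular expression anchored to the end of the string. This is a minimal sketch. It assumes every line ends in -<digits>-<letters>, as in the sample above. The helper name strip_suffix is hypothetical.

```python
import re

# Hypothetical helper: drop a trailing "-<digits>-<letters>" suffix such as "-3412-jan".
# The "$" anchor means only the end of the line is touched, so the hyphen in
# "home-page" (not followed by digits) is left alone. Other lines come back unchanged.
SUFFIX = re.compile(r"-\d+-[A-Za-z]+$")


def strip_suffix(line: str) -> str:
    return SUFFIX.sub("", line)


lines = [
    "test.qa.home-page.website.com-3412-jan",
    "test.qa.home-page.website.net-5132-mar",
    "test.qa.home-page.website.com-8422-aug",
    "test.qa.home-page.website.net-9111-jan",
]

prefixes = [strip_suffix(line) for line in lines]
unique = sorted(set(prefixes))  # sorted only so the printed order is deterministic
print(prefixes)
print(unique)  # ['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
```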

KC14

3 Answers


How about:

import re

output = [
"test.qa.home-page.website.com-3412-jan",
"test.qa.home-page.website.net-5132-mar",
"test.qa.home-page.website.com-8422-aug",
"test.qa.home-page.website.net-9111-jan"
]

# A hyphen followed by a digit marks the start of the "-NNNN-mon" suffix
trimmed = {re.split(r"-[0-9]", item)[0] for item in output}
print(trimmed)
# out (set order may vary): {'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
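A set has no guaranteed order, so the printed order can differ. If first-seen order matters, the same split results can be passed through dict.fromkeys instead. Dict keys keep insertion order from Python 3.7 onward. This variant is not part of the original answer. It also adds maxsplit=1, because only the first piece is needed:

```python
import re

output = [
    "test.qa.home-page.website.com-3412-jan",
    "test.qa.home-page.website.net-5132-mar",
    "test.qa.home-page.website.com-8422-aug",
    "test.qa.home-page.website.net-9111-jan",
]

# "-[0-9]" only matches a hyphen followed by a digit, so the hyphen in
# "home-page" is not a split point; maxsplit=1 stops after the first match.
prefixes = (re.split(r"-[0-9]", item, maxsplit=1)[0] for item in output)

# dict keys are unique and keep insertion order (guaranteed since Python 3.7).
trimmed = list(dict.fromkeys(prefixes))
print(trimmed)  # ['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
```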
Charles Dupont

If you have a list of values and want to remove duplicates, you can use set().

>>> l = [1,2,3,1,2,3]
>>> l
[1, 2, 3, 1, 2, 3]
>>> set(l)
{1, 2, 3}

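You can get a useful list by calling str.rsplit('-', 2)[0] on every value. A plain str.split('-')[0] won't work with this data, because "home-page" contains a hyphen and the result would be just 'test.qa.home'.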
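With this particular data, splitting on the first hyphen is not enough, because "home-page" has a hyphen of its own. The sketch below compares str.split('-')[0] with str.rsplit('-', 2)[0]. The rsplit version splits from the right and stops after two cuts. It assumes every value ends in exactly two hyphen-separated fields:

```python
values = [
    "test.qa.home-page.website.com-3412-jan",
    "test.qa.home-page.website.net-5132-mar",
    "test.qa.home-page.website.com-8422-aug",
    "test.qa.home-page.website.net-9111-jan",
]

# Splitting from the left cuts at the hyphen inside "home-page".
left = {v.split("-")[0] for v in values}

# Splitting from the right with maxsplit=2 removes only the "-NNNN-mon" tail.
right = {v.rsplit("-", 2)[0] for v in values}

print(left)  # {'test.qa.home'}
print(right)
```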

89f3a1c

You could use a regex to parse the individual lines and a set comprehension to de-duplicate them:

import re

txt = '''\
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan'''

>>> {re.sub(r'^(.*\.(?:com|net)).*', r'\1', s) for s in txt.split()}
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}

Or use the same regex with re.findall() and the re.M flag, and wrap the result in set():

>>> set(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M))
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
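The re.M (re.MULTILINE) flag makes ^ match at the start of every line, not only at the start of the whole string. This small sketch shows the difference on the same sample text:

```python
import re

txt = """\
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan"""

pattern = r"^(.*\.(?:com|net))"

# Without re.M, "^" anchors only at the very start of txt, so there is one match.
first_only = re.findall(pattern, txt)

# With re.M, "^" also anchors after every newline, giving one match per line.
per_line = re.findall(pattern, txt, flags=re.M)

print(first_only)  # ['test.qa.home-page.website.com']
print(per_line)
```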

If you want to preserve the original order, use dict.fromkeys() (dicts are guaranteed to keep insertion order as of Python 3.7):

>>> list(dict.fromkeys(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M)))
['test.qa.home-page.website.com', 'test.qa.home-page.website.net']

Or, if you know the part you want is always followed by exactly two hyphen-separated fields, just use .rsplit() with maxsplit=2:

>>> {s.rsplit('-',maxsplit=2)[0] for s in txt.splitlines()}
{'test.qa.home-page.website.com', 'test.qa.home-page.website.net'}
dawg