Split unicode by character into list

Question

I have made a program that reads a selection of names, which is then turned into Unicode, for example:

StevensJohn:-:
WasouskiMike:-:
TimebombTime:-:
etc

Is there any way to split this into a list, with one entry per name, like this:

example_list = ["StevensJohn", "WasouskiMike", "TimebombTim"]

This needs to be dynamic: the number of names, and the names themselves, depend on what the web scrape returns.

Any input would be appreciated.

Code

results = unicode("""
Hospitality
Customer Care
Wick , John 12:00-20:00
Wick , John 10:00-17:00
Obama , Barack 06:00-14:00
Musk , Elon 07:00-15:00
Wasouski , Mike 06:30-14:30
 Production
Fries
Piper , Billie 12:00-20:00
Tennent , David 06:30-14:30
Telsa, Nikola 11:45-17:00
Beverages & Desserts in a Dual Lane Drive-thru with a split beverage cell
Timebomb , Tim 06:30-14:30
Freeman , Matt 08:00-16:00
Cool , Tre 11:45-17:00
Sausage
Prestly , Elvis 06:30-14:30
Fat , Mike 06:30-14:30
Knoxville , Johnny 06:00-14:00
Man , Wee 05:00-12:00
Heartness , Jack 09:00-16:00
Breakfast BOP
Schofield , Phillip 06:30-14:15
Burns , George 06:30-14:15
Johnson , Boris 06:30-14:30
Milliband, Edd 06:30-14:30
Trump , Donald 10:00-17:00
Biden , Joe 08:00-16:00
Tempering & Prep
Clinton , Hillary 11:00-19:00

""")

for span in results:
    results = results.replace(',', '')
    results = results.replace(" ", "")
    results = results.replace("/r","")
    results = results.replace(":-:", "\r")
    results = ''.join([i for i in results if not i.isdigit()])
    print(results)

It is unclear what you are asking. Strings *are* Unicode strings in Python 3. If you have those lines in a file, `open(filename).readlines()` returns them as a list (it's unclear why each has a `:-:` suffix, but trimming that off should be trivial, and doesn't seem to be what you are trying to ask). — tripleee, Aug 22 '20 at 18:15
If it really is, `lines.split(':-:')` splits on that string, but then you have to clean up newlines *before* each item. — tripleee, Aug 22 '20 at 18:18
The Unicode is taken from a web scrape, so it doesn't come from a file. I have tried `lines.split(':-:')`, but it doesn't produce the output I need. — George Burns, Aug 23 '20 at 10:57
Then show us what you tried, how it's wrong, and what you have done to troubleshoot. Probably also review our guidance for providing a [mre]. — tripleee, Aug 23 '20 at 11:01
The code is now edited into the original post. Sorry, I am very new to coding. It's not the exact data from the web scrape because of data protection, but it's very close; I just edited the names. — George Burns, Aug 23 '20 at 11:56
And the expected output are the names from the lines with time ranges? — tripleee, Aug 23 '20 at 13:05
`unicode` is not a standard class in Python 3. Are you sure you are not using Python 2? — tripleee, Aug 23 '20 at 15:19

Aviv Yaniv · Answer 1 · 2020-08-22T18:19:35.887

import re

# Sample input in the format shown in the question.
text = 'StevensJohn:-:\nWasouskiMike:-:\nTimebombTime:-:\n'

class Names:
    def __init__(self, text, delimiter=':-:\n'):
        # The delimiter is treated as a regular expression by re.split.
        # Drop the empty string left over after the final delimiter.
        self.names = [x for x in re.split(delimiter, text) if x]
        # A set keeps only the distinct names.
        self.different_names = set(self.names)

    def number_of_names(self):
        return len(self.names)

    def number_of_different_names(self):
        return len(self.different_names)

    def __str__(self):
        return str(self.names)

names = Names(text)
print(names)
print(names.number_of_names())
print(names.number_of_different_names())

edited Aug 22 '20 at 18:19

answered Aug 22 '20 at 18:13


Thanks for your reply. I will integrate this when I finish my homework for the day and update the thread. – George Burns Aug 23 '20 at 11:23

score 0 · Answer 2 · answered Aug 22 '20 at 18:15

unicode_ex = 'StevensJohn:-:\nWasouskiMike:-:\nTimebombTime:-:\n'
splitted = [name.replace(" ", "") for name in unicode_ex.split(":-:\n") if name]
print(splitted)

Output

['StevensJohn', 'WasouskiMike', 'TimebombTime']

answered Aug 22 '20 at 18:15

Hasan Salim Kanmaz


TypeError: expected a string or other character buffer object – George Burns Aug 23 '20 at 11:00
Could you specify where you define or import `unicode`? – Hasan Salim Kanmaz Aug 23 '20 at 13:47
Example above (in question) shows where I have defined unicode – George Burns Aug 23 '20 at 14:49

tripleee · Accepted Answer · 2020-08-23T15:30:16.990

Your edit reveals that this is really an XY problem. Your attempt to successively trim off small substrings will inevitably bump into corner cases where some substrings should not be removed some of the time. A common alternative approach is to use regular expressions.

import re
matches = [''.join([m.group(1), m.group(2)]) for m in re.finditer(r"([A-Za-z']+)\s*,\s*([A-Za-z'.]+)\s+\d+:\d+-\d+:\d+", results)]

Demo: https://ideone.com/1syge8
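
As a self-contained illustration of the regular-expression approach, here is a minimal sketch run against a shortened copy of the sample text from the question. The names `sample`, `pattern`, and `names` are only illustrative, and the pattern assumes every name line has the shape "Surname , Firstname HH:MM-HH:MM".

```python
import re

# Shortened sample of the scraped text from the question.
sample = """
Hospitality
Customer Care
Wick , John 12:00-20:00
Telsa, Nikola 11:45-17:00
Beverages & Desserts in a Dual Lane Drive-thru with a split beverage cell
Timebomb , Tim 06:30-14:30
"""

# Surname, comma, first name, then a time range such as 12:00-20:00.
# Heading lines have no comma followed by a time range, so they never match.
pattern = re.compile(r"([A-Za-z']+)\s*,\s*([A-Za-z'.]+)\s+\d+:\d+-\d+:\d+")

# Join surname and first name with no separator, as the question asks.
names = [m.group(1) + m.group(2) for m in pattern.finditer(sample)]
print(names)  # ['WickJohn', 'TelsaNikola', 'TimebombTim']
```

Because the pattern requires the time range, section headings such as "Customer Care" are skipped without any explicit filtering.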

A much better solution still is to use the structure of the surrounding HTML to extract only specific spans; most modern web sites use CSS selectors for formatting which also are quite useful for scraping. But since we can't see the original page where you extracted this string, this is entirely speculative.
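
To make the idea concrete, here is a rough sketch that uses only Python 3's standard `html.parser` module. The markup, the `employee` and `shift` class names, and the `EmployeeSpans` class are all hypothetical, since the real page is not shown. It matches on the `class` attribute by hand instead of evaluating a real CSS selector; a scraping library such as BeautifulSoup would let you write something like `soup.select("span.employee")` instead.

```python
from html.parser import HTMLParser

# Hypothetical markup: the structure and class names are assumptions.
html = """
<div class="station">Fries</div>
<span class="employee">Piper , Billie</span><span class="shift">12:00-20:00</span>
<span class="employee">Tennent , David</span><span class="shift">06:30-14:30</span>
"""

class EmployeeSpans(HTMLParser):
    """Collect the text of every <span class="employee"> element."""

    def __init__(self):
        super().__init__()
        self.in_employee = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "employee") in attrs:
            self.in_employee = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_employee = False

    def handle_data(self, data):
        if self.in_employee:
            # "Piper , Billie" -> "PiperBillie"
            surname, first = (part.strip() for part in data.split(","))
            self.names.append(surname + first)

parser = EmployeeSpans()
parser.feed(html)
print(parser.names)  # ['PiperBillie', 'TennentDavid']
```

With this approach the headings and shift times never enter the result, so no digit stripping or regex over the flattened text is needed.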
