14
variable=";CREATEDBY~string~1~~72~0~0~0~~~0;CREATEDBYNAME~string~1~~800~0~0~0~~~1;CREATEDBYYOMINAME~string~1~~800~0~0~0~~~2;CREATEDON~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~3;CREATEDONUTC~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~4"

How can I extract each value that sits between a ; and the following ~?

The result should look like CREATEDBY,CREATEDBYNAME,CREATEDBYYOMINAME,...

I have tried the code below, but it only returns the first occurrence.

variable[variable.find(";")+1:variable.find("~")]

How do I get the full list of strings using split?

Peter Mortensen
Vicky
  • Use the split function: mystring.split(';'), then mystring.split('~') on each piece. – Hassan ALi Dec 18 '18 at 08:37
  • 3
    Possible duplicate of [How to split a string into a list?](https://stackoverflow.com/questions/743806/how-to-split-a-string-into-a-list) Note that the second answer indicates how to specify the delimiter. – jpmc26 Dec 18 '18 at 18:14

5 Answers

21

Use str.split: split on ; to get the records, then split each record on its first ~.

Example:

variable=";CREATEDBY~string~1~~72~0~0~0~~~0;CREATEDBYNAME~string~1~~800~0~0~0~~~1;CREATEDBYYOMINAME~string~1~~800~0~0~0~~~2;CREATEDON~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~3;CREATEDONUTC~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~4"

for i in variable.strip(";").split(";"):
    print(i.split("~", 1)[0])
# or, as a list comprehension
print([i.split("~", 1)[0] for i in variable.strip(";").split(";")])

Output:

CREATEDBY
CREATEDBYNAME
CREATEDBYYOMINAME
CREATEDON
CREATEDONUTC

['CREATEDBY', 'CREATEDBYNAME', 'CREATEDBYYOMINAME', 'CREATEDON', 'CREATEDONUTC']
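
The same two-step split can be wrapped in a small helper. This is only a minimal sketch, not part of the answer above: the name `field_names` and its parameters are made up for illustration, and it assumes every non-empty record starts with the name followed by the field separator.

```python
def field_names(spec, record_sep=";", field_sep="~"):
    """Return the text before the first field_sep in each non-empty record."""
    # Drop empty records, e.g. the one produced by a leading separator.
    records = [r for r in spec.split(record_sep) if r]
    # maxsplit=1 stops splitting after the first field separator.
    return [r.split(field_sep, 1)[0] for r in records]


# Hypothetical shortened input, including a name with underscores.
variable = ";CREATEDBY~string~1~~72;CREATED_BY_NAME~string~1~~800"
print(field_names(variable))
```

Filtering out empty records, rather than calling strip(";"), also copes with a doubled ;; in the middle of the string, should that ever occur.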
Rakesh
14

We can try using re.findall with the pattern ;(\w+)~, where the capture group returns only the name:

import re

variable = ";CREATEDBY~string~1~~72~0~0~0~~~0;CREATEDBYNAME~string~1~~800~0~0~0~~~1;CREATEDBYYOMINAME~string~1~~800~0~0~0~~~2;CREATEDON~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~3;CREATEDONUTC~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~4"
result = re.findall(r';(\w+)~', variable)
print(result)

['CREATEDBY', 'CREATEDBYNAME', 'CREATEDBYYOMINAME', 'CREATEDON', 'CREATEDONUTC']
Tim Biegeleisen
  • It fails to match if the column name has '_' (e.g. CREATED_BY) – Vicky Dec 18 '18 at 09:11
  • 2
    @Vicky Then use `\w+`, which includes underscores, to match your words. – Tim Biegeleisen Dec 18 '18 at 09:21
  • @Vicky: good observation. That's why I hate Regex. You never know what they are really doing and what their intention was. It's very easy to get them wrong accidentally. – Thomas Weller Dec 18 '18 at 10:26
  • 1
    @Thomas I'm not sure that's a reason to hate Regex. `[A-Z]` should be self-explanatory that it isn't going to match underscores, and the question could have been clearer that underscores should be matched. – grg Dec 18 '18 at 17:10
  • @ThomasWeller while I'm sure you can find [degenerate cases](https://www.regextester.com/94502) that I would totally agree with you on, the regex in question is so utterly trivial that any experienced programmer should be able to immediately understand what it does, even if they're not familiar with Python's particular flavor of regex. And as for getting it wrong, blaming the choice of tool for code not satisfying an incomplete spec is something I just can't get behind. – Jared Smith Dec 18 '18 at 17:15
  • 1
    @JaredSmith: trivial or not, I can't judge. The spec is not so bad regarding the underscore: between `;` and `~`, it did not say that there should be a word (`\w`). So a `[^~]` would be closer to the spec. Next, I wonder why he says the pattern should be `;(\w+)(?=~)` but then uses `;(\w+)~` in the code. – Thomas Weller Dec 18 '18 at 18:51
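
To make the difference discussed in these comments concrete, here is a small sketch comparing `\w+` with a negated character class. The sample string is invented for illustration (the hyphenated and space-containing names do not appear in the question), and `[^~;]+` is just one possible reading of "everything between ; and ~".

```python
import re

# Hypothetical input: one name with underscores, one with a hyphen, one with a space.
sample = ";CREATED_BY~string~1;CREATED-ON~date~1;CREATED BY~string~1"

# \w matches letters, digits and underscores only.
word_only = re.findall(r";(\w+)~", sample)

# [^~;]+ matches any run of characters that are neither separator.
anything = re.findall(r";([^~;]+)~", sample)

print(word_only)
print(anything)
```

With this input the `\w+` version keeps only CREATED_BY, while the negated class returns all three names.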
5

You can split() the string and then find() the first ~ for each one:

variable=";CREATEDBY~string~1~~72~0~0~0~~~0;CREATEDBYNAME~string~1~~800~0~0~0~~~1;CREATEDBYYOMINAME~string~1~~800~0~0~0~~~2;CREATEDON~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~3;CREATEDONUTC~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~4"
# "if item" skips the empty piece produced by the leading ";"
result = [item[:item.find('~')] for item in variable.split(';') if item]

print(result)
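
One thing to watch with find(): it returns -1 when the separator is missing, and slicing with -1 silently drops the last character. The sketch below contrasts that with str.partition; the helper names `name_by_find` and `name_by_partition` are made up for illustration, and whether a record without ~ can occur in the real data is an assumption.

```python
def name_by_find(item):
    # str.find returns -1 when "~" is absent, so item[:-1] loses the last character.
    return item[:item.find("~")]


def name_by_partition(item):
    # str.partition returns the whole string as the head when "~" is absent.
    return item.partition("~")[0]


for item in ["CREATEDBY~string~1", "CREATEDBY", ""]:
    print(repr(item), repr(name_by_find(item)), repr(name_by_partition(item)))
```

For well-formed records both helpers agree; they differ only for a record such as "CREATEDBY" with no ~ at all.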
fixatd
5

Use a regular expression with a lookbehind and a lookahead:

>>> import re
>>> re.findall(r'(?<=;).*?(?=~)', variable)
['CREATEDBY', 'CREATEDBYNAME', 'CREATEDBYYOMINAME', 'CREATEDON', 'CREATEDONUTC']
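
As a rough illustration of what the lookarounds change, the sketch below compares them with a plain capture group on a shortened, made-up sample string. Both return the same names from findall; the difference is that the lookaround match itself excludes the ; and ~, whereas the capture-group match includes them.

```python
import re

# Hypothetical shortened input in the same format as the question.
sample = ";CREATEDBY~string~1;CREATEDON~date~1"

lookaround = r"(?<=;).*?(?=~)"
group = r";(.*?)~"

# findall returns the group contents when the pattern has exactly one group.
print(re.findall(lookaround, sample))
print(re.findall(group, sample))

# The full matched text differs: lookarounds do not consume the delimiters.
whole_lookaround = [m.group(0) for m in re.finditer(lookaround, sample)]
whole_group = [m.group(0) for m in re.finditer(group, sample)]
print(whole_lookaround)
print(whole_group)
```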
Martin Frodl
1
import re

variable = ";CREATEDBY~string~1~~72~0~0~0~~~0;CREATEDBYNAME~string~1~~800~0~0~0~~~1;CREATEDBYYOMINAME~string~1~~800~0~0~0~~~2;CREATEDON~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~3;CREATEDONUTC~date~1~yyyy-MM-dd HH:mm:ss.SSS~26~0~0~0~~~4"
# Non-greedy group: capture everything between ";" and the next "~".
pattern = re.compile(r";(.+?)~")
matches = pattern.findall(variable)
print(matches)
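
The ? after .+ matters here. As a sketch on a shortened, made-up sample string, the greedy form runs from the first ; to the last ~ and yields a single long match, while the non-greedy form stops at the nearest ~:

```python
import re

# Hypothetical shortened input in the same format as the question.
sample = ";CREATEDBY~string~1;CREATEDON~date~1"

# Non-greedy: stop at the first "~" after each ";".
lazy = re.findall(r";(.+?)~", sample)

# Greedy: extend as far as possible, up to the last "~" in the string.
greedy = re.findall(r";(.+)~", sample)

print(lazy)
print(greedy)
```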
Raffi