
I am using Python to try to separate the information from strings of this type:

r = "(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"

Ultimately, what I need is the number in the parentheses, plus the letters and the numbers in the array separated from each other. So for the example above, the results I would like to get are: the number -0.04530261550379927, an array [X, X, Y, Y], and another array [0, 1, 2, 3].

I have been trying with re.match, but since this is the first time I'm using this module, I find it very confusing.

Would appreciate any help.

reich98
  • You could share your best attempt at solving this, with well-chosen sample input and the corresponding output; this could be a good starting point to help you understand what improvements your code needs. Also, some points are unclear: 1/ the sample data contains a complex number, but your expected output is a float. What about the imaginary part? 2/ In the second part, is it always exactly one letter followed by exactly one digit, and exactly four times? Please clarify. – Thierry Lathuille Apr 10 '21 at 11:42
  • @ThierryLathuille In all the data I have, the imaginary part is always 0, so that is not a problem. In the second part, the array might sometimes be empty; otherwise each item is always a letter followed by exactly one digit, but not necessarily four times: there could also be one, two or three items. – reich98 Apr 10 '21 at 12:47
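Given these constraints (imaginary part always 0, array possibly empty, each item a letter followed by a digit), a regex-free sketch is also possible. It assumes the number and the array are separated by the first space, and relies on Python's built-in complex(), which accepts a string optionally wrapped in parentheses. The helper name parse_with_complex is hypothetical.

```python
def parse_with_complex(s):
    """Hypothetical helper: split s into (real part, letters, digits) without a regex."""
    # Everything before the first space is the number, e.g. "(-0.045+0j)"
    number_part, _, array_part = s.partition(" ")
    # complex() ignores the surrounding parentheses when parsing a string
    value = complex(number_part)
    if value.imag != 0:
        raise ValueError(f"non-zero imaginary part in {s!r}")
    # "[X0 X1]" -> ["X0", "X1"]; "[]" -> []
    items = array_part.strip("[]").split()
    letters = [item[0] for item in items]
    # item[1:] rather than item[1], so a multi-digit index would also work
    digits = [int(item[1:]) for item in items]
    return value.real, letters, digits


print(parse_with_complex("(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"))
```

A side benefit of letting complex() parse the number is that forms such as "(1e-05+0j)" are handled too, and a non-zero imaginary part is reported instead of silently dropped.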

1 Answer


You can do it like this:

import re

r = "(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"
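# group 1: the real part of the number; group 2: the space-separated items inside [...]
match = re.match(r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]", r)
number, array = match.groups()

number = float(number)
a1, a2 = [], []  # letters, digits
for i in array.split():
    a1.append(i[0])       # the letter, e.g. 'X'
    a2.append(int(i[1]))  # the digit, e.g. 0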

print(number, a1, a2)
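To apply this to many strings, including the [] and [X0] cases, the same pattern can be wrapped in a small function. This is a sketch: the name parse_term is hypothetical, and it uses fullmatch instead of re.match so that trailing text after "]" is rejected and a non-matching string raises a clear error instead of failing on None.groups().

```python
import re

# The answer's pattern, compiled once so it can be reused on many strings
TERM_RE = re.compile(r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]")


def parse_term(s):
    """Hypothetical helper: return (number, letters, digits) for one string."""
    match = TERM_RE.fullmatch(s)
    if match is None:
        # fullmatch returns None when the whole string does not fit the pattern
        raise ValueError(f"unrecognized term: {s!r}")
    number, array = match.groups()
    items = array.split()
    return float(number), [item[0] for item in items], [int(item[1]) for item in items]


for s in ["(-0.04530261550379927+0j) [X0 X1 Y2 Y3]", "(0.5+0j) [X0]", "(0.5+0j) []"]:
    print(parse_term(s))
```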

Explanation:

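The regex pattern r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]" matches the given string:

  • the group ([-+]?\d+(?:\.\d+)?) captures the number (the real part)
  • the group ((?:[XYZ]\d(?: [XYZ]\d)*)?) captures the contents of the array; its trailing ? makes it optional, so an empty array [] also matches
  • groups written as (?:...) are non-capturing: they group parts of the pattern without adding an entry to match.groups()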
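The difference between a capturing and a non-capturing group can be seen directly in what groups() returns. This small comparison uses only the number part of the pattern:

```python
import re

text = "-0.04530261550379927"

# (\.\d+) is a capturing group, so its match shows up as an extra entry
capturing = re.match(r"([-+]?\d+(\.\d+)?)", text)
print(capturing.groups())      # ('-0.04530261550379927', '.04530261550379927')

# (?:\.\d+) only groups the tokens; it adds nothing to groups()
non_capturing = re.match(r"([-+]?\d+(?:\.\d+)?)", text)
print(non_capturing.groups())  # ('-0.04530261550379927',)
```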

match.groups() returns a tuple of all captured groups (two in our case), which we unpack into the variables number and array.
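If positional unpacking feels fragile, the same pattern can give its two groups names with (?P<name>...); the names number and array below are just illustrative choices. match.group(name) and match.groupdict() then retrieve them by name, while groups() still returns the tuple in order.

```python
import re

r = "(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"
# The answer's pattern, with the two capturing groups named
pattern = r"\((?P<number>[-+]?\d+(?:\.\d+)?)\+\d+j\) \[(?P<array>(?:[XYZ]\d(?: [XYZ]\d)*)?)]"
match = re.match(pattern, r)

print(match.groups())          # ('-0.04530261550379927', 'X0 X1 Y2 Y3')
print(match.group("number"))   # -0.04530261550379927
print(match.groupdict())       # {'number': '-0.04530261550379927', 'array': 'X0 X1 Y2 Y3'}
```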

Next, we split the string stored in array on whitespace and iterate over the items:

  • the first character (the letter) is appended to a1
  • the second character (the digit) is converted to int and appended to a2
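As an alternative to indexing characters by hand, re.findall with two capturing groups returns one (letter, digit) tuple per item, and an empty array simply yields an empty list. This is a variant, not what the answer does:

```python
import re

array = "X0 X1 Y2 Y3"  # group 2 from the answer's match
# Each ([XYZ])(\d) match yields a (letter, digit) tuple
pairs = re.findall(r"([XYZ])(\d)", array)
a1 = [letter for letter, _ in pairs]
a2 = [int(digit) for _, digit in pairs]
print(a1, a2)  # ['X', 'X', 'Y', 'Y'] [0, 1, 2, 3]
```

List comprehensions are used instead of zip(*pairs) because unpacking zip(*[]) into two names would fail for an empty array.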

Output:

-0.04530261550379927 ['X', 'X', 'Y', 'Y'] [0, 1, 2, 3]
GooDeeJAY
  • It works well; I just should have explained my question better: I could also have arrays of the form [] or [X0], for instance, in which case this doesn't seem to work well. – reich98 Apr 10 '21 at 13:01
  • @reich98 Got it, updated the answer with a new regex pattern that matches dynamic arrays! – GooDeeJAY Apr 10 '21 at 13:47
  • +1 Note that the digit `[\d]+` does not have to be between square brackets. As a small suggestion: if you repeat the last part between the square brackets with a leading space inside the group, you don't need the question mark, and then there can be no trailing space. `\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[([XYZ]\d(?: [XYZ]\d)*)]` See https://regex101.com/r/HbDlN7/1 – The fourth bird Apr 13 '21 at 19:11
  • @Thefourthbird, thanks for the correction! I will update the answer. But one problem with your suggested regex is that it doesn't match empty arrays, so I will skip that part. – GooDeeJAY Apr 13 '21 at 19:59
  • I'm sorry, I missed that part. In that case you can make the whole repeating part optional. https://regex101.com/r/M7GNuZ/1 – The fourth bird Apr 13 '21 at 20:11
  • Yep, I did so and updated the answer, thank you) – GooDeeJAY Apr 13 '21 at 20:13
  • Could you describe what is happening in ([-+]?\d+(?:\.\d+)?)? What is the "?" symbol doing? Are you excluding +0j? – Golden Lion Jun 04 '21 at 20:14
  • @GoldenLion `[-+]?` matches a single - or + sign, or nothing; `\d+` matches a sequence of digits; `(?:...)` is a non-capturing group, so `(?:\.\d+)?` matches the fractional part of the number, if it exists, without capturing it. That way we don't get that unneeded group back when we call `match.groups()`. Hope that helps, don't forget to upvote if the answer was helpful :) – GooDeeJAY Jun 05 '21 at 06:11
  • What is a non-capturing group? – Golden Lion Jun 05 '21 at 11:31
  • @GoldenLion read this SO post: [What is a non-capturing group in regular expressions?](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions). Hope that helps) – GooDeeJAY Jun 05 '21 at 19:27
  • I am starting to use non-capturing groups. They are very helpful. – Golden Lion Jun 27 '21 at 10:11
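The trade-off discussed in the comments can be checked by running both patterns on a few inputs: The fourth bird's first suggestion requires at least one item inside the brackets, while the answer's pattern makes the whole repeating part optional. The test strings below are made up for illustration.

```python
import re

# First suggestion from the comments: at least one item is required inside [...]
required = re.compile(r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[([XYZ]\d(?: [XYZ]\d)*)]")
# The answer's pattern: the whole repeating part is optional
optional = re.compile(r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]")

results = {}
for s in ["(0.5+0j) [X0 X1]", "(0.5+0j) [X0]", "(0.5+0j) []"]:
    # (matches with the required pattern, matches with the optional pattern)
    results[s] = (required.match(s) is not None, optional.match(s) is not None)
    print(s, results[s])
```

Only the empty array "[]" separates the two: the required version fails on it, while the optional version matches with an empty second group.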