1

I want to split the following string

string = "events.error > 5 AND timeseries.temperature < 20 OR variants.battery = 'Li' AND vehicle.name = 'Audi'"

at each AND and OR, so that I get the following list:

list = ["events.error > 5", "AND", "timeseries.temperature < 20", "OR", "variants.battery = 'Li'", "AND", "vehicle.name = 'Audi'"]

I tried string.split(), but it also splits the groups at the spaces. Do you know a better approach?

Thierry Lathuille
deptrai

4 Answers

1

You could split() the string on "AND", then intersperse "AND" between the items in the list. It also seems you don't want any surrounding whitespace, so you need to strip() each item as well.

string = "events.error > 5 AND timeseries.temperature < 20 AND variants.battery = 'Li' AND vehicle.name = 'Audi'"

def intersperse(lst, item):
    result = [item] * (len(lst) * 2 - 1)
    result[0::2] = lst
    return result

result = intersperse([x.strip() for x in string.split("AND")], "AND")

print(result)

Output:

['events.error > 5', 'AND', 'timeseries.temperature < 20', 'AND', "variants.battery = 'Li'", 'AND', "vehicle.name = 'Audi'"]
RoadRunner
1

The easiest way is to use re.split. Putting the separator in a capturing group keeps it in the output:

import re

string = "events.error > 5 AND timeseries.temperature < 20 AND variants.battery = 'Li' AND vehicle.name = 'Audi'"

re.split(r'(AND)', string)

Output:

['events.error > 5 ',
 'AND',
 ' timeseries.temperature < 20 ',
 'AND',
 " variants.battery = 'Li' ",
 'AND',
 " vehicle.name = 'Audi'"]

Edit: as you included a second separator 'OR', the updated version:

import re

string = "events.error > 5 AND timeseries.temperature < 20 OR variants.battery = 'Li' AND vehicle.name = 'Audi'"

re.split(r'(AND|OR)', string)

Output:

['events.error > 5 ',
 'AND',
 ' timeseries.temperature < 20 ',
 'OR',
 " variants.battery = 'Li' ",
 'AND',
 " vehicle.name = 'Audi'"]

If you want to get rid of the spaces around your strings, you can include any number of leading and trailing spaces in the separator, but outside of the group, so that they don't appear in the output:

import re

string = "events.error > 5 AND timeseries.temperature < 20 OR variants.battery = 'Li' AND vehicle.name = 'Audi'"

re.split(r'\s*(AND|OR)\s*', string)  # include the spaces in the separator, 
                                     # but don't keep them 

Output:

['events.error > 5',
 'AND',
 'timeseries.temperature < 20',
 'OR',
 "variants.battery = 'Li'",
 'AND',
 "vehicle.name = 'Audi'"]
Thierry Lathuille
0

You can split on a substring with string.split("AND"), and then use .strip() to remove the leading/trailing whitespace:

my_string = "events.error > 5 AND timeseries.temperature < 20 AND variants.battery = 'Li' AND vehicle.name = 'Audi'" 
my_list = my_string.split('AND') 
my_list = [x.strip() for x in my_list]
print(my_list)

returns

['events.error > 5', 'timeseries.temperature < 20', "variants.battery = 'Li'", "vehicle.name = 'Audi'"]

Note that without the .strip(), you keep needless whitespace, which you probably don't want.
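
One caveat with splitting on the bare substring "AND": it also matches inside longer words, so a value such as 'ANDROID' or a column named ORDER_ID would be cut apart. A minimal sketch of one way around this, assuming the keywords always appear as whole uppercase words, is to use re.split with word boundaries. The function name split_conditions and the sample query are made up for illustration:

```python
import re

def split_conditions(expression):
    # \b requires a word boundary on both sides of the keyword, so AND/OR
    # embedded in longer words (e.g. "ANDROID", "ORDER_ID") are not matched.
    # The capturing group keeps the keyword in the result; the surrounding
    # \s* swallows the whitespace around it.
    return re.split(r'\s*\b(AND|OR)\b\s*', expression)

# Hypothetical query containing AND/OR inside longer words.
query = "orders.ORDER_ID > 5 AND brand.name = 'ANDROID' OR x = 1"
print(split_conditions(query))
# ['orders.ORDER_ID > 5', 'AND', "brand.name = 'ANDROID'", 'OR', 'x = 1']
```

This still splits on a standalone AND or OR inside a quoted value (for example 'Tom AND Jerry'); handling that would probably need a real tokenizer rather than a single regular expression.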

Itamar Mushkin
0

You could combine split() with itertools.cycle() and zip():

import itertools


string = "events.error > 5 AND timeseries.temperature < 20 AND variants.battery = 'Li' AND vehicle.name = 'Audi'"
print([x for y in zip(string.split(' AND '), itertools.cycle(['AND'])) for x in y][:-1])
# ['events.error > 5', 'AND', 'timeseries.temperature < 20', 'AND', "variants.battery = 'Li'", 'AND', "vehicle.name = 'Audi'"]
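
The one-liner above assumes a single separator with exactly one space on each side. As a rough sketch of a regex-free variant that handles both AND and OR, you could split on whitespace first and then re-join the tokens between keywords. The function name split_on_keywords and its keywords parameter are hypothetical, not part of any library:

```python
def split_on_keywords(expression, keywords=("AND", "OR")):
    parts = []      # finished conditions and keywords, in order
    current = []    # tokens of the condition being collected
    for token in expression.split():
        if token in keywords:
            # A keyword closes the current condition.
            parts.append(" ".join(current))
            parts.append(token)
            current = []
        else:
            current.append(token)
    # Whatever is left after the last keyword is the final condition.
    parts.append(" ".join(current))
    return parts

string = "events.error > 5 AND timeseries.temperature < 20 OR variants.battery = 'Li' AND vehicle.name = 'Audi'"
print(split_on_keywords(string))
# ['events.error > 5', 'AND', 'timeseries.temperature < 20', 'OR', "variants.battery = 'Li'", 'AND', "vehicle.name = 'Audi'"]
```

Because the tokens are re-joined with single spaces, any run of whitespace inside a condition (including inside a quoted value) is collapsed to one space, which may or may not be acceptable for your data.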
norok2