
I have the following string:

str = "AAbbbCddEE"

I also know the ranges of letters that should be excluded from the string, here 2:5 and 6:8. The expected result for this example is the string AACEE.

There might also be more than two ranges (or only one), and the ranges can overlap. For example, if the ranges 2:5, 6:8 and 4:9 are excluded from the same string, I expect the result AAE. How can I do this in Python?

Hor Net
  • Do you have any code you've tried? – Eli Korvigo Jul 06 '18 at 16:17
  • Don't name strings `str`, as it shadows a built-in name – user3483203 Jul 06 '18 at 16:17
  • Are you trying to remove lowercase letters, or is that incidental to this example? You can pool all the ranges in one set (to ignore duplicates) and then filter out any indices in that set – Reti43 Jul 06 '18 at 16:18
  • I don't quite think that dup fits. This question is asking about how to exclude a number of ranges. The dup shows how to join a number of ranges. – user3483203 Jul 06 '18 at 16:37
  • @user3483203 I agree. This might not be the best question (guidelines-wise), but it is most certainly not a duplicate of that question. – Eli Korvigo Jul 06 '18 at 16:39

2 Answers


Option 1
any with enumerate

Keep your ranges in a list, then use any and enumerate to check if an index is contained in any of the ranges:

>>> s = "AAbbbCddEE"
>>> ranges = [range(2,5), range(6,8), range(4,9)]
>>> ''.join([letter for idx, letter in enumerate(s) if not any(idx in rng for rng in ranges)])
'AAE'

Option 2
Use set difference to identify indices to keep...

r = set(range(len(s)))  # start with every index of s
for rng in ranges:
    r -= set(rng)       # drop the indices covered by this range
# r is now {0, 1, 9}

...Then join with a list comprehension

>>> ''.join([letter for idx, letter in enumerate(s) if idx in r])
'AAE'

I would strongly recommend the second approach. Building the initial set costs far less than potentially checking every range for every character:

# Initial List

s = "AAbbbCddEE"
ranges = [range(2,5), range(6,8), range(4,9)]

%timeit ''.join([letter for idx, letter in enumerate(s) if not any(idx in rng for rng in ranges)])
8.38 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
r = set(range(len(s)))
for rng in ranges:
    r -= set(rng)
''.join([letter for idx, letter in enumerate(s) if idx in r])

3.35 µs ± 59.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Much larger list

len(s)
100000

len(ranges)
300

%timeit ''.join([letter for idx, letter in enumerate(s) if not any(idx in rng for rng in ranges)])
3.25 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
r = set(range(len(s)))
for rng in ranges:
    r -= set(rng)
''.join([letter for idx, letter in enumerate(s) if idx in r])

18.8 ms ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
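
For reuse, the set-based idea can be wrapped in a small function. This is only a sketch: the name `drop_indices` is made up for illustration, and it collects the excluded indices (rather than the kept ones) into a single set with `itertools.chain.from_iterable`, which should behave the same as Option 2 for non-negative ranges.

```python
from itertools import chain


def drop_indices(s, ranges):
    """Return s without the characters whose index falls in any of the ranges.

    `drop_indices` is a hypothetical helper name, not part of any library.
    `ranges` is assumed to be an iterable of range objects with
    non-negative bounds.
    """
    # Flatten all ranges into one set so each membership test is O(1).
    excluded = set(chain.from_iterable(ranges))
    return "".join(ch for i, ch in enumerate(s) if i not in excluded)


s = "AAbbbCddEE"
print(drop_indices(s, [range(2, 5), range(6, 8)]))               # AACEE
print(drop_indices(s, [range(2, 5), range(6, 8), range(4, 9)]))  # AAE
```

The set here grows with the number of excluded indices rather than with `len(s)`, which may help when only a small part of a long string is removed.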
user3483203
  • The second approach is also much less efficient memory-wise. And the first approach can be improved by reducing the range set. – Eli Korvigo Jul 06 '18 at 16:49
  • Yes, I guess the worst case for memory usage would be `O(2n)`, which happens when every single index in the initial list is included. – user3483203 Jul 06 '18 at 16:52
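
One way to read the suggestion about reducing the range set is to merge overlapping ranges first and then stitch together the slices between them, which avoids building a set of indices at all. The sketch below assumes the ranges are given as half-open `(start, stop)` pairs with `0 <= start <= stop`; the helper names `merge_ranges` and `remove_ranges` are made up for illustration.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching (start, stop) pairs into a sorted, disjoint list."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it if needed.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return merged


def remove_ranges(s, ranges):
    """Return s without the characters covered by any half-open (start, stop) range."""
    pieces = []
    pos = 0
    for start, stop in merge_ranges(ranges):
        pieces.append(s[pos:start])  # keep the text before this excluded span
        pos = stop                   # skip the excluded span
    pieces.append(s[pos:])           # keep whatever follows the last span
    return "".join(pieces)


s = "AAbbbCddEE"
print(merge_ranges([(2, 5), (6, 8), (4, 9)]))      # [[2, 9]]
print(remove_ranges(s, [(2, 5), (6, 8)]))          # AACEE
print(remove_ranges(s, [(2, 5), (6, 8), (4, 9)]))  # AAE
```

Sorting the ranges costs O(k log k) for k ranges, and the extra memory is proportional to the number of ranges plus the output string, not to the number of excluded indices.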

You could do `newstring = oldstring.replace(oldstring[a:b], "")`, where `[a:b]` is the range you want to exclude. Assign the result back to `oldstring` instead if you don't want a new variable.

One thing though, if you wanted to exclude 2:5, 6:8 and 4:9 from the same string, could you not just exclude 2:9, or am I missing something?

PythonParka
  • This won't work if the string has repeating elements, as it will replace all occurrences of whatever is found in that slice. It also won't work after the first replacement, since the indices will have shifted. – user3483203 Jul 06 '18 at 16:24
  • Good point, thanks for pointing that out. – PythonParka Jul 06 '18 at 16:25
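
Both problems raised in the comment above can be reproduced in a few lines. The strings below are illustrative inputs chosen for this sketch (the second is the one from the question):

```python
s = "abcabc"

# Intent: drop only indices 0..2 (the first "abc").
# str.replace removes every occurrence of that substring, not just the slice.
everything_gone = s.replace(s[0:3], "")
print(repr(everything_gone))  # ''

# Positional slicing removes only the requested span.
print(s[:0] + s[3:])  # abc

# Second pitfall: after one removal the string is shorter, so the later
# indices no longer point at the same characters.
t = "AAbbbCddEE"
step1 = t.replace(t[2:5], "")          # 'AACddEE'
step2 = step1.replace(step1[6:8], "")  # step1[6:8] is 'E', not 'dd'
print(step2)  # AACdd, not the expected AACEE
```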