I know how to use a dictionary as a switcher in Python, but I'm not sure how to use one for my specific case. I think I will just need to use if, elif, and else, but hopefully the community will prove me wrong :)

I want to make a find/replace function for certain characters in strings. Each string is at least one sentence, but usually more, and consists of many words.

Basically what I am doing is the following:

# string is a sentence with many words
if '\u2011' in string:    # non-breaking hyphen
    string = string.replace('\u2011', '-')
elif '\u2013' in string:  # en dash
    string = string.replace('\u2013', '-')
elif '\u2014' in string:  # em dash
    string = string.replace('\u2014', '-')
elif '\u00A0' in string:  # non-breaking space
    string = string.replace('\u00A0', ' ')
# ... and so forth

The only thing I can think of is splitting the string into separate substrings and looping through them; then the dictionary switcher would work. But this would obviously add a lot of extra processing time, and the purpose of using a dictionary switcher is to save time.

I searched but could not find anything on this specific topic.

Is there a way to build a switcher in Python around `if ... in` and `elif ... in` tests?

probat
  • Looks like a good fit for regular expressions to me. One regex for each `if` clause, and you won't need any conditional logic. – cxw Jun 20 '18 at 17:45
  • @cxw If you need one regex per `if` clause, what's the point of using regex instead of string operations? – Aran-Fey Jun 20 '18 at 17:45
  • You want just `if` statements, not `elif`. If there's an `en dash` in `string`, that doesn't mean you want to ignore the `em dash` in string too, does it? If each thing you want to replace is a single character, you should use [`str.translate`](https://docs.python.org/3/library/stdtypes.html#str.translate) – Patrick Haugh Jun 20 '18 at 17:46
  • But is it truly necessary to have `if`s here in the first place? Why not just perform the replacements without checking whether the character is present first? `if "." in s: s = s.replace(".", "-")` has the same behavior as `s = s.replace(".", "-")` by itself. – Kevin Jun 20 '18 at 17:46
  • @Kevin thanks, I will change that since what I'm doing is redundant. For the sake of this question, let's assume I am not replacing text but doing some other operation based on whether something is in a string. Is it possible to create a dictionary switcher? – probat Jun 20 '18 at 17:49
  • Why isn't anyone recommending str.translate? – cs95 Jun 20 '18 at 17:50
  • @coldspeed: Patrick Haugh did. – user2357112 Jun 20 '18 at 17:56
  • Suggestion for the OP: Change all the `replace X with Y` things to `print(Y)` to make it more obvious that you're asking for a way to rewrite those `if...elif` statements in a more DRY fashion. – Aran-Fey Jun 20 '18 at 17:58
  • I think by giving string replacement as an example use of what you want to do you've led most of the answerers astray, since there are several string-replacement specific solutions to this problem. The more general problem is less amenable to a tidy and efficient solution (especially if you really do want `elif`s, which usually wouldn't make sense in the text replacement situations; you'd more often just want successive `if`s). – Blckknght Jun 20 '18 at 18:31
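
The point raised in the comments about `elif` versus successive `if`s (or plain unconditional replacements) can be seen in a small sketch. The sample string and the names `text`, `chained`, and `successive` are made up for illustration; they are not from the question.

```python
# Hypothetical sample containing both an en dash and an em dash.
text = 'en\u2013dash and em\u2014dash'

# if/elif chain: only the first matching branch runs,
# so the em dash is left untouched.
chained = text
if '\u2013' in chained:
    chained = chained.replace('\u2013', '-')
elif '\u2014' in chained:
    chained = chained.replace('\u2014', '-')

# Unconditional replacements: every listed character is handled,
# and str.replace is a no-op when the character is absent.
successive = text.replace('\u2013', '-').replace('\u2014', '-')

print(repr(chained))
print(repr(successive))
```

With the chain, the second kind of dash survives, which is usually not what a text-cleanup routine wants.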

3 Answers

Here's the `str.translate` solution:

replacements = {
    '\u2011': '-',  # non-breaking hyphen
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
    '\u00A0': ' ',  # nbsp
}

trans = str.maketrans(replacements)
new_string = your_string.translate(trans)

Note that this only works if you want to replace single characters from the input. `{'a': 'bb'}` is a valid replacement mapping, but `{'bb': 'a'}` is not.
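
To make the single-character restriction concrete, here is a minimal sketch. The sample string and the names `table`, `sample`, `cleaned`, `expanded`, and `multi_char_key_ok` are assumptions for illustration only. In CPython, `str.maketrans` raises `ValueError` when a dict key is a string longer than one character.

```python
replacements = {
    '\u2011': '-',  # non-breaking hyphen
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
    '\u00A0': ' ',  # non-breaking space
}
table = str.maketrans(replacements)

# Hypothetical input mixing all four characters.
sample = 'well\u2011known\u00A0text \u2013 with \u2014 dashes'
cleaned = sample.translate(table)
print(cleaned)

# A single character may map to a longer string ...
expanded = 'a-b'.translate(str.maketrans({'-': '--'}))

# ... but a multi-character key is rejected when the table is built.
try:
    str.maketrans({'--': '-'})
    multi_char_key_ok = True
except ValueError:
    multi_char_key_ok = False
```

For multi-character substrings, chained `str.replace` calls or a regex are the usual fallbacks.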

Patrick Haugh
  • The issue I see here is that you will replace all at once, contrary to pseudo code implementation of OP – Benjamin Toueg Jun 20 '18 at 18:07
  • @BenjaminToueg I'm operating under the impression that this is what they actually wanted, even if it wasn't what they wrote. – Patrick Haugh Jun 20 '18 at 18:09
  • @Patrick Haugh sorry I wasn't as clear as you would have liked. It is difficult to explain in text. This is what I was after, thanks. – probat Jun 21 '18 at 13:06

Just to show that regex is a valid solution, and some timings:

replacements = {
    '\u2011': '-',
    '\u2013': '-',
    '\u2014': '-',
    '\u00A0': ' ', 
}

import re
s = "1‑‑‑‑2–––––––3————————"

re.sub(
    '|'.join(re.escape(x) for x in replacements),
    lambda x: replacements[x.group()], s
)
# Result: '1----2-------3--------'

Timings (`str.translate` wins and is also cleaner):

s = "1‑‑‑‑2–––––––3————————"
s *= 10000
trans = str.maketrans(replacements)  # translation table, as in the str.translate answer

%timeit re.sub('|'.join(re.escape(x) for x in replacements), lambda x: replacements[x.group()], s)
90.7 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit s.translate(trans)
15.8 ms ± 59.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
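
Outside IPython, roughly the same comparison can be run with the standard `timeit` module. This is only a sketch: the helper names `with_regex` and `with_translate`, the precompiled `pattern`, and the shorter test string are assumptions, so the absolute numbers will differ from the ones above.

```python
import re
import timeit

replacements = {
    '\u2011': '-',  # non-breaking hyphen
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
    '\u00A0': ' ',  # non-breaking space
}
trans = str.maketrans(replacements)
# Precompiled here for convenience; the answer rebuilds the pattern on each call.
pattern = re.compile('|'.join(re.escape(x) for x in replacements))

# Smaller input than in the answer, to keep the run short.
s = '1\u2011\u20112\u2013\u20133\u2014\u2014' * 1000

def with_regex(text):
    return pattern.sub(lambda m: replacements[m.group()], text)

def with_translate(text):
    return text.translate(trans)

# Both approaches should agree before their speed is compared.
same_result = with_regex(s) == with_translate(s)

regex_time = timeit.timeit(lambda: with_regex(s), number=5)
translate_time = timeit.timeit(lambda: with_translate(s), number=5)
print(f'regex: {regex_time:.4f}s  translate: {translate_time:.4f}s')
```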
user3483203
  • The main advantage to this approach is that it works with multiple character substrings that you may want to replace. I'd suggest using `'|'.join(re.escape(x) for x in replacements)` to avoid any issues if there are regex-relevant characters to be replaced (and there's no need to call `keys()`). I'm not sure the `'({})'.format(...)` part is necessary, you can just use `x|y` from the `join` call as the pattern directly, as the replacement works with no capturing group at all. – Blckknght Jun 20 '18 at 18:23
  • Yea, was working on cleaning the regex up, thanks for the suggestions, I'll update – user3483203 Jun 20 '18 at 18:24

Although Benjamin's answer might be right, it is case-specific, while your question has a rather general-purpose tone to it. There is a universal functional approach (I've added Python 3.5 type annotations to make this code self-explanatory):

from typing import TypeVar, Callable, Iterable, Tuple

A = TypeVar('A')
B = TypeVar('B')
Predicate = Callable[[A], bool]
Action = Callable[[A], B]
Switch = Tuple[Predicate, Action]

def switch(switches: Iterable[Switch], default: B, x: A) -> B:
    return next(
        (act(x) for pred, act in switches if pred(x)), default
    )

switches = [
    (lambda x: '\u2011' in x, lambda x: x.replace('\u2011', '-')),
    (lambda x: '\u2013' in x, lambda x: x.replace('\u2013', '-'))
]
a = "I'm–a–string–with–en–dashes"

switch(switches, a, a) # if no switches are matched, return the input

This is quite superfluous in your case, because your example boils down to a regex operation. Note that while `switches` can be any iterable, you will want something with a predictable iteration order, i.e. any Sequence type (e.g. list or tuple), because only the first action whose predicate matches is applied.
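
As a hedged illustration of the first-match behaviour and of the reuse via `functools.partial` mentioned in the comments, the sketch below restates `switch` without annotations so that it runs on its own. The returned labels and the name `describe` are hypothetical.

```python
from functools import partial

def switch(switches, default, x):
    """Return act(x) for the first (pred, act) pair whose pred(x) is true."""
    return next((act(x) for pred, act in switches if pred(x)), default)

# Order matters: the first matching predicate decides the result.
switches = [
    (lambda x: '\u2011' in x, lambda x: 'has non-breaking hyphen'),
    (lambda x: '\u2013' in x, lambda x: 'has en dash'),
    (lambda x: '\u2014' in x, lambda x: 'has em dash'),
]

# Bind the switch table and the default once, then reuse the callable.
describe = partial(switch, switches, 'nothing special')

first = describe('en\u2013dash and em\u2014dash')  # en dash is listed earlier
second = describe('plain text')                     # no predicate matches
print(first, '|', second)
```

This behaves like an `if`/`elif`/`else` chain: later entries are never consulted once one predicate has matched.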

Eli Korvigo
  • Good solution, but unreadable. It would help a lot if you added a usage example so people can figure out how to use it. – Aran-Fey Jun 20 '18 at 18:25
  • @Aran-Fey I've hesitated to add an example, because the solution is quite superfluous in the context of OP's example. Nevertheless, here you are. – Eli Korvigo Jun 20 '18 at 18:50
  • Yeah, it's pretty annoying to have to write a bunch of lambdas. It's probably more useful as a recipe than as a function. It'd look much cleaner as a loop with the lambdas inlined. – Aran-Fey Jun 20 '18 at 19:11
  • @Aran-Fey in general, the point of using a function is the ability to apply it partially to a list of switches (e.g. `replace_dashes = partial(switch, switches)` or even `partial(switch, switches, default)`, if the default is fixed, too) and reuse the switcher multiple times in various places/expressions allowing for expressive (in the functional programming sense) code. – Eli Korvigo Jun 20 '18 at 19:36