I am trying to write a regular expression that matches anything except digits and the * or - characters, with one caveat. Where I'm hitting a wall is that runs of three or fewer digits should be allowed, but runs of four or more should not, and not even a single * or - should be matched.

This is what I have so far (for three matches):

.*?([^0-9\*-]+).*?([^0-9\*-]+).*?([^0-9\*-]+).*?

I have no idea where to insert {4,} for the digits (I've tried and it doesn't seem to work anywhere) or how to change it to do as I want.

For instance, in "Jack has* 777 1883874 -sheep-" I'd like it to return "Jack has 777 sheep". Or in "2343klj-3***.net" I'd like it to return "klj 3 .net"

Wiktor Stribiżew
logan7
  • What programming language are you using? – Gurmanjot Singh Feb 19 '18 at 10:54
  • This is not clear. Try `[-*]+|\d{4,}` in a regex replacement method. – Wiktor Stribiżew Feb 19 '18 at 10:58
  • @Gurman I'm using ICU or PCRE. – logan7 Feb 19 '18 at 11:03
  • @WiktorStribiżew I'm new to this, just started this month, not sure what a replacement method is but I'll look into it. The one you gave me as is is returning the - and * too which I don't want. – logan7 Feb 19 '18 at 11:09
  • That is because you are *matching*, and it is not possible to match discontinuous texts within one matching operation. Remove matches found. What is the *programming language*? – Wiktor Stribiżew Feb 19 '18 at 11:13
  • I'm using Keyboard Maestro which uses ICU regular expressions. I can't find what programming language KM uses in particular. – logan7 Feb 19 '18 at 11:18
  • @WiktorStribiżew I got your suggestion to work, thanks. One problem, if I replace with a space then it replaces sometimes with multiple spaces in a row; I only want one space between each remaining word group. – logan7 Feb 19 '18 at 11:26
  • So, do you mean [`(?:[-*]|\d{4,})+`](https://regex101.com/r/rssGpZ/1) works as expected? – Wiktor Stribiżew Feb 19 '18 at 11:27
  • Here is what I just came up with off your first one and it seems to work: (-|\*|(\d{4,}))+ – logan7 Feb 19 '18 at 11:31
  • So, [`[-*]|\d{4,}`](https://regex101.com/r/rssGpZ/2) is working for you? – Wiktor Stribiżew Feb 19 '18 at 11:33
  • I'm not understanding what you're saying. The difference seems to be that one leaves too many spaces and the other doesn't. – logan7 Feb 19 '18 at 11:35
  • @WiktorStribiżew Just checked your two options through their links. The first leaves either too many or too few spaces, as what I want is exactly one space between the word groups left. I'm thinking I'll have to add spaces to the group to match, then replace each matched group with a space. Maybe: `([-*\s]|\d{4,})+` and replace with space? – logan7 Feb 19 '18 at 11:41
  • You should explain what spaces you want to keep, I cannot help you here, because only you know the requirements. As for now, it is not possible to handle all spaces the way you want because your examples are inconsistent. If you use `([-*\h]|\d{4,})+` and replace with space, you will get ` klj 3 .net` (with space at the start) – Wiktor Stribiżew Feb 19 '18 at 11:43
  • @WiktorStribiżew That one mostly works, thanks! I'm not sure why the one I wrote with \s didn't work at all while the \h did. Also, I would rather not have the space at the start or end but one space in between each word grouping. – logan7 Feb 19 '18 at 11:53
  • Can't you `trim()` the result after `regex.replace`? Also, try [`(\h)*(?:\h*[-*]|\h*\d{4,})+` to replace with `$1`](https://regex101.com/r/rssGpZ/3). – Wiktor Stribiżew Feb 19 '18 at 11:55
  • Neither of those seemed to work for me. I'm not sure if the trim one is supported in ICU. The second one actually creates more space at first in the example I'm trying (`**duf87867-d777.com**`). I did a quick google search for replacing white spaces at beginning and end and found this: `^\s*(.*)\s*$` Seemed to work. It looks like your last one should do it all in one and I'd prefer that but even though this is two steps at least it's doing what I need. – logan7 Feb 19 '18 at 12:14

2 Answers

1

You may use the following regex (replacing with a literal space, " "):

(?:[-*\s]|\d{4,})+

See the regex demo.

Details

  • (?:[-*\s]|\d{4,})+ - a non-capturing group matching one or more consecutive repetitions of
    • [-*\s] - a single whitespace character, - or *
    • | - or
    • \d{4,} - 4+ digits.
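
To see which runs the pattern picks out, here is a minimal sketch in Python's `re` module (an assumption for illustration; the question is about ICU, whose syntax for this particular pattern is the same). The helper name `unwanted_runs` is hypothetical.

```python
import re

# Pattern from the answer: one or more consecutive occurrences of
# a '-', a '*', a whitespace character, or a run of 4+ digits.
UNWANTED = re.compile(r"(?:[-*\s]|\d{4,})+")

def unwanted_runs(text):
    """Return every run the pattern would remove (hypothetical helper)."""
    # The group is non-capturing, so findall returns the whole matches.
    return UNWANTED.findall(text)

print(unwanted_runs("Jack has* 777 1883874 -sheep-"))
# → [' ', '* ', ' 1883874 -', '-']
```

Note that `777` is not in the list: three digits are too few for `\d{4,}`, so they are left alone.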

Next, to remove all leading and trailing whitespace you may use

^\s+|\s+$

and replace with an empty string. ^\s+ matches 1+ whitespaces at the start of the string and \s+$ matches 1+ whitespaces at the end of the string.
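
The two replacements can be chained. The following is only a sketch using Python's `re.sub` as a stand-in for the find-and-replace action (Keyboard Maestro itself uses ICU, so treat the Python API as an assumption); `clean` is a hypothetical function name.

```python
import re

def clean(text):
    """Apply the two replacements in order (hypothetical helper)."""
    # Step 1: collapse each run of '-', '*', whitespace, or 4+ digits
    # into a single space.
    collapsed = re.sub(r"(?:[-*\s]|\d{4,})+", " ", text)
    # Step 2: remove whitespace left at the start or end of the string.
    return re.sub(r"^\s+|\s+$", "", collapsed)

print(clean("Jack has* 777 1883874 -sheep-"))  # → Jack has 777 sheep
print(clean("2343klj-3***.net"))               # → klj 3 .net
```

Because whitespace is part of the character class in step 1, neighbouring spaces are absorbed into the same run, which is why only one space ends up between the remaining word groups.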

Wiktor Stribiżew
  • I'm not sure why but this isn't working exactly as I'm wanting. I see by trying it in the regex demo that `**duf87867*d777.com**` takes off white spaces in front and end but when I'm using it on my computer it's still leaving the white spaces on each end. Also, I'm not sure why the ones that look like website addresses are taking out all white spaces while the one that looks like a sentence is keeping them in between; I need even the ones that look like website addresses to keep the white spaces in between each leftover word grouping. – logan7 Feb 19 '18 at 12:26
  • Upvoted for the sheer amount of effort put into this for no reward. – Chris Lear Feb 19 '18 at 13:01
0

With the help here, this is what works. It may be impossible to do it all in one regex because of the conflict of needing no spaces at the beginning and end but spaces in between each remaining grouping.

First, a find and replace using ([-*\h]|\d{4,})+ and replacing with a space.

Second, a find using ^\s*(.*)\s*$ to drop the spaces at the start and end.
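
One thing to watch with the second pattern, sketched here in Python's `re` (an assumption; the behaviour in Keyboard Maestro's ICU engine may be handled differently by the action that consumes the match): the greedy `(.*)` can swallow trailing whitespace into the captured group. The lazy variant `(.*?)` is a suggested alternative, not something from the thread.

```python
import re

spaced = " klj 3 .net "  # result of step one, with a space on each end

# Greedy: (.*) runs to the end of the line, so the trailing \s* matches
# nothing and the trailing space stays inside group 1.
greedy = re.match(r"^\s*(.*)\s*$", spaced).group(1)

# Lazy: (.*?) stops as early as possible, leaving the trailing space
# for \s* to consume.
lazy = re.match(r"^\s*(.*?)\s*$", spaced).group(1)

print(repr(greedy))  # → 'klj 3 .net '
print(repr(lazy))    # → 'klj 3 .net'
```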

logan7
  • `^\s*(.*)\s*$` won't work, use `^\s+|\s+$` to replace with an empty string. And please upvote/accept the answer that proved helpful to you. – Wiktor Stribiżew Feb 19 '18 at 23:30
  • @WiktorStribiżew I will try that too but the one I posted in my answer does work for me. Do you mean accept your answer? Sorry, I'm new here and don't know how to 'accept' it, I don't see any button or anything for that? As for the upvote, I really appreciate all your help so far but I didn't upvote your official answer yet because it's not working for what I asked. (cont.) – logan7 Feb 20 '18 at 00:25
  • @WiktorStribiżew (cont.) I do see you've edited it to take off the starting and ending spaces, but it still leaves no spaces between groupings with no spaces already present. Please refer to my initial question and the examples I gave there with how I need them to end up, specifically `2343klj-3***.net` needs to become `klj 3 .net` which it is not with your answer so far unfortunately. – logan7 Feb 20 '18 at 00:25
  • @WiktorStribiżew I figured out the difference between your start-end whitespace suggestion and mine. Yours is for using a find and replace the second time as well. Mine is for just using a find the second time since it wouldn't necessarily need a find and replace for that part. – logan7 Feb 20 '18 at 06:41
  • I will read that. I'm sorry but you are incorrect on my original question. Please re-read it. In my original question I very clearly wanted to obtain `klj 3 .net` with the spaces in between, and I reiterated that a few times in our discussion, and the regex options I kept mentioning being interested in supplied those spaces so I'm not sure how I could've been clearer. I feel like a lot of this back and forth could've been avoided if we'd been on the same page there. I did upvote your comments that were helpful and if you edit your answer to one that does what I asked I'll upvote it. (cont.) – logan7 Feb 20 '18 at 08:33
  • I don't mean to be rude and I want to be polite, but I also feel you've been somewhat stubborn (though helpful) in this, especially considering that I'm very very new to anything to do with coding or programming or regex (believe it or not I just started learning anything to do with this last week; I was an absolute novice before and hadn't even heard of "regex"). You *have* really helped me learn the correct answer in the comments through our discussion and your suggestions and me fiddling on top of that and I really appreciate that. – logan7 Feb 20 '18 at 08:40