RegExp: Split string alphanumeric / numeric

Question

can you help me to split a string (or preferred if possible, place a blank at every "split" occasion) after every change from alphanumeric to numeric and vice versa ?

So a string like D2c1 22 should look like D 2 c 1 22. Best way from would be to put a blank at every change from alpha-numeric to numeric.

Somewhat related is this question on natural sorting of strings containing mixes of number and non-number sequences. Some of the queries there might prove useful to you. http://stackoverflow.com/questions/12965463/humanized-or-natural-number-sorting-of-mixed-word-and-number-strings — Craig Ringer, Nov 09 '12 at 06:28

davidrac · Answer 1 · 2012-11-09T06:30:57.823

3

You can use this regexp to find the places where it switches:

(?<=\d)(?=\D)|(?<=\D)(?=\d)

This way:

"234kjh23ljkgh34klj2345klj".gsub(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/, " ")
=> "234 kjh 23 ljkgh 34 klj 2345 klj"

Edit: Without zero length look ahead and look behind:

"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, "#{$1} #{$2}").gsub(/(\D)(\d)/, "#{$2} #{$1}")
=> "23 jk 5 jkgk 5 lk 534 lj"

edited Nov 09 '12 at 06:30

answered Nov 09 '12 at 06:14

davidrac

10,723
3
39
71

Pg (9.2 at least) doesn't seem to understand that regexp, unfortunately. `regress=> select regexp_replace( '234kjh23ljkgh34klj2345klj', '(?<=\d)(?=\D)|(?<=\D)(?=\d)', ' ', 'g');` `ERROR: invalid regular expression: quantifier operand invalid` . See http://sqlfiddle.com/#!12/d41d8/148 – Craig Ringer Nov 09 '12 at 06:24
It probably can't handle zero-length lookahead and lookbehind. in such case you can use something like this approach (This is ruby syntax, so you'll have to adjust):"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, "#{$1} #{$2}").gsub(/(\D)(\d)/, "#{$2} #{$1}") => "23 jk 5 jkgk 5 lk 534 lj" – davidrac Nov 09 '12 at 06:30
2

Syntax for lookahead in pg (from the doc): (?=re) positive lookahead matches at any point where a substring matching re begins (AREs only) (?!re) negative lookahead matches at any point where no substring matching re begins (AREs only) – davidrac Nov 09 '12 at 06:33
Thanks so much for that explanation, but can you translate me the above to fit inte pg? I am really not that in regular expressions – calimero Nov 09 '12 at 06:40

Craig Ringer · Accepted Answer · 2012-11-09T07:39:46.853

2

Here's an approach tested with PostgreSQL and verified to work. It's a bit tortured, so performance might be ... interesting.

CREATE AGGREGATE array_cat_agg (
  BASETYPE = anyarray,
  SFUNC = array_cat,
  STYPE = anyarray
);
SELECT array_to_string(array_cat_agg(a), ' ')
FROM regexp_matches('234kjh23ljkgh34klj2345klj', '(\D*)(\d*)', 'g') x(a);

We need array_cat_agg because regular array_agg can't aggregate arrays of arrays.

Alternately, a form of @davidrac's approach that'll work with PostgreSQL and probably perform significantly better (though I haven't tested) is:

SELECT regexp_replace(
  regexp_replace(
     '234kjh23ljkgh34klj2345klj', '(\d)(\D)', '\1 \2', 'g'
  ), '(\D)(\d)', '\1 \2', 'g');

This is executing the replacement in two passes. First it's inserting a space where series of digits end and series of non-digits begin. Then in another pass it's inserting spaces where series of non-digits end and series of digits begin.

Update: Here's an improved formulation:

SELECT trim(regexp_replace('234kjh23ljkgh34klj2345klj', '(?!\d)(\D+)|(?!\D)(\d+)', '\1\2 ', 'g'));

edited Nov 09 '12 at 07:39

answered Nov 09 '12 at 06:35

Craig Ringer

307,061
76
688
778

1

@davidrac Verified that your 2nd formulation works with Pg. Rephrased for Pg above, along with an earlier (uglier) implementation. +1'd your answer. – Craig Ringer Nov 09 '12 at 06:39
Thanks. But i gut that exact same teststring (234kjh23ljkgh34klj2345klj) as result when i try the above sql with the double regexp_replace – calimero Nov 09 '12 at 06:46
1

@calimero Could you be using an old version of PostgreSQL (9.0 or older) from before the `standard_conforming_strings` change? Try `SET standard_conforming_strings = on;` then test again. (This is why you **always mention your PostgreSQL version in questions**). If it works with `standard_conforming_strings` on, then (a) upgrade PostgreSQL and (b) see http://www.postgresql.org/docs/current/static/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS for how to make the query work without upgrading PostgreSQL. Basically, instead of `'\'` use `E'\\'`; double backslashes and use `E''`. – Craig Ringer Nov 09 '12 at 06:48
Its psql 8.1. But i am not allowed to change that parameter (ERROR: parameter "standard_conforming_strings" cannot be changed). Guess i have to wait till the admin is here – calimero Nov 09 '12 at 06:52
1

@calimero No, just rewrite the query to work with your (frankly prehistoric) version of PostgreSQL, as per the docs link above. `'(\d)(\D)'` becomes `E'(\\d)(\\D)` and so on. You need to **urgently** start planning to upgrade your end-of-life and obsolete PostgreSQL by the way, and **always mention your version in questions** especially since it's so incredibly obsolete. See also http://www.postgresql.org/support/versioning/ . – Craig Ringer Nov 09 '12 at 06:53
It works like this now: SELECT regexp_replace( regexp_replace( '234kjh23ljkgh34klj2345klj', E'(\\d)(\\D)', E'\\1 \\2', 'g' ), E'(\\D)(\\d)', E'\\1 \\2', 'g'); – calimero Nov 09 '12 at 07:02
Thanks very much for your help guys!!! Yes, we already have a running PSQL 9.2 Instance and working up a new schema for it. But this Query has to be done quickly in the old system, so again, thanks for the great help! :) – calimero Nov 09 '12 at 07:03

score 1 · Answer 3 · answered Nov 09 '12 at 06:15

1

Best way from would be to put a blank at every change from alpha-numeric to numeric.

Its not hard to do:

$ echo "D2c1 22" | sed 's|\([a-ZA-Z]\)\([0-9]\)|\1 \2|g;s|\([0-9]\)\([a-ZA-Z]\)|\1 \2|g'
D 2 c 1 22

Here I used sed and its regexp because you doesn't mention which language you use. Main idea is to use 2 regexp which replaces alpha with digit and digit with alpha to first character, space and second character.

answered Nov 09 '12 at 06:15

Slava Semushin

14,904
7
53
69

Thanks very much for the quick resonses. I'd like to do it in postgresql. – calimero Nov 09 '12 at 06:19

score 1 · Answer 4 · answered Nov 09 '12 at 06:17

1

You can match using the regex

(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])

and replace it with a space.

See it in Perl

answered Nov 09 '12 at 06:17

codaddict

445,704
82
492
529

As @davidrac's solution, it seems Pg's regular expression engine doesn't cope with that one. See http://sqlfiddle.com/#!12/d41d8/148 – Craig Ringer Nov 09 '12 at 06:26
Generally better to use `\d` and `\D` too, so you can cope with any digit and non-digit sequences, not just lower-case alphanumeric. – Craig Ringer Nov 09 '12 at 07:52

RegExp: Split string alphanumeric / numeric

4 Answers4