3

can you help me to split a string (or preferred if possible, place a blank at every "split" occasion) after every change from alphanumeric to numeric and vice versa ?

So a string like D2c1 22 should look like D 2 c 1 22. Best way from would be to put a blank at every change from alpha-numeric to numeric.

Nishant
  • 54,584
  • 13
  • 112
  • 127
calimero
  • 35
  • 4
  • 1
    I can help you but can not do it for you. – Shiplu Mokaddim Nov 09 '12 at 06:11
  • 1
    Somewhat related is this question on natural sorting of strings containing mixes of number and non-number sequences. Some of the queries there might prove useful to you. http://stackoverflow.com/questions/12965463/humanized-or-natural-number-sorting-of-mixed-word-and-number-strings – Craig Ringer Nov 09 '12 at 06:28

4 Answers4

3

You can use this regexp to find the places where it switches:

(?<=\d)(?=\D)|(?<=\D)(?=\d)

This way:

"234kjh23ljkgh34klj2345klj".gsub(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/, " ")
=> "234 kjh 23 ljkgh 34 klj 2345 klj"

Edit: Without zero length look ahead and look behind:

"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, "#{$1} #{$2}").gsub(/(\D)(\d)/, "#{$2} #{$1}")
=> "23 jk 5 jkgk 5 lk 534 lj"
davidrac
  • 10,723
  • 3
  • 39
  • 71
  • Pg (9.2 at least) doesn't seem to understand that regexp, unfortunately. `regress=> select regexp_replace( '234kjh23ljkgh34klj2345klj', '(?<=\d)(?=\D)|(?<=\D)(?=\d)', ' ', 'g');` `ERROR: invalid regular expression: quantifier operand invalid` . See http://sqlfiddle.com/#!12/d41d8/148 – Craig Ringer Nov 09 '12 at 06:24
  • It probably can't handle zero-length lookahead and lookbehind. in such case you can use something like this approach (This is ruby syntax, so you'll have to adjust):"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, "#{$1} #{$2}").gsub(/(\D)(\d)/, "#{$2} #{$1}") => "23 jk 5 jkgk 5 lk 534 lj" – davidrac Nov 09 '12 at 06:30
  • 2
    Syntax for lookahead in pg (from the doc): (?=re) positive lookahead matches at any point where a substring matching re begins (AREs only) (?!re) negative lookahead matches at any point where no substring matching re begins (AREs only) – davidrac Nov 09 '12 at 06:33
  • Thanks so much for that explanation, but can you translate me the above to fit inte pg? I am really not that in regular expressions – calimero Nov 09 '12 at 06:40
2

Here's an approach tested with PostgreSQL and verified to work. It's a bit tortured, so performance might be ... interesting.

CREATE AGGREGATE array_cat_agg (
  BASETYPE = anyarray,
  SFUNC = array_cat,
  STYPE = anyarray
);
SELECT array_to_string(array_cat_agg(a), ' ')
FROM regexp_matches('234kjh23ljkgh34klj2345klj', '(\D*)(\d*)', 'g') x(a);

We need array_cat_agg because regular array_agg can't aggregate arrays of arrays.

Alternately, a form of @davidrac's approach that'll work with PostgreSQL and probably perform significantly better (though I haven't tested) is:

SELECT regexp_replace(
  regexp_replace(
     '234kjh23ljkgh34klj2345klj', '(\d)(\D)', '\1 \2', 'g'
  ), '(\D)(\d)', '\1 \2', 'g');

This is executing the replacement in two passes. First it's inserting a space where series of digits end and series of non-digits begin. Then in another pass it's inserting spaces where series of non-digits end and series of digits begin.

Update: Here's an improved formulation:

SELECT trim(regexp_replace('234kjh23ljkgh34klj2345klj', '(?!\d)(\D+)|(?!\D)(\d+)', '\1\2 ', 'g'));
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 1
    @davidrac Verified that your 2nd formulation works with Pg. Rephrased for Pg above, along with an earlier (uglier) implementation. +1'd your answer. – Craig Ringer Nov 09 '12 at 06:39
  • Thanks. But i gut that exact same teststring (234kjh23ljkgh34klj2345klj) as result when i try the above sql with the double regexp_replace – calimero Nov 09 '12 at 06:46
  • 1
    @calimero Could you be using an old version of PostgreSQL (9.0 or older) from before the `standard_conforming_strings` change? Try `SET standard_conforming_strings = on;` then test again. (This is why you **always mention your PostgreSQL version in questions**). If it works with `standard_conforming_strings` on, then (a) upgrade PostgreSQL and (b) see http://www.postgresql.org/docs/current/static/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS for how to make the query work without upgrading PostgreSQL. Basically, instead of `'\'` use `E'\\'`; double backslashes and use `E''`. – Craig Ringer Nov 09 '12 at 06:48
  • Its psql 8.1. But i am not allowed to change that parameter (ERROR: parameter "standard_conforming_strings" cannot be changed). Guess i have to wait till the admin is here – calimero Nov 09 '12 at 06:52
  • 1
    @calimero No, just rewrite the query to work with your (frankly prehistoric) version of PostgreSQL, as per the docs link above. `'(\d)(\D)'` becomes `E'(\\d)(\\D)` and so on. You need to **urgently** start planning to upgrade your end-of-life and obsolete PostgreSQL by the way, and **always mention your version in questions** especially since it's so incredibly obsolete. See also http://www.postgresql.org/support/versioning/ . – Craig Ringer Nov 09 '12 at 06:53
  • It works like this now: SELECT regexp_replace( regexp_replace( '234kjh23ljkgh34klj2345klj', E'(\\d)(\\D)', E'\\1 \\2', 'g' ), E'(\\D)(\\d)', E'\\1 \\2', 'g'); – calimero Nov 09 '12 at 07:02
  • Thanks very much for your help guys!!! Yes, we already have a running PSQL 9.2 Instance and working up a new schema for it. But this Query has to be done quickly in the old system, so again, thanks for the great help! :) – calimero Nov 09 '12 at 07:03
1

Best way from would be to put a blank at every change from alpha-numeric to numeric.

Its not hard to do:

$ echo "D2c1 22" | sed 's|\([a-ZA-Z]\)\([0-9]\)|\1 \2|g;s|\([0-9]\)\([a-ZA-Z]\)|\1 \2|g'
D 2 c 1 22

Here I used sed and its regexp because you doesn't mention which language you use. Main idea is to use 2 regexp which replaces alpha with digit and digit with alpha to first character, space and second character.

Slava Semushin
  • 14,904
  • 7
  • 53
  • 69
1

You can match using the regex

(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])

and replace it with a space.

See it in Perl

codaddict
  • 445,704
  • 82
  • 492
  • 529
  • As @davidrac's solution, it seems Pg's regular expression engine doesn't cope with that one. See http://sqlfiddle.com/#!12/d41d8/148 – Craig Ringer Nov 09 '12 at 06:26
  • Generally better to use `\d` and `\D` too, so you can cope with any digit and non-digit sequences, not just lower-case alphanumeric. – Craig Ringer Nov 09 '12 at 07:52