2

I have a series of strings in my table that I want to order. eg.

Chapter 1 
Chapter 2 
Chapter 11 
Chapter 12

To order them correctly I would need to pad them with zeros.

Chapter 001 
Chapter 002 
Chapter 011 
Chapter 012

Maybe I can use regexp_replace() pad in zeros with the use of a regular expression.

  regexp_replace(chapters.name,'(\d+)\D*','0\1') as name

This solution does not work. I thought maybe I could figure out the length of the backreference and use that number to pad what I need:

  regexp_replace(chapters.name,'(\d+)\D*',lpad('\l',3-length('\1'),'0') as name

but a \1 in length() always returns 2, even though I noticed using \1 in certain postgres function translate \1 to the actual captured data.

How can I pad zeros using Postgres string functions and Postgres regex?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Andrew WC Brown
  • 2,240
  • 5
  • 22
  • 24

1 Answers1

6

Problems

Your second attempt is victim to two misunderstandings:

  • The second parameter of lpad() and rpad() is the resulting total length, not the number of characters to add.

  • You are confusing the scope of regexp_replace() with the scope of lpad(). lpad() is executed first and \l and \1 have no special meaning to lpad().

Solutions

It's a problem of "natural sort". Related answer:

As advised over there, the best solution would be to store normalized data to begin with. If the leading 'Chapter ' is actually an immutable string in all values, don't store that at all and just store the number part as integer.

If your example values are accurate, we can ignore the leading constant text and just sort by the trailing number. Much quicker than any attempt at reformatting the strings:

SELECT *
FROM   chapters
ORDER  BY split_part(name, ' ', 2)::int

If you really need the zero-padded strings, and the leading text can vary:

SELECT split_part(name, ' ', 1) || ' '
    || lpad(split_part(name, ' ', 2), 3,  '0')
FROM   chapters
ORDER  BY 1;

SQL Fiddle.

Two function calls plus concatenation, but still faster than regexp_replace(). Regular expressions are comparatively expensive.
More examples for lpad():

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This set me on the right path. The data I had wasn't very clean. Some rows had multiple spaces. It also had 3 words. I could have used multiple split_parts, or substring or something else. Anyway I used: `select mapname, regexp_replace(mapname,'[ ]+[0-9].*',' ') || lpad(regexp_replace(mapname,'(.*)[ ]+([0-9].*)','\2'), 3, '0') from mapextents where maptype ~ 'grids'` for this `"Map Index 94";"Map Index 094" "Map Index 95";"Map Index 095" "Map Index 100";"Map Index 100"` – Bryan Jun 23 '17 at 17:02