Factor (string) to Numeric in PostgreSQL

Question

Similar to this, is it possible to convert a String field to Numeric in PostgreSQL. For instance,

create table test (name text);

insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');

and add a field that is

 name  | num
-------+-----
 amy   | 1
 bob   | 2
 bob   | 2
 celia | 3

What does the number represent? Are you looking for the string's ASCII character representation? Or do you want to assign a character an arbitrary number like a = 1? — jcern, Oct 03 '12 at 16:04
@jcern there is no meaning of the numbers as long as they can distinguish the names. — Rock, Oct 03 '12 at 16:05
@MattBall I need an automatic procedure to convert the name field to numeric. The real table has thousands of rows. — Rock, Oct 03 '12 at 16:07
Your example seems to indicate duplicate strings? You want to enter complete dupes into the table? This is generally a bad idea ... — Erwin Brandstetter, Oct 03 '12 at 16:22
@ErwinBrandstetter Actually there are other fields such as IDs. To simplify the problem, I've only shown the `name`. — Rock, Oct 03 '12 at 16:23
You may have over-simplified the example. It is essential to the problem whether strings and / or rows are unique or not. I provided an answer for the question as is - dealing with duplicates. — Erwin Brandstetter, Oct 03 '12 at 16:50
Your example appears to be well suited to the `dense_rank` window function, since you want the same field to be the same number. — Craig Ringer, Oct 04 '12 at 03:11
@Rock See answer. I think it's a bad idea; this feels like one of those cases where you're asking how to accomplish a solution you've decided on, when you should really be asking about the underlying problem (why you want to add these numbers to each row). A hash or proper generated key via a lookup table is almost certainly wiser. Nonetheless, it'll do what you want. — Craig Ringer, Oct 04 '12 at 03:22
@CraigRinger It is really I need some unique identifier for each row. So it can either be a number or hash. Thank you for opening the other solution. — Rock, Oct 04 '12 at 20:46

score 4 · Answer 1 · edited Jun 20 '20 at 09:12

The most effective "hash"-function of all is a serial primary key - giving you a unique number like you wished for in the question.

I also deal with duplicates in this demo:

CREATE TEMP TABLE string (
  string_id serial PRIMARY KEY
 ,string    text NOT NULL UNIQUE    -- no dupes
 ,ct        int NOT NULL DEFAULT 1  -- count instead of dupe rows
);

Then you would enter new strings like this:
(Data-modifying CTE requires PostgreSQL 9.1 or later.)

WITH x AS (SELECT 'abc'::text AS nu)
   , y AS (
   UPDATE string s
   SET    ct = ct + 1
   FROM   x
   WHERE  s.string = x.nu
   RETURNING TRUE
   )
INSERT INTO string (string)
SELECT nu
FROM   x
WHERE NOT EXISTS (SELECT 1 FROM y);

If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.

The UNIQUE also adds an index on the column string.string automatically, which leads to optimal performance for this query.

Add additional logic (triggers ?) for UPDATE / DELETE to make this bullet-proof - if needed.

Note, there is a super-tiny race condition here, if two concurrent transactions try to add the same string at the same moment in time. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this this related question.

Live demo at sqlfiddle.

Thank for this great answer, always as complete as your other answers, Erwin. — Rock, Oct 03 '12 at 17:36

score 1 · Answer 2 · answered Oct 03 '12 at 16:15

1

If they are all single characters, you could do this:

ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);

Though that would only return the character for the first letter if the string was more than a single character.

answered Oct 03 '12 at 16:15

jcern

7,798
4
39
47

Thank you Jcern. Actually they are NOT single characters. It's a `text` field. – Rock Oct 03 '12 at 16:16

score 1 · Accepted Answer · edited May 23 '17 at 12:24

1

How 'bout a hash, such as md5, of name?

create table test (name text, hash text);

-- later

update test set hash = md5(name);

If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgresSQL

edited May 23 '17 at 12:24

Community

1
1

answered Oct 03 '12 at 16:16

Matt Ball

354,903
100
647
710

1

@Rock: md5 hashes are not unique. While a collision is very unlikely, there is no guarantee. Amy could end up with the same hash as Bob. – Erwin Brandstetter Oct 03 '12 at 17:27

score 1 · Answer 4 · answered Oct 04 '12 at 03:21

The exact case shown in your request can be produced with the dense_rank window function:

regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
 name  | dense_rank 
-------+------------
 amy   |          1
 bob   |          2
 bob   |          2
 celia |          3
(4 rows)

so if you were adding a number for each row, you'd be able to do something like:

ALTER TABLE test ADD COLUMN some_num integer;

WITH gen(gen_name, gen_num) AS 
   (SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;

ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;

however I think it's much more sensible to use a hash or to assign generated keys. I'm just showing that your example can be achieved.

The biggest problem with this approach is that inserting new data is a pain. It's a ranking (like your example shows) so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.

Thank you for the answer Craig. At least this is another way to accomplish it and maybe useful in other cases. — Rock, Oct 04 '12 at 20:47

Factor (string) to Numeric in PostgreSQL

4 Answers4

Live demo at sqlfiddle.