0

Issue: The MIDL column is supposed to include (UL_ID, "x001"), x001 representing a unique household. If there are duplicated households, then the base of the MIDL should stay the same, while the second value increases by 1 (which means there is more than one household on the said address). Now, I managed to cheat with numbers up to 9 for a while, because I can set the base as (UL_ID, "x00") and simply add values for each additional duplicate, but once you go over 9, you end up with x0010, while the value can't exceed 12 characters. NOTE: The number of households extends to 60 in some cases.

Before:

ADDRESS                    MIDL              UL_ID
---------------------------------------------------
AŠKERČEVA ULICA 6                          14044272
AŠKERČEVA ULICA 6                          14044272
AŠKERČEVA ULICA 6                          14044272
AŠKERČEVA ULICA 6A                         14044264
AŠKERČEVA ULICA 7                          14044299
BAZOVIŠKA ULICA 25                         14037390
BELOKRIŠKA CESTA 10                        17097687
BELOKRIŠKA CESTA 10                        17097687
BELOKRIŠKA CESTA 10                        17097687
BELOKRIŠKA CESTA 10                        17097687

After:

ADDRESS                    MIDL              UL_ID
----------------------------------------------------
AŠKERČEVA ULICA 6       14044272x001        14044272
AŠKERČEVA ULICA 6       14044272x002        14044272
AŠKERČEVA ULICA 6       14044272x003        14044272
AŠKERČEVA ULICA 6A      14044264x001        14044264
AŠKERČEVA ULICA 7       14044299x001        14044299
BAZOVIŠKA ULICA 25      14037390x001        14037390
BELOKRIŠKA CESTA 10     17097687x001        17097687
BELOKRIŠKA CESTA 10     17097687x002        17097687
BELOKRIŠKA CESTA 10     17097687x003        17097687
BELOKRIŠKA CESTA 10     17097687x004        17097687

UPDATE

P.Salmon thought up of a working solution

SELECT T.ADDRESS,
     CONCAT(T.UL_ID,'x' ,lpad(IF(T.UL_ID <> @P,@RN:=1,@RN:=@RN+1),3,'0')) MIDL, T.UL_ID,
     @P:=T.UL_ID P
FROM (SELECT @RN:=0,@P:=0) RN,T
Adephx
  • 187
  • 10

2 Answers2

1

If you have "managed to cheat with numbers up to 9 for a while, because I can set the base as (UL_ID, "x00") and simply add values for each additional duplicate" using a cursor you could instead use a rownumber as suggested by @Tim Biegeleisen something like this

 SELECT T.ADDRESS,
         CONCAT(T.UL_ID,'x' ,lpad(IF(T.UL_ID <> @P,@RN:=1,@RN:=@RN+1),3,'0')) midl,
         T.UL_ID,
         @P:=T.UL_ID P
FROM    (SELECT @RN:=0,@P:=0) RN,T
ORDER   BY  T.UL_ID

If you already using this method you may just need the concat..lpad bits.

Same thing using address

SELECT T.ADDRESS,
         CONCAT(T.UL_ID,'x' ,lpad(IF(T.address <> @P,@RN:=1,@RN:=@RN+1),3,'0')) midl,
         T.UL_ID,
         @P:=T.address P
FROM    (SELECT @RN:=0,@P:='') RN,T
ORDER   BY  T.address

Result

+----------+--------------+----------+----------+
| ADDRESS  | midl         | UL_ID    | P        |
+----------+--------------+----------+----------+
| CESTA 10 | 17097687x001 | 17097687 | CESTA 10 |
| CESTA 10 | 17097687x002 | 17097687 | CESTA 10 |
| CESTA 10 | 17097687x003 | 17097687 | CESTA 10 |
| CESTA 10 | 17097687x004 | 17097687 | CESTA 10 |
| ULICA 25 | 14037390x001 | 14037390 | ULICA 25 |
| ULICA 6  | 14044272x001 | 14044272 | ULICA 6  |
| ULICA 6  | 14044272x002 | 14044272 | ULICA 6  |
| ULICA 6  | 14044272x003 | 14044272 | ULICA 6  |
| ULICA 6A | 14044264x001 | 14044264 | ULICA 6A |
| ULICA 7  | 14044299x001 | 14044299 | ULICA 7  |
+----------+--------------+----------+----------+
10 rows in set (0.02 sec)
P.Salmon
  • 17,104
  • 2
  • 12
  • 19
  • If I am not mistaken, this will only work if the table is ordered by UL_ID – krtek Nov 02 '16 at 11:02
  • Could you take a minute of your time and explain to me, why this method doesn't work if I decide to count the ADDRESS while still using the UL_ID as reference. In my understanding it should essentially work the same, except it doesn't. Instead it counts up until 999 (it's limited by 3, I get this part) and resets. I fail to see why it wouldn't work, as it is only supposed to compare the duplicates. Does it has anything to do with characters, as the UL_ID is composed strictly out of numbers? – Adephx Nov 02 '16 at 11:37
  • Adephx - not sure I understand your comment. if you you post your code on sql fiddle I'll take a look at it. Note - if you run 2 bits of code in the same script which use the same @ variables you will get an issue where the second query just keeps incrementing the row number - I don't know why this is but you could try SETting the variables between the queries or use different variable names in the second query. – P.Salmon Nov 02 '16 at 14:19
  • Sorry for not being clear enough. In your code, I can replace UL_ID with any other parameter in my table (I have several others that I haven't listed here) and the code will work perfectly (it will count the number of duplicates in that column, attach either UL_ID or something else as prefix then add the count as x00~), but if I select ADDRESS as the column that is supposed to be counted the code breaks. TLDR: I wanted to replace UL_ID with ADDRESS – Adephx Nov 03 '16 at 07:31
  • Adephx I have added query based on address - looks ok to me – P.Salmon Nov 03 '16 at 09:01
  • Yeah, looks ok to me as well, weird. I tried exactly the same method yesterday and it gave the wrong result. Cheers, thank you for taking your time, I wish I could upvote your reply, but my rep is under 15. – Adephx Nov 03 '16 at 11:07
0

In order to get the count, you will need a subquery. Something like this should do the trick:

SELECT 
    t.ADDRESS,
    CONCAT(t.UL_ID, "x", LPAD(count.UL_ID_COUNT, 3, '0')) AS MIDL,
    t.UL_ID
FROM <your table> AS t
INNER JOIN (
    SELECT UL_ID, COUNT(UL_ID) AS UL_ID_COUNT FROM <your table> GROUP BY UL_ID
) AS count ON count.UL_ID = t.UL_ID;

I am also using LPAD and CONCAT to create MIDL.

You can also create a MySQL UPDATE using the same idea:

UPDATE <your table> AS t
SET MIDL=CONCAT(t.UL_ID, "x", LPAD(count.UL_ID_COUNT, 3, '0'))
INNER JOIN (
    SELECT UL_ID, COUNT(UL_ID) AS UL_ID_COUNT FROM <your table> GROUP BY UL_ID
) AS count ON count.UL_ID = t.UL_ID;
krtek
  • 26,334
  • 5
  • 56
  • 84
  • I had to remove _COUNT from count.UL_ID_COUNT to make it work, but then all the values return as UL_IDx162. The solution from P.Salmon seems to work. – Adephx Nov 02 '16 at 10:52
  • My bad, I forgot to name the columns in the subquery :) It should work now. – krtek Nov 02 '16 at 10:59
  • I've checked your solution again, and while it works for counting the total amount of duplicated rows, for example 4 rows with UL_ID 17097687 will be marked all as 17097687x004, it doesn't work for numbering those rows as needed, example: 17097687x001, 17097687x002, 17097687x003 and 17097687x004. As mentioned, the solution from P.Salmon works, but I'm grateful for your help non the less. – Adephx Nov 02 '16 at 11:15
  • Oh, ok, I misread the question, sorry ;) Glad the other answers work for you ! – krtek Nov 02 '16 at 11:36