
I searched for this for a long time, but none of the answers I found really get me anywhere.

I'm trying to write a SQL query that maps certain values to new groups using wildcards, within certain boundaries. It would look something like this:

SELECT number,
CASE
    WHEN number >= LIKE '0' AND number <= LIKE '009%' THEN 'group 1'
    WHEN number >= LIKE '010%' AND number <= LIKE '027%' THEN 'group 2'
    ELSE '0'
END AS NEW_GROUPS

This is necessary because values such as 00923 and 00811 need to go in the first group, while 010.123, 010123 and 0270 need to go in the second.

If something like this isn't possible, another option is to use the map method in Python (pandas) with a dictionary, something like:

df['number'].map({..})

But I am not sure how to use the lambda/regex/wildcard here. Help is greatly appreciated!
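On the Python side, a dictionary passed to map only matches whole values, so one option is to pass a function that tests the leading characters with regular expressions. The sketch below is an illustration, not a tested solution for the real data: the sample values, the name to_group and the GROUP_PATTERNS list are hypothetical, and it assumes the column holds strings so that leading zeros survive.

```python
import re
import pandas as pd

# Hypothetical sample data; keep the column as strings so leading zeros survive.
df = pd.DataFrame({"number": ["00923", "00811", "010.123", "010123", "0270", "0300", "12"]})

# re.match only matches at the start of the string, so only the prefix matters.
GROUP_PATTERNS = [
    (re.compile(r"00\d"), "group 1"),             # prefixes 000 to 009
    (re.compile(r"0(?:1\d|2[0-7])"), "group 2"),  # prefixes 010 to 027
]

def to_group(value: str) -> str:
    """Return the first group whose pattern matches the start of value, else '0'."""
    for pattern, group in GROUP_PATTERNS:
        if pattern.match(value):
            return group
    return "0"

df["new_groups"] = df["number"].map(to_group)
print(df)
```

The patterns accept only digits in the prefix, so a value such as '01A' falls through to '0'.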

Brian Tompsett - 汤莱恩
Jordy
  • Which SQL database are you using? String handling is very different even between versions of the "same" RDBMS. If you really are looking for a general solution, at least state which standard (SQL-89/SQL-92/.../SQL-2016) you want – Alex Yu Jan 08 '19 at 14:53
  • Instead of writing this kind of SQL query, use a simple query to retrieve the data and then use pandas to perform the data manipulation. This would be an ideal solution when you have tons of data; in that scenario, complex SQL queries would take longer. https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column – JR ibkr Jan 08 '19 at 16:23
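The linked question is about conditional column creation in pandas. A vectorized version of that idea might look like the sketch below, which builds one boolean mask per group with Series.str.match and combines them with numpy.select. The sample data, regex patterns and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, stored as strings so leading zeros are kept.
df = pd.DataFrame({"number": ["00923", "00811", "010.123", "010123", "0270", "0300", "12"]})

# Boolean masks; Series.str.match anchors each regex at the start of the string.
conditions = [
    df["number"].str.match(r"00\d"),             # prefixes 000 to 009
    df["number"].str.match(r"0(?:1\d|2[0-7])"),  # prefixes 010 to 027
]
choices = ["group 1", "group 2"]

# np.select takes the first matching choice per row and falls back to the default.
df["new_groups"] = np.select(conditions, choices, default="0")
print(df)
```

Unlike map with a Python function, this evaluates each condition over the whole column at once, which tends to matter more as the number of rows grows.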

2 Answers


If number is actually a numeric column, then you can do direct comparisons. Otherwise, I would use something like the following to get what you want. It is a bit different, but I think it covers what you need:

CASE
    WHEN LEN(number) < 3 OR CAST(LEFT(number, 3) AS INT) < 10 THEN 'Group 1'  -- prefixes 000 to 009
    WHEN CAST(LEFT(number, 3) AS INT) < 28 THEN 'Group 2'                     -- prefixes 010 to 027
    ELSE '0'
END AS new_groups

You may need to tweak this for edge cases, but I think it gets the general idea across.
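To try the numeric-prefix idea without a full database, here is a sketch using Python's built-in sqlite3 module. SQLite has no LEN or LEFT, so the CASE below is a translation using length and substr rather than the exact query above. The table name codes and the sample values are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table; TEXT keeps the leading zeros.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (number TEXT)")
values = ["00923", "00811", "010.123", "010123", "0270", "0300", "12"]
conn.executemany("INSERT INTO codes VALUES (?)", [(v,) for v in values])

# Same logic as the answer: compare the first three characters as an integer.
rows = conn.execute("""
    SELECT number,
           CASE
               WHEN length(number) < 3
                    OR CAST(substr(number, 1, 3) AS INTEGER) < 10 THEN 'Group 1'
               WHEN CAST(substr(number, 1, 3) AS INTEGER) < 28 THEN 'Group 2'
               ELSE '0'
           END AS new_groups
    FROM codes
""").fetchall()

groups = dict(rows)
print(groups)
```

The short value '12' ends up in Group 1 because of the length check, which is the kind of edge case the answer says may need tweaking.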

Breian Wells

Are you looking for something like this?

SELECT number,
       (CASE WHEN number >= '00' AND number < '01' THEN 'group 1'
             WHEN number >= '01' AND number < '028' THEN 'group 2'
             ELSE '0'
        END) as new_group

The first group will be numbers that start with "00". The second will be numbers that start with values that are alphabetically in that range. "Alphabetically" means that '01A' would meet the conditions.
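A runnable check of the string-range approach in SQLite follows. The table name codes and the sample values are hypothetical. One detail matters in string order: '0270' sorts after '027' because '027' is a prefix of it, so this sketch uses an exclusive upper bound of '028' to keep values such as '0270' in group 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (number TEXT)")  # hypothetical table; TEXT keeps leading zeros
values = ["00923", "00811", "010.123", "010123", "0270", "0300", "01A"]
conn.executemany("INSERT INTO codes VALUES (?)", [(v,) for v in values])

# Plain string comparisons: each range is [lower bound, exclusive upper bound).
rows = conn.execute("""
    SELECT number,
           CASE WHEN number >= '00' AND number < '01'  THEN 'group 1'
                WHEN number >= '01' AND number < '028' THEN 'group 2'
                ELSE '0'
           END AS new_group
    FROM codes
""").fetchall()

groups = dict(rows)
print(groups)
```

As the answer notes, '01A' also lands in group 2, because the comparison is alphabetical rather than numeric.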

Gordon Linoff
  • I like your solution better than mine – Breian Wells Jan 08 '19 at 15:19
  • Ah, this might actually do it. I was overcomplicating things: in my mind the number 00123 would translate to 123, but these are strings, of course. I will try it, thank you – Jordy Jan 08 '19 at 15:28
  • Seriously, you would recommend running queries like this? Although it is a correct answer to this problem, it's not an optimal one. – JR ibkr Jan 08 '19 at 16:27
  • @JRibkr . . . I have no idea what you are referring to. I would use a query like this if the logic is indeed correct. – Gordon Linoff Jan 08 '19 at 16:37
  • If you were in the OP's shoes and the target database had a huge number of records, would you go with your solution? The correct way to deal with this problem would be to use a simple query to load the data into a pandas DataFrame and then manipulate it as needed. – JR ibkr Jan 08 '19 at 16:45
  • @JRibkr . . . That is a ridiculous suggestion. Of course I would do the work in the database. It is more efficient, and databases are designed for that type of processing. If you are proficient in pandas, that's fine, but SQL is an eminently reasonable solution for this. – Gordon Linoff Jan 08 '19 at 19:05