0

I am currently going through a process of validating a column that holds reference numbers for each customer but they have to be a specific format to be counted as a genuine number.

I initially received around 80 csv files and uploaded this into a single master table after going through a process of making sure all tables have the same number of columns etc. (following is an example as I have over 90 columns).

The following is my create table query and I decided to keep everything as a varchar column bar the first ID column just to ensure there's no issues with data being missed.

CREATE TABLE IF NOT EXISTS `master_table` (
  `id` int(11) NOT NULL,
  `Name` varchar(255) ,
  `Business_Name` varchar(255) ,
  `Contact_Person` varchar(255) ,
  `Tel_Number` varchar(255) ,
  `Tel_Number_2` varchar(255) ,
  `Email` varchar(255) ,
  `House_Name` varchar(255) ,
  `Door_Number` varchar(255) ,
  `Street` varchar(255) ,
  `Town` varchar(255) ,
  `Locality` varchar(255) ,
  `County` varchar(255) ,
  `Post_Code` varchar(255) ,
  `Supplier` varchar(255) ,
  `Annual_Usage` varchar(255) ,
  `Start_Date` varchar(255) ,
  `End_Date` varchar(255) ,
  `PA_Nmbr` varchar(255)
  ;

I then uploaded all my data and this gave me around 900k rows of data.

The specific column that I am interested in is the PA_Nmbr column. For this to be a genuine reference number it needs to be 13 digit long so I then wrote the following query to see what my record count looks like:

SELECT
    SUM(CASE WHEN LEN <13 THEN CNT ELSE 0 END) AS UNDER_13,     -- 201,112
    SUM(CASE WHEN LEN =13 THEN CNT ELSE 0 END) AS PA_13,      -- 701,019
    SUM(CASE WHEN LEN >13 THEN CNT ELSE 0 END) AS OVER_13       -- 942
FROM
(
SELECT
    LENGTH(PA_NMBR) AS LEN,
    COUNT(*) AS CNT
FROM
    master_table
GROUP BY
    LENGTH(PA_NMBR)
) A
;

So this told me that around 70% of my data has the right length but the next check is to make sure when I have added the data into the database the numbers have not been rounded up. So for example if there was a genuine number 1234567899999 and this has been rounded up to 1234567890000.

SO where I am stuck is that I would like to write a query where I can do a count on how many 0s they are in the column from the right side.

So if I want to check quickly and do a count of how many records I have that start off with numbers i.e. 1234 and then all 0s I could quickly do that. That would tell me that there is a problem with the data itself.

I would really appreciate it if somebody could advice on how I could write this query.

Thanks in advance.

MT0
  • 143,790
  • 11
  • 59
  • 117
Michael Owen
  • 365
  • 3
  • 20
  • Your `create table` looks like MySQL, not Oracle. Please clarify what database you are really using. – Gordon Linoff Nov 20 '19 at 11:32
  • Hi Gordon, the original table sits in MySQL under a PHPMyAdmin environment but I use SQL Developer as my client as this is a tool that I am used to hence the confusion. – Michael Owen Nov 20 '19 at 11:35
  • So you want something MySQL compatible? – Nick Nov 20 '19 at 11:47
  • Hi Nick, not necessarily, the queries I have written above have been running in Oracle SQL so ideally would prefer it to be used in that as much prefer it. – Michael Owen Nov 20 '19 at 11:55

1 Answers1

0

have a look at the SELECT LEN(REPLACE('YYNYNYYNNNYYNY', 'N', '')) method. A similar question has been asked and answered here: How to count instances of character in SQL Column

Nick Ko
  • 435
  • 4
  • 16
  • HI Nick, thanks for this, works quite well. Is there anyway for it to check from the right rather than as a whole? – Michael Owen Nov 20 '19 at 12:16
  • Sorry, yes I forgot the "from-the-right" requirement here. In that case maybe you want to look into the Sub-string method? Have a look here, I think this may be helpful https://www.sqlshack.com/sql-substring-function-overview/ – Nick Ko Nov 21 '19 at 13:08
  • 1
    Thanks Nick, really appreciated – Michael Owen Dec 09 '19 at 15:35