I have a column in my table that contains a text record of some large logging data. These fields will have a 9 digit number (0-9) that starts with "3".

I want to select only the text that matches the pattern, not the whole field. To complicate things, the pattern can occur more than once in a field.

I think the regex I need is 3{8}[0-9]. Is this right?

Is there a MySQL-only way to do this? I'd rather not have to write a PHP script to extract this data.

EDIT: It seems this is not possible with REGEX - can it be done with any of the other MySQL String functions?

TheLettuceMaster
  • mysql regexes can only MATCH. they cannot do capturing or replacement. – Marc B Jun 08 '15 at 14:44
  • so it is going to be `gobblygook four scoreandsevenyears847394057 agoour847394057fore fathers666brought uponthis continent` – Drew Jun 08 '15 at 14:45
  • @DrewPierce Yes, it would look like that. It's actually logging from PHP's `print_r`. And some of the values in the arrays are the 9 digit numbers. – TheLettuceMaster Jun 08 '15 at 14:46
  • only 1 9digit number in there (on a given row shall we call it)? – Drew Jun 08 '15 at 14:49
  • [Here it is said you cannot do that](http://stackoverflow.com/q/4021507/3832970). – Wiktor Stribiżew Jun 08 '15 at 14:50
  • @DrewPierce No, it could be multiple. But it looks like based on the link above this is not possible. – TheLettuceMaster Jun 08 '15 at 14:52
  • give an example of the `actual` data, not caring if there are more than one 9-digit number for any given row. it is important to know how the chunk begins and ends, that is, characters surrounding it – Drew Jun 08 '15 at 15:49

3 Answers

2

I don't think MySQL or SQL in general is the right tool for text mining when dealing with non-normalized data.

Just dump your data into a file with

$ mysqldump mydb mytable > dump.sql

and then search for your pattern using

$ grep -o '3[0-9]\{8\}' dump.sql > numbers.txt
  • -o tells grep to print only the matched text, one match per line.
  • 3 matches a literal 3, and [0-9] matches any single digit.
  • \{8\} is the escaped form of {8}; it tells grep that the preceding [0-9] must repeat exactly 8 times.
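As a quick check of the pattern without a real dump, here is a minimal sketch that runs the same grep on one made-up line. The sample text and the function name extract_basic are only for illustration.

```shell
# Hypothetical sample line standing in for one row of the dump.
sample='id=312345678 junk 412345678 more=398765432;'

# Same pattern as above, reading from standard input.
# -o prints each match on its own line instead of the whole input line.
extract_basic() {
  grep -o '3[0-9]\{8\}'
}

printf '%s\n' "$sample" | extract_basic
# prints:
# 312345678
# 398765432
```

412345678 is skipped because the only 3 in it is followed by just five digits.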

Final command from the discussion that also expects a non-numeric value after the 9 digits:

$ grep -Po '3[0-9]{8}(?=[^0-9])' dump.sql > numbers.txt
  • -P uses Perl-compatible regular expressions, so the braces need no escaping
  • (?=...) is a lookahead: it has to match, but the text it matches is not included in the result
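Two caveats with the lookahead version: (?=[^0-9]) needs some character after the number, so a number at the very end of a line is not matched, and nothing checks what comes before the 3. If -P is not available, or those cases matter, one possible workaround (a sketch, not part of the discussion above) is to split the input into maximal digit runs first and then keep only the runs that are exactly 9 digits starting with 3. The sample line and the name extract_exact are made up for illustration.

```shell
# Hypothetical sample: an 11-digit run, a good number, a 10-digit run,
# and a good number at the very end of the line.
sample='a31111111111b 312345678 9312345678 x398765432'

extract_exact() {
  # First grep: print every maximal run of digits, one per line.
  # Second grep: -x keeps only lines that match the pattern as a whole,
  # i.e. runs that are exactly "3" followed by 8 more digits.
  grep -oE '[0-9]+' | grep -xE '3[0-9]{8}'
}

printf '%s\n' "$sample" | extract_exact
# prints:
# 312345678
# 398765432
```

With GNU grep, a single Perl-style pattern such as '(?<![0-9])3[0-9]{8}(?![0-9])' should give the same result.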
Basti
  • any thoughts on how to deal with (exclude) 'eleven digits in a row starting with 3 all together31111111111yes together' .... as it relates to his question – Drew Jun 08 '15 at 15:56
  • Excluding things in regular expressions is not that easy. A simple method would be to assume a 3, followed by 8 numbers, followed by something other than a number: `$ grep -o '3[0-9]\{8\}[^0-9]' dump.sql > numbers.txt`. – Basti Jun 08 '15 at 16:02
  • maybe i should rephrase. must start with 3, have 8 #'s following, but not a total of 9 or more following the 3, so that it eliminates unwanted junk – Drew Jun 08 '15 at 16:04
  • The previous solution should work. Beware that you'll get numbers containing that last non-digit because it is part of the pattern. To avoid this you can use a look-ahead, but you'll need the GNU grep, not the OS X grep (as I have). The command should look like `$ grep -Po '3[0-9]{8}(?=[^0-9])' dump.sql > numbers.txt`. – Basti Jun 08 '15 at 16:14
0

These fields will have a 9 digit number (0-9) that starts with "3".

Here are 3 cases that show the regexp and that it fulfills that specification:

+----------------------------------+----------------------------------+---------------------------------------+
| '123456789' REGEXP '^3[0-9]{8}$' | '323456789' REGEXP '^3[0-9]{8}$' | 'junk-323456789' REGEXP '^3[0-9]{8}$' |
+----------------------------------+----------------------------------+---------------------------------------+
|                                0 |                                1 |                                     0 |
+----------------------------------+----------------------------------+---------------------------------------+
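The same three cases can be tried outside MySQL. This is only a rough shell equivalent using grep -E, and the helper name matches_whole is made up. Because of the ^ and $ anchors the pattern tests the whole string, so it reports whether a value is such a number rather than pulling one out of surrounding text.

```shell
# Prints 1 when the whole string is "3" followed by 8 digits, else 0,
# mirroring the 0/1 results in the table above.
matches_whole() {
  if printf '%s\n' "$1" | grep -qE '^3[0-9]{8}$'; then
    echo 1
  else
    echo 0
  fi
}

matches_whole '123456789'       # 0
matches_whole '323456789'       # 1
matches_whole 'junk-323456789'  # 0
```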
Rick James
0

As others have mentioned, SQL isn't especially suited to poking around inside of text. But you could do something as a stored function. Here's one that will grab the first positive integer out of a text value. That might be good enough if you don't have other numbers preceding the pattern you want; or you could modify it fairly easily to get the more precise pattern you want (for example, test each number it finds, and if it's not of the form you want, discard it and keep looking instead of exiting the loop):

DROP FUNCTION IF EXISTS firstNumber;

DELIMITER //
CREATE FUNCTION firstNumber(s TEXT)
    RETURNS INTEGER
    COMMENT 'Returns the first integer found in a string'
    DETERMINISTIC
    BEGIN

    DECLARE token TEXT DEFAULT '';
    DECLARE len INTEGER DEFAULT 0;
    DECLARE ind INTEGER DEFAULT 0;
    DECLARE thisChar CHAR(1) DEFAULT ' ';

    SET len = CHAR_LENGTH(s);
    SET ind = 1;
    WHILE ind <= len DO
        SET thisChar = SUBSTRING(s, ind, 1);
        IF (ORD(thisChar) >= 48 AND ORD(thisChar) <= 57) THEN
            SET token = CONCAT(token, thisChar);
        ELSEIF token <> '' THEN
            -- Could add extra pattern check here
            SET ind = len + 1;
        END IF;
        SET ind = ind + 1;
    END WHILE;

    IF token = '' THEN
        RETURN 0;
    END IF;

    RETURN token;

    END //
DELIMITER ;
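To illustrate the "discard it and keep looking" modification, here is the same character-by-character scan sketched in POSIX shell rather than as a MySQL function, so the logic can be tried without a database. The names first_match and is_wanted are hypothetical; the check inside is_wanted is what would go at the "extra pattern check" comment above.

```shell
# True when the token is exactly "3" followed by 8 digits.
is_wanted() {
  case $1 in
    3[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the first run of digits that passes is_wanted, or nothing.
first_match() {
  rest=$1
  token=''
  while [ -n "$rest" ]; do
    tail=${rest#?}        # everything after the first character
    ch=${rest%"$tail"}    # the first character itself
    rest=$tail
    case $ch in
      [0-9]) token=$token$ch ;;          # still inside a digit run
      *)
        # A digit run just ended: stop if it is the wanted form,
        # otherwise discard it and keep looking.
        if is_wanted "$token"; then break; fi
        token='' ;;
    esac
  done
  # Also covers a digit run that ends at the end of the string.
  if is_wanted "$token"; then printf '%s\n' "$token"; fi
}

first_match 'a12 b31111111111 c312345678 d398765432'   # prints 312345678
```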
TextGeek