0

I have a column within my MYSQL database headline TEXT NOT NULL I would like to go through this column for each of my records and extract the URL if it has one. The URL will be stored in another column article_url VARCHAR (225) NULL. If there is no URL then it would just put NULL in the column.

Lastly, I would like this to update each time a new record is inserted into the database. Below is what I have thus far.

UPDATE: here is an example of the headline column of my data Drastic decline in Social Sciences intake *: http:// bit.ly/2vXzPhQ pic.twitter.com/PAZvG3tX17 as you can see there is a URL in the data set however the URL is broken, all URLs are broken with a space between http:// and the rest of the URL. So I need to get a function in PHP that can find the Valid or Invalid URL when it gets the Invalid URL it will fix it.

this is what I have thus far.

$result = $conn->query($sql);

$reg_exp = "/^(http(s?): \/\/)?(www\.)+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/";

if ($result->num_rows > 0) 
{
    // output data of each row
    while($row = $result->fetch_assoc()) 
    {
        echo "id: " . $row["id"]. " headline: " . $row["headline"]. "<br>";
        if(preg_match($reg_exp, $row["headline"]) == TRUE)
        {
            echo "URL is valid format";
        }
        else
        {
            echo "URL is invalid format";
        }
    }
} 
else 
{
    echo "0 results";
}
Matt
  • 1,749
  • 2
  • 12
  • 26
Kern Elliott
  • 1,659
  • 5
  • 41
  • 65
  • 1
    Need more information - when you say extract, from what - the column headline? What is stored in headline? Have you attempted this at all? If so post your code! – Matt Aug 09 '17 at 21:07
  • @Matt haven't tried anything because I don't know where to start. The column has text within it like a tweet and sometimes a url. – Kern Elliott Aug 09 '17 at 21:46
  • So you are saying the headline is a html string so will contain html like Link and you want to extract the url form this? Or it can be in multiple formats? Where does headline come from? – Matt Aug 09 '17 at 21:49
  • @Matt no am saying a headline is a tweet so its 140 characters that can possibly have a url within it and I would like to extract the URL from that body of text. – Kern Elliott Aug 09 '17 at 21:55
  • Okay.. sounds like you want to use regex one the string. Maybe try `preg_match_all('!https?://\S+!', $string, $matches);` let me know how you get on. If you need more info i'll put it in an answer – Matt Aug 09 '17 at 21:58
  • Matt answered the php-Way, This would take to retrieve all values in php, do the code and update back to the MySQL table. – Myonara Aug 10 '17 at 19:41
  • @Matt going to update my question – Kern Elliott Aug 15 '17 at 15:11

1 Answers1

0

For a MySQL only solution you have four parts to parts solve:

  1. one to run as update on existing data entries

  2. one to run as AFTER INSERT Trigger

  3. For both 1+2 you need the correct REGEX-Syntax of MySQL. The php-regex might be a good starting point to do a SELECT headline, Headline REGEXP '(insert your regex here)' FROM table;. with that SELECT you can iterative check on your database, if your regex is correct.

  4. In this answer you find a version of regexp_replace, which you can add to your database. This can be used in the UPDATE table SET artice_url = IF(...REGEXP...) regexp_replace(headline,...) (the ... needs also to be filled correctly) as well as in the Trigger.

Myonara
  • 1,197
  • 1
  • 14
  • 33