
I am looking for a way to detect and drop quotes within quotes, for example: something "something "something something" something" something.

In the above example, the inner something something is wrapped in its own pair of double quotes inside the outer pair. I want to strip these inner quotes from the string.

So the expression should look for a quoted text that contains another quoted text, and then drop the quotes around the inner one.

This is my current code (PHP):

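    // Greedy match: from the first quote to the last quote on a line,
    // provided there is at least one quoted part inside
    preg_match_all('/".*(".*").*"/', $text, $matches);
    if (is_array($matches[0])) {
        foreach ($matches[0] as $match) {
            // Keep only the outer quotes: strip all quotes, then re-wrap
            $text = str_replace($match, '"' . str_replace('"', '', $match) . '"', $text);
        }
    }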
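The same idea can be written with preg_replace_callback(), which rewrites each match in place instead of searching for it a second time with str_replace(). This is a minimal sketch that keeps the question's greedy pattern; the function name dropInnerQuotes() is hypothetical.

```php
<?php

# Hypothetical helper using the question's greedy pattern: a quote, anything,
# a quoted part, anything, and a final quote (on one line, since . skips newlines).
function dropInnerQuotes(string $text): string {
    return preg_replace_callback('/".*(".*").*"/', function (array $m): string {
        # keep the outermost quotes and strip every quote in between
        return '"' . str_replace('"', '', substr($m[0], 1, -1)) . '"';
    }, $text);
}

echo dropInnerQuotes('something "something "something something" something" something'), "\n";
// prints: something "something something something something" something
```

Because .* is greedy, a line with several outer pairs is treated as one big match, so the quotes between those pairs are removed as well.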
Dewan159
  • Is there only one outer pair of quotes? – revo Jun 02 '18 at 08:48
  • This reads like you are trying to use a tool (regular expressions) for a task they are not suitable for. You might be able to find a solution for such tasks for a single iteration, but never a general solution. That is theoretically impossible, because regular expressions as a tool are not powerful enough for that. You need a more powerful device for that, a Turing machine. – arkascha Jun 02 '18 at 09:29
  • @revo No, as many as found – Dewan159 Jun 02 '18 at 09:44
  • @arkascha I understand. I know it's not a simple task, but explaining this to your boss is equally hard! The issue appears simple enough for someone to say "just drop the quotes in the middle"! – Dewan159 Jun 02 '18 at 09:45
  • Please add a sample in which more than one outer pair of quotes is being used. – revo Jun 02 '18 at 09:51
  • I am not talking about this not being a simple task. I wrote: it is theoretically impossible. Sure, you can go on trying the impossible. Men have tried to make gold from lead for hundreds of years. People have tried to find gods for thousands of years. But that does not really lead to any results, does it? So maybe it makes sense to listen to a chemistry professional for the gold issue or a university degree computer scientist about the issue of complexity. They say this is not possible and they can prove it in a mathematical sense. This is not possible to be done in a general sense. – arkascha Jun 02 '18 at 10:11
  • @arkascha: Why should it not be possible (see my answer) - or were you referring to a "regex-only solution" (where I'd agree). – Jan Jun 03 '18 at 18:43
  • @Jan Well, I understood that the OP is looking for a RegEx for that purpose. – arkascha Jun 03 '18 at 18:44
  • @arkascha: Ok then, this is hardly possible, I guess. – Jan Jun 03 '18 at 18:45

2 Answers


If the string starts with a " and the double quotes inside the string are always balanced, you might use:

    ^"(*SKIP)(*F)|"([^"]+)"

That would match a double quote at the start of the string and then skip that match using (*SKIP)(*F). Then it would match a ", capture what is between the quotes in a group, and match the closing " again.

In the replacement, use capturing group 1: $1

    $pattern = '/^"(*SKIP)(*F)|"([^"]+)"/';
    $str = "\"something \"something something\" and then \"something\" something\"";
    echo preg_replace($pattern, '$1', $str);

    "something something something and then something something"

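For text where the outer quote is not at the very start, one possible variant (a sketch, not tested beyond the example below) lets the first alternative consume everything up to and including the first quote, so that quote is skipped wherever it is. It still assumes a single outer pair whose inner quotes are balanced; with several outer pairs, the text between them would be mistaken for a quoted part.

```php
<?php

# Variant of the (*SKIP)(*F) pattern: ^[^"]*" matches from the start of the
# string up to and including the first quote, then (*SKIP)(*F) discards it.
# Assumption: a single outer pair whose inner quotes are balanced.
$pattern = '/^[^"]*"(*SKIP)(*F)|"([^"]+)"/';
$str = 'something "something "something something" something" something';
echo preg_replace($pattern, '$1', $str), "\n";
// prints: something "something something something something" something
```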

The fourth bird
  • Actually the quotes may or may not be at the beginning of the text, or else it would be a lot easier. Thanks – Dewan159 Jun 02 '18 at 09:43

You could leverage strpos() with its third parameter (offset) to look up all quotes, then remove every quote except the first and the last one:

    <?php

    $data = 'something "something "something something" something" something';

    # set up the needed variables
    $needle = '"';
    $lastPos = 0;
    $positions = array();

    # find all quotes
    while (($lastPos = strpos($data, $needle, $lastPos)) !== false) {
        $positions[] = $lastPos;
        $lastPos = $lastPos + strlen($needle);
    }

    # remove all but the first and the last quote if there are more than 2;
    # go backwards so the stored offsets stay valid after each removal
    # (assigning "" to a string offset is a fatal error as of PHP 7.1)
    if (count($positions) > 2) {
        for ($i = count($positions) - 2; $i >= 1; $i--) {
            $data = substr_replace($data, '', $positions[$i], strlen($needle));
        }
    }

    # check the result
    echo $data;
    ?>

This yields

    something "something something something something" something


You could even hide it in a class:

    class unquote {
        # state shared between the helper methods
        private $data = "";
        private $needle = "";
        private $positions = array();

        function cleanData($string = "", $needle = '"') {
            $this->data = $string;
            $this->needle = $needle;
            $this->positions = array(); # reset, so the object can be reused
            $this->searchPositions();
            $this->replace();
            return $this->data;
        }

        private function searchPositions() {
            $lastPos = 0;
            # find all quotes
            while (($lastPos = strpos($this->data, $this->needle, $lastPos)) !== false) {
                $this->positions[] = $lastPos;
                $lastPos = $lastPos + strlen($this->needle);
            }
        }

        private function replace() {
            # remove all but the first and the last quote if there are more than 2,
            # walking backwards so the stored offsets stay valid
            $count = count($this->positions);
            if ($count > 2) {
                for ($i = $count - 2; $i >= 1; $i--) {
                    $this->data = substr_replace($this->data, '', $this->positions[$i], strlen($this->needle));
                }
            }
        }
    }

And call it with

    $q = new unquote();
    $data = $q->cleanData($data);
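The position-based approach keeps only the first and the last quote of the whole text, so with several outer pairs (which the asker says can occur) it would also drop the quotes that close and reopen the pairs in between. A minimal sketch that tracks nesting depth instead is shown here; the function stripNestedQuotes() is hypothetical, and it relies on the assumption that a quote opens a group when it is at the start of the text or follows whitespace, and closes one otherwise. Text such as ("quoted") breaks that heuristic.

```php
<?php

# Hypothetical helper: keeps only the outermost pair of each quoted group.
# Assumption: an opening quote is at the start of the text or follows
# whitespace; any other quote is treated as a closing quote.
function stripNestedQuotes(string $text): string {
    $result = '';
    $depth = 0;
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];
        if ($char !== '"') {
            $result .= $char;
            continue;
        }
        $opens = ($i === 0 || ctype_space($text[$i - 1]));
        if ($opens) {
            $depth++;
            if ($depth === 1) {
                $result .= $char; # keep the outermost opening quote
            }
        } elseif ($depth > 0) {
            if ($depth === 1) {
                $result .= $char; # keep the outermost closing quote
            }
            $depth--;
        } else {
            $result .= $char; # unmatched closing quote, leave it alone
        }
    }
    return $result;
}

echo stripNestedQuotes('a "b "c" d" e "f "g" h" i'), "\n";
// prints: a "b c d" e "f g h" i
```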
Jan