21

Using C#, I have a string that is a SQL script containing multiple queries. I want to remove sections of the string that are enclosed in single quotes. I can do this using Regex.Replace, in this manner:

string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, "'[^']*'", string.Empty);

Results in: "Only can we turn him to the of the Force"

What I want to do is remove the substrings between quotes EXCEPT for substrings containing a specific substring. For example, using the string above, I want to remove the quoted substrings except for those that contain "dark," such that the resulting string is:

Results in: "Only can we turn him to the 'dark side' of the Force"

How can this be accomplished using Regex.Replace, or perhaps by some other technique? I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().

Note: I don't care if the single quotes around "dark side" are removed or not, so the result could also be: "Only can we turn him to the dark side of the Force." I say this because a solution using Split() would remove all the single quotes.

Edit: I don't have a solution yet using Substring(), IndexOf(), etc. By "working on," I mean I'm thinking in my head how this can be done. I have no code, which is why I haven't posted any yet. Thanks.

Edit: VKS's solution below works. I wasn't escaping the \b on my first attempt, which is why it failed. It also didn't work until I included the single quotes at both ends of the pattern.

test = Regex.Replace(test, "'(?![^']*\\bdark\\b)[^']*'", string.Empty);
eebbesen
armus47
  • 13
    @AndyKorneyev What makes you think this isn't a good way to ask a question here? This is one of the more complete first posts I've seen in a while. There's a good attempt, with regex, the problem is made clear, there are a few ideas, I don't really see how this could possibly be better, less including the actual answer. – Matthew Haugen Jan 23 '15 at 08:22
  • 7
    @AndyKorneyev The OP's two lines of code show his effort, don't they? Also, the question is well written and shows very good research effort. – Sriram Sakthivel Jan 23 '15 at 08:23
  • 2
    @SriramSakthivel Those two lines are not what the OP wants; they cover a "preliminary task". The actual attempt is only described as "*I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().*" without showing any code. – Andrey Korneyev Jan 23 '15 at 08:25
  • 4
    @AndyKorneyev If the OP knew how to get what he wants, why would he be here in the first place? OK, that code is not shown here; I agree the OP should have posted it. But what do you mean by *This is not a good way to ask a question here. Did you try anything so far to solve your problem?* It seems like a common template you copy-pasted, which doesn't fit this question at all. – Sriram Sakthivel Jan 23 '15 at 08:27
  • 6
    @AndyKorneyev I don't understand why my post is not a good way to ask a question here, can you elaborate on why? I tried a number of things using Regex.Replace, but I either removed all delimited substrings, the remainder of the string after the first delimiter, or no change. I included the result that got me closest to what I want, which is my effort on this problem. I don't know how else to approach this problem which is why I'm asking for advice on this website. Thanks for your help. – armus47 Jan 23 '15 at 08:30
  • @armus47 Your question has received 9 upvotes (and counting) and only 1 downvote. You wrote an excellent question, so I wouldn't worry about it. If every question you write shows this level of research, you'll do very well here. – Matthew Haugen Jan 23 '15 at 08:34
  • 1
    @armus47 Disregard that comment. As I said, your question is well written and shows very good research effort. Don't trust me? The upvotes on your question will tell you its quality. The only flaw I can see is that you didn't post the code you attempted with *Substring(), IndexOf(), etc.* Next time, post what you tried (even if it doesn't work). – Sriram Sakthivel Jan 23 '15 at 08:34
  • 1
    @armus47, Well, the most confusing part of your question was "*I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().*": just a description, without showing your actual attempt. To me it looked like "huh... I tried something I don't want to show, but failed, so please write the code for me". Copy-pasting a template meant for "extremely off-topic" questions was probably not a good idea, and I should have written a better-fitting comment, agreed. – Andrey Korneyev Jan 23 '15 at 08:34
  • [Based on this similar duplicate](http://stackoverflow.com/questions/6372065/regex-replace-replace-only-first-one-found) you can use the following overload for [Replace](https://msdn.microsoft.com/en-us/library/haekbhys.aspx) - The 1 helps apparently – Sayse Jan 23 '15 at 08:35
  • 7
    Finding single quotes in SQL is usually a sign that you're doing it wrong: parameterization is the way to approach that problem. – Marc Gravell Jan 23 '15 at 08:35
  • 1
    @Sayse this is not a duplicate of what you suggest. The answer from the link you posted doesn't apply here (not the same conditions). – Lucas Trzesniewski Jan 23 '15 at 08:42
  • @LucasTrzesniewski - The answer is the same however (correct too) - [IDEOne](https://ideone.com/w0AmCc) – Sayse Jan 23 '15 at 08:43
  • 2
    @Sayse this works for this specific example by coincidence (the first quoted string is the only to be removed). Try to invert the quoted strings and see the result. The requirements are different. – Lucas Trzesniewski Jan 23 '15 at 08:46
  • @LucasTrzesniewski - I guess thats down to the OP as to what works in the way he requires, I'd argue that the inclusion of `dark` only works for this specific example but they seem happy with it :) – Sayse Jan 23 '15 at 08:51
  • @Sayse and Lucas: Thanks guys for the help, in this case my example was not complete, but just a stand in. As I said in the post, it's really being used on a SQL file containing multiple queries, so there are numerous substrings to deal with. – armus47 Jan 23 '15 at 09:04

5 Answers

23
'(?![^']*\bdark\b)[^']*'

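Try this. See the demo. Replace each match with an empty string. The negative lookahead checks whether the quoted text contains the word dark; if it does, the match is rejected and that quoted section is left in place.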

https://www.regex101.com/r/rG7gX4/12
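A minimal C# sketch of how this pattern might be applied to the string from the question. The class and method names (`LookaheadDemo`, `RemoveQuotedExcept`) are hypothetical, and the sketch assumes the keep-word is a plain word; `Regex.Escape` is used so it is matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: removes every '...' section unless it contains
    // keepWord as a whole word.
    public static string RemoveQuotedExcept(string input, string keepWord)
    {
        // Verbatim strings (@"...") pass \b to the regex engine unchanged;
        // in a regular string literal it would have to be written as \\b.
        string pattern = @"'(?![^']*\b" + Regex.Escape(keepWord) + @"\b)[^']*'";
        return Regex.Replace(input, pattern, string.Empty);
    }

    public static void Main()
    {
        string test = "Only 'together' can we turn him to the 'dark side' of the Force";
        Console.WriteLine(RemoveQuotedExcept(test, "dark"));
    }
}
```

One thing worth checking on real scripts: the lookahead rejects a kept section without consuming it, so its closing quote can be picked up as the opening quote of a later match. With the keep-word dark, the input `'dark' foo 'bar'` comes out as `'darkbar'`. This does not happen in the question's example because the kept section is the last one.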

vks
  • That's an awesome website, but I couldn't get it to work in my C# application. I used the feature "code generator" and replicated it, but it didn't have any effect on the string. I will need to read more on Regex to understand the syntax well enough to translate it I think. Thanks! – armus47 Jan 23 '15 at 08:51
  • That works! Sort of, I had to still include the single quotes enclosing the entire regex string, which you omitted in your comment. I'll put it in my post. Thanks – armus47 Jan 23 '15 at 08:57
  • 6
    @armus47 it's better to use a verbatim string in this case, so you don't have to escape the backslashes: `Regex.Replace(test, @"'(?![^']*\bdark\b)[^']*'", string.Empty)` – Lucas Trzesniewski Jan 23 '15 at 08:58
16

While vks's solution works, I'd like to demonstrate a different approach:

string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, @"'[^']*'", match => {
    if (match.Value.Contains("dark"))
        return match.Value;

    // You can add more cases here

    return string.Empty;
});

Or, if your condition is simple enough:

test = Regex.Replace(test, @"'[^']*'", match => match.Value.Contains("dark")
    ? match.Value
    : string.Empty
);

That is, use a lambda to provide a callback for the replacement. This way, you can run arbitrary logic to replace the string.
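As a rough illustration of the "more cases" idea, here is a sketch that keeps a quoted section if it contains any of several substrings. The helper name `RemoveQuotedExcept`, the `keepSubstrings` parameter, and the case-insensitive comparison are assumptions made for this example, not part of the original requirement.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CallbackDemo
{
    // Hypothetical helper: keeps a quoted section if it contains any of the
    // given substrings (compared case-insensitively), removes it otherwise.
    public static string RemoveQuotedExcept(string input, params string[] keepSubstrings)
    {
        return Regex.Replace(input, @"'[^']*'", match =>
            keepSubstrings.Any(k =>
                match.Value.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0)
                ? match.Value      // keep the section, quotes included
                : string.Empty);   // drop everything else
    }

    public static void Main()
    {
        string sql = "SELECT 'a' FROM t WHERE x = 'Dark Side' OR y = 'b'";
        Console.WriteLine(RemoveQuotedExcept(sql, "dark", "light"));
    }
}
```

Because every quoted section is consumed as a whole match before the callback decides what to do with it, the order of kept and removed sections should not matter here.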

Lucas Trzesniewski
  • 1
    This works perfectly and is the answer I was looking for. I couldn't get vks's solution to work, but that's a pretty nifty site. Vignesh's solution is the approach I was trying to work out that doesn't use Regex. But I like this solution because it uses Regex AND additional logic for an easy-to-understand approach. Thanks! – armus47 Jan 23 '15 at 08:47
4

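Something like this would work.
You can add all the strings you want to keep to the excludedString array.

        // Requires: using System.Linq; (for Contains on the array)
        string test = "Only 'together' can we turn him to the 'dark side' of the Force";

        // Quoted sections to keep; each entry must match the whole section exactly.
        var excludedString = new string[] { "dark side" };

        int startIndex = 0;

        while ((startIndex = test.IndexOf('\'', startIndex)) >= 0)
        {
            var endIndex = test.IndexOf('\'', startIndex + 1);
            if (endIndex < 0)
            {
                break; // unmatched opening quote: leave the rest untouched
            }

            var subString = test.Substring(startIndex, (endIndex - startIndex) + 1);
            if (!excludedString.Contains(subString.Replace("'", "")))
            {
                // Remove the section; startIndex now points at the text that followed it.
                test = test.Remove(startIndex, (endIndex - startIndex) + 1);
            }
            else
            {
                // Keep the section and continue after its closing quote.
                startIndex = endIndex + 1;
            }
        }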
Vignesh.N
  • 2
    This is the approach I had in mind when I couldn't get Regex to work. I'm impressed by how quickly you put that together. Thanks! – armus47 Jan 23 '15 at 08:53
2

Another method uses the regex alternation operator |.

@"('[^']*\bdark\b[^']*')|'[^']*'"

Then replace each match with $1.


string str = "Only 'together' can we turn him to the 'dark side' of the Force";
string result = Regex.Replace(str, @"('[^']*\bdark\b[^']*')|'[^']*'", "$1");
Console.WriteLine(result);


Explanation:

  • (...) is a capturing group.

  • '[^']*\bdark\b[^']*' matches every single-quoted string that contains the word dark. [^']* matches any character except ', zero or more times.

  • ('[^']*\bdark\b[^']*') wraps that pattern in a capturing group, so the matched text is stored in group 1.

  • | is the regex alternation operator.

  • '[^']*' matches all the remaining single-quoted strings. It never matches a quoted string containing dark, because those were already matched by the alternative before the |.

  • Finally, replacing every match with the contents of group 1 gives the desired output: kept strings are replaced by themselves, and the others by an empty string.
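The explanation above can be sketched as a small helper. The names `AlternationDemo` and `RemoveQuotedExcept` are hypothetical, and the sketch assumes the keep-word is a plain word (it is passed through `Regex.Escape` so it is matched literally).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper built on the alternation pattern described above.
    public static string RemoveQuotedExcept(string input, string keepWord)
    {
        string word = Regex.Escape(keepWord);
        string pattern = @"('[^']*\b" + word + @"\b[^']*')|'[^']*'";

        // Group 1 is empty whenever the second alternative matched, so "$1"
        // puts kept sections back unchanged and deletes the others.
        return Regex.Replace(input, pattern, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveQuotedExcept("Use the 'dark side' or the 'light side'", "dark"));
    }
}
```

Since kept sections are matched and consumed rather than skipped, a kept section that comes before a removed one should be handled the same way as one that comes after it.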

Avinash Raj
1

Here is an attempt at what I think you had in mind: a solution using Split() and Contains(), without regex.

string test = "Only 'together' can we turn him to the 'dark side' of the Force";
string[] separated = test.Split('\'');

string result = "";

for (int i = 0; i < separated.Length; i++)
{
    string str = separated[i];
    str = str.Trim();   // trim leading and trailing spaces

    if (i % 2 == 0 || str.Contains("dark")) // even pieces are outside quotes; you can expand the condition
    {
       result += str + " ";  // add a space after each kept piece
    }
}
result = result.Trim(); // trim the trailing space again
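A variation on the same idea, as a sketch only: it keeps the original spacing and puts the quotes back around kept sections. The names `SplitDemo` and `RemoveQuotedExcept` are hypothetical, and it assumes the quotes in the input are balanced and never escaped.

```csharp
using System;
using System.Text;

public static class SplitDemo
{
    // Hypothetical helper: after splitting on single quotes, the pieces at odd
    // indexes are the quoted sections (assuming balanced, unescaped quotes).
    public static string RemoveQuotedExcept(string input, string keepSubstring)
    {
        string[] pieces = input.Split('\'');
        var result = new StringBuilder();

        for (int i = 0; i < pieces.Length; i++)
        {
            bool insideQuotes = i % 2 == 1;
            if (!insideQuotes)
            {
                result.Append(pieces[i]); // plain text: keep as-is
            }
            else if (pieces[i].Contains(keepSubstring))
            {
                // Kept section: restore the quotes that Split removed.
                result.Append('\'').Append(pieces[i]).Append('\'');
            }
            // Any other quoted section is dropped.
        }
        return result.ToString();
    }

    public static void Main()
    {
        string test = "Only 'together' can we turn him to the 'dark side' of the Force";
        Console.WriteLine(RemoveQuotedExcept(test, "dark"));
    }
}
```

Using a StringBuilder here avoids building a new string on every iteration, which may matter for a large SQL script.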
chouaib