2

I am in a strict Java environment.

The question is not quite as simple as the title suggests. I am not trying to solve a concrete problem; it is a more theoretical question, asked to improve my understanding.

What I am interested in is matching src followed by a value in either double or single quotes. If the value opens with a double quote, it must also close with a double quote, and the same applies to single quotes.

I am aware that I can repeat the pattern once per quote type, i.e.:

String str = "src=\"hello/\" ... src='hello/' ..."

println str.replaceAll ("src=((\"[^\"]+\")|('[^']+'))", "src=$1")

What I would like to do is like:

println str.replaceAll ("src=([\"'][^\"']+[\"'])", "src=$1")

However, if the value starts with a double quote, then single quotes should be allowed in the content, and it must end with a double quote, not a single quote.

Question 2:

Is it possible to have replaceAll reuse the same type of quote that was found? More generally, is it possible to say: for this match, replace with this2; for that match, replace with that2? How can this be done without generating a new string each time?

Edit for Alan Moore, example for question two:

println "one ... two".replaceAll( "(one)", "1" ).replaceAll("(two)", "2");

Something more along these lines (not valid code):

println "one ... two".replaceMyMatches( "(one)[^\\w]+(two)", "\$1{1}, \$2{2}" ) // prints string : one{1}, two{2} 

What I want is the string: 1, 2

Answer to the first question, derived and slightly altered from the answers by black panda and Jeff Walker:

String str = "src=\"1.png\" ... src='2.jpeg' ... src=\"3.p'ng\" ... src='4.jpe\"g' ... src='' ... src=\"\" ..." ;

String regex = "src=(['\"])(.+?)\\1"; // closes with the quote that is in group 1

println str.replaceAll( regex, '''src=$1../new_path/$2$1''')

Spits out:

src="../new_path/1.png" ... src='../new_path/2.jpeg' ... src="../new_path/3.p'ng" ... src='../new_path/4.jpe"g' ... src='' ... src="" ...

If you want to replace the empty ones as well, just replace the + in the regex with a * (I don't want that).

Notice the original quotes are in as well.
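For a strict Java environment, the same replacement can be sketched without the Groovy println and triple-quoted string. This is only an illustration under assumptions: the class name SrcPrefixer, the method addPrefix and its prefix parameter are hypothetical names that do not appear in the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SrcPrefixer {
    // Group 1 captures the opening quote, group 2 the non-empty content,
    // and \1 requires the same quote character to close the value.
    private static final Pattern SRC = Pattern.compile("src=(['\"])(.+?)\\1");

    // Hypothetical helper: inserts prefix in front of every quoted src value,
    // keeping whichever quote character the value was written with.
    static String addPrefix(String text, String prefix) {
        // quoteReplacement keeps a '$' or '\' in the prefix from being
        // interpreted as a group reference or escape.
        String replacement = "src=$1" + Matcher.quoteReplacement(prefix) + "$2$1";
        return SRC.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String str = "src=\"1.png\" ... src='2.jpeg' ... src=\"3.p'ng\" ...";
        System.out.println(addPrefix(str, "../new_path/"));
    }
}
```

Because $1 appears on both sides of the value in the replacement, each match is rewritten with the quote type it was found with.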

For the answer to question two, see black panda's answer.

mjs

4 Answers

2

The regex for question 1 is:

src=(['"])hello\1 (double backslash for a Java string)

It matches an opening single or double quote, then hello, then the same quote character again, using a backreference to group 1.

So for the more general case, I like:

^src=(['"])(.*?)\1$

Then the replacement might be something like:

String regex = "^src=(['\"])(.*?)\\1$";
String newthing = "src=$2";

Is this what you want? Basically, to strip the quotes while requiring the opening and closing quotes to match?

Due to an astute comment, I now understand that you want each kind of quote to escape the other. Languages like Perl do that, but they aren't parsed via a regex. That type of thing belongs to a class of problems that require actual parsing. (can't remember the actual term)

Instead of a replacement, you would have to inspect group 2 and "assert" that it doesn't contain the quote captured in group 1. Notice that I added beginning and ending anchors to the regex.

So something like:

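// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("^src=(['\"])(.*?)\\1$");
Matcher m = p.matcher("src=\"what's up?\"");
if ( m.matches() && !m.group(2).contains(m.group(1)) ) {
    // success, follows all of the rules
} else {
    // fail: either no match at all, or the delimiting quote
    // shows up again inside the content
}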
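To make the check above easier to try out, here is a small self-contained sketch of it. The class name SrcValidator and the method isValid are hypothetical names chosen for illustration; the pattern and the group 2 check are the ones from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SrcValidator {
    // Anchored pattern: opening quote in group 1, content in group 2,
    // and the same quote again right before the end of the input.
    private static final Pattern SRC = Pattern.compile("^src=(['\"])(.*?)\\1$");

    // Hypothetical helper: true only if the whole input is a src attribute
    // whose content does not contain its own delimiting quote.
    static boolean isValid(String attribute) {
        Matcher m = SRC.matcher(attribute);
        return m.matches() && !m.group(2).contains(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(isValid("src=\"what's up?\""));  // other quote inside: accepted
        System.out.println(isValid("src=\"a\"b\""));        // delimiter inside: rejected
        System.out.println(isValid("src=\"mismatch'"));     // closing quote differs: rejected
    }
}
```

The second check is needed because, with the anchors in place, the lazy (.*?) will stretch across an inner delimiter in order to reach the final quote.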

I'm having trouble understanding what you are looking for in the second question, even with the update. I'll edit this answer if I get it.

Jeff Walker
  • The poster wants to be able to include quotes of the other kind in the string, though. He wants to be able to parse src='this is " some text' – black panda Jan 19 '12 at 15:02
  • 2
    Ah, I see now. I'm pretty sure that is not possible with a regex alone. Updating my answer again.... – Jeff Walker Jan 19 '12 at 15:12
  • 1
    Note that I said "with a regex alone". See my edited answer above. – Jeff Walker Jan 19 '12 at 15:30
  • Sorry, @Hamidam my first answer is incorrect. JeffWalker was correct. Please uncheck my answer as the best one so that I may delete it. Sorry about that. – black panda Jan 19 '12 at 16:48
2

My answer to question 1 was originally incorrect. Here's an updated version.

To answer question 1, see if this regex helps you. The pattern is:

src=(['"])(.*?)\1

The code below explains each piece.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

   public static void main(String[] args)
   {
      final String regex = "src=(['\"])" // the ' or the " is in group 1
              + "(.*?)" // match any character in a non-greedy fashion
              + "\\1"; // closes with the quote that is in group 1
      Pattern p = Pattern.compile(regex);

      Matcher m = p.matcher("src=\"hello/\"  ...   src='goodbye/'  ... "
              + "src='this has a \" in it'");

      while (m.find())
      {
         System.out.println("\nfound!");
         System.out.println("The quote was a " + m.group(1));
         System.out.println("the text was = " + m.group(2));
      }
   }
}

This gives the output:

found!
The quote was a "
the text was = hello/

found!
The quote was a '
the text was = goodbye/

found!
The quote was a '
the text was = this has a " in it

As for the second question, you'll have to use a little more code than that. You create your own StringBuffer and append as you go along. I used a map to hold the replacements:

   // Requires: import java.util.HashMap; import java.util.Map;
   public static void question2()
   {
      Pattern p = Pattern.compile("one|two");
      Map<String, String> replacements = new HashMap<String, String>();

      replacements.put("one", "1");
      replacements.put("two", "2");

      StringBuffer result = new StringBuffer();

      String text = "one ... two";

      Matcher m = p.matcher(text);

      while (m.find())
      {
         m.appendReplacement(result, replacements.get(m.group()));
      }

      m.appendTail(result);

      System.out.println(result.toString());

   }

This outputs:

1 ... 2
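As a side note, on Java 9 or later (an assumption; the code above targets older JDKs) Matcher.replaceAll also accepts a function, which hides the buffer handling while still making a single pass over the input. A minimal sketch, with the class name TableReplace and the method replaceWords being hypothetical:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TableReplace {
    private static final Pattern WORDS = Pattern.compile("one|two");

    // Same lookup table as above; Map.of requires Java 9 or later.
    private static final Map<String, String> REPLACEMENTS = Map.of("one", "1", "two", "2");

    // Hypothetical helper: replaces each match with its table entry.
    static String replaceWords(String text) {
        return WORDS.matcher(text).replaceAll(
                // The function's result is treated as a replacement string,
                // so quote it in case a table value contains '$' or '\'.
                match -> Matcher.quoteReplacement(REPLACEMENTS.get(match.group())));
    }

    public static void main(String[] args) {
        System.out.println(replaceWords("one ... two")); // prints: 1 ... 2
    }
}
```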
black panda
  • Amazing! I am guessing question 2 is farfetched? – mjs Jan 19 '12 at 14:59
  • Have you run this? I'm having trouble compiling your regex in part 1. I'll keep trying to see where my problem is. – Jeff Walker Jan 19 '12 at 15:31
  • Yes, I'm using JDK 6 on Netbeans 6.9.1 – black panda Jan 19 '12 at 15:39
  • That is what I am talkin' bout :P Great! Why all the backslashes in the Java code, but not in the top part? I am guessing they aren't necessary? – mjs Jan 19 '12 at 15:41
  • They are absolutely necessary! You have to escape backslashes in Java strings. The pattern was [^\\1]. Each one of those backslashes needs another backslash to escape it. – black panda Jan 19 '12 at 15:48
  • @blackpanda The \\\\1 part was not correct in the middle. Try with src="1.png" and you will see why. (.*?) works better. – mjs Jan 19 '12 at 16:13
  • You are absolutely right. I'm sorry. Please choose the @JeffWalker answer as the correct answer, so that I can delete my answer. – black panda Jan 19 '12 at 16:42
1

You could try something like this

String str = "src=\"hello/\" ... src='hello/' ...";

System.out.println(str.replaceAll("src=([\"'])(.*?)\\1", "src='$2'"));

The trick is to reuse the text captured by the first group by writing \1 later in the very same regex.

stryba
0

For the first question you can use this regex:

"([\"'])(?:(?!\\1).)*\\1"
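To illustrate what the lookahead buys you, here is a small sketch. The src= prefix, the extra capturing group around the content, and the names TemperedQuote, srcValues and isWellFormed are additions for the example, not part of the regex above. The (?!\1) check runs before every character, so the content can never contain the opening quote, even when the match is forced to cover the whole input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemperedQuote {
    // Group 1: opening quote. Group 2 (added for this example): content in
    // which every character is first checked not to be that same quote.
    private static final Pattern SRC =
            Pattern.compile("src=(['\"])((?:(?!\\1).)*)\\1");

    // Hypothetical helper: collects the content of every quoted src value.
    static List<String> srcValues(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = SRC.matcher(text);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    // Hypothetical helper: the whole input must be one well-formed attribute.
    static boolean isWellFormed(String attribute) {
        return SRC.matcher(attribute).matches();
    }

    public static void main(String[] args) {
        System.out.println(srcValues("src=\"3.p'ng\" ... src='4.jpe\"g'"));
        System.out.println(isWellFormed("src=\"a\"b\"")); // delimiter inside the value
    }
}
```

With this form no separate check on the captured content is needed, because a value containing its own delimiter simply fails to match as a whole.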

The second part doesn't have a pure regex solution--at least, not in Java. See this answer for the Java way. So, for example, if you had a table like this:

{ "one" => "1", "two" => "2" }

...your replacement() method would generate the dynamic parts of the replacement string by looking them up in the table, using the contents of the capturing groups as the keys.
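As a rough sketch of that idea, applied to the (one)[^\\w]+(two) example from the question: the replacement method below is only a hypothetical stand-in (the original one lives in the linked answer), and the class name GroupLookup and the method rewrite are likewise made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupLookup {
    private static final Pattern PAIR = Pattern.compile("(one)[^\\w]+(two)");
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("one", "1");
        TABLE.put("two", "2");
    }

    // Hypothetical stand-in for replacement(): builds the dynamic text by
    // looking up the contents of the capturing groups in the table.
    static String replacement(MatchResult match) {
        return TABLE.get(match.group(1)) + ", " + TABLE.get(match.group(2));
    }

    // One pass over the input, appending into a single buffer.
    static String rewrite(String text) {
        Matcher m = PAIR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Matcher implements MatchResult, so it can be passed directly.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement(m)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("one ... two")); // prints: 1, 2
    }
}
```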

Alan Moore