tl;dr
Use raw Strings when creating a Regex:
""""(.?(\\")?)*?"""".toRegex()
As you've written yourself, you need to escape some characters to actually get the regular expression you're looking for.
Disregarding any special characters that need escaping, I assume you are trying to get the following pattern: "(.?(\")?)*?".
To get a literal backslash character in your regular expression, you have to write four backslashes in a regular (escaped) string literal, just like in Java.
This is because the backslash is an escape character both in regular Strings and in Regexes.
The expression "\\" yields a string containing a single backslash. However, to get a literal backslash character in a regular expression, you have to escape it with another backslash character.
That is:
The expression "\\\\" turns into a String consisting of two '\' characters, that is, the text \\.
The String with the text \\, turned into a Regex, becomes a regular expression that matches a single literal backslash \.
You can see this more clearly by executing the following code:
println("\\\\") // String with two backslash characters
println("\\\\".toRegex()) // Regex with single backslash literal
println("\\") // String with a single backslash character
println("\\".toRegex()) // Exception in thread "main" java.util.regex.PatternSyntaxException: Unexpected internal error near index 1 \
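As a small sketch of the same point, the snippet below compares the escaped and the raw spelling of the backslash pattern and applies both to a sample text. The helper countBackslashes and the sample text are made up for illustration only.

```kotlin
// Hypothetical helper: counts how often the given pattern matches in `text`.
fun countBackslashes(patternSource: String, text: String): Int =
    patternSource.toRegex().findAll(text).count()

fun main() {
    val text = "a\\b\\c"   // the text a\b\c, which contains two backslashes
    val escaped = "\\\\"   // escaped literal: four backslashes in the source
    val raw = """\\"""     // raw literal: two backslashes in the source

    println(escaped == raw)                  // true: both hold the two-character pattern \\
    println(countBackslashes(escaped, text)) // 2
    println(countBackslashes(raw, text))     // 2
}
```

Both spellings produce the same pattern; the raw one just spares you the String-level escaping.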
In general, I'd recommend using raw strings whenever you create a regular expression in Kotlin. They're delimited by triple quotes (""") instead of a single double quote (").
In raw string literals, the backslash is not an escape character, so you have to escape neither the backslash (for the String) nor the double quotes.
Instead of "\"(.?(\\\\\")?)*?\"" you can write """"(.?(\\")?)*?"""".
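If you want to convince yourself that the two spellings really describe the same pattern, a quick check along these lines should do. The property names escapedPattern and rawPattern are chosen here purely for illustration.

```kotlin
// Escaped literal: quotes and backslashes both need String-level escaping.
val escapedPattern = "\"(.?(\\\\\")?)*?\""

// Raw literal: the same text written without any String-level escaping.
val rawPattern = """"(.?(\\")?)*?""""

fun main() {
    println(rawPattern)                   // "(.?(\\")?)*?"
    println(escapedPattern == rawPattern) // true
}
```

The fourth quote at each end of the raw literal is part of the content, not of the delimiter.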
Additionally, you can use the extension function fun String.toRegex(): Regex to convert your String to a Regex object, but that's just a matter of preference.
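For comparison, here is a minimal sketch that builds the same expression once with the Regex constructor and once with toRegex(). The pattern \d+, the sample text, and the helper findNumbers are assumptions made up for this example.

```kotlin
// Hypothetical helper: returns every match of `regex` in `text`.
fun findNumbers(regex: Regex, text: String): List<String> =
    regex.findAll(text).map { it.value }.toList()

fun main() {
    val pattern = """\d+"""               // raw string: one or more digits
    val viaConstructor = Regex(pattern)   // constructor call
    val viaExtension = pattern.toRegex()  // extension function

    println(viaConstructor.pattern == viaExtension.pattern) // true
    println(findNumbers(viaConstructor, "a1b22"))           // [1, 22]
    println(findNumbers(viaExtension, "a1b22"))             // [1, 22]
}
```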
All in all, your code could look like:
val fileContents = """normal "commen\"t" end"""
val regex = """"(.?(\\")?)*?"""".toRegex()
val comments = fileContents.replace(regex, "")
println(comments) // prints "normal  end" (both spaces around the removed match remain)