For processing a file with SQL statements such as:

ALTER TABLE ONLY the_schema.the_big_table
    ADD CONSTRAINT the_schema_the_big_table_pkey PRIMARY KEY (the_id);

I am using the regex:

val primaryKeyConstraintNameCatchingRegex: Regex = "([a-z]|_)+\\.([a-z]|_)+\n\\s*(ADD CONSTRAINT)\\s*([a-z]|_)+\\s*PRIMARY KEY".r

Now the problem is that this regex does not return any results, despite the fact that both the regex

val alterTableRegex = "ALTER TABLE ONLY\\s+([a-z]|_)+\\.([a-z]|_)+".r

and

val addConstraintRegex = "ADD CONSTRAINT\\s*([a-z]|_)+\\s*PRIMARY KEY".r

match the intended sequences.

I thought the problem could be the newline. So far I have tried \\s+, \\W+, \\s*, \\W*, \\n*, \n*, \n+, \r+, \r*, \r\\s*, \n*\\s*, \\s*\n*\\s*, and other combinations to match the whitespace between the table name and ADD CONSTRAINT, all to no avail.

I would appreciate any help with this.

Edit

This is the code I am using:

import scala.util.matching.Regex
import java.io.File

import scala.io.Source


object Hello extends Greeting with App {

  val primaryKeyConstraintNameCatchingRegex: Regex = "([a-z]|_)+\\.([a-z]|_)+\r\\s*(ADD CONSTRAINT)\\s*([a-z]|_)+\\s*PRIMARY KEY".r


  readFile

  def readFile: Unit = {
    val fname = "dump.sql"
    val fSource = Source.fromFile(fname)


    for (line <- fSource.getLines) {
      val matchExp = primaryKeyConstraintNameCatchingRegex.findAllIn(line).foreach(
        segment => println(segment)
      )
    }

    fSource.close()


  }
}

Edit 2

Another strange behavior is that when matching with

"""[a-z_]+(\.[a-z_]+)\s*A""".r

matches are found and they include the A, but when I use

"""[a-z_]+(\.[a-z_]+)\s*ADD""".r

which differs only by the added DD, nothing is matched.

piet.t
zmerr
  • Can you show the related code? What is the expected output? Matching whitespace is done with `\s`, so `\n\\s*` must be written as `\\s*`. Or, you can try using `\R` instead of `\n`, ``\\R\\s*`` if you need to make sure there is a line break. – Wiktor Stribiżew Sep 06 '21 at 09:10
  • @WiktorStribiżew Thanks for the tips. I added the sample code. I just want to print the sequences matching the regex in a file. I get the desired output with the two alternative regex sections but not with the main one. – zmerr Sep 06 '21 at 09:15
  • `"([a-z]|_)+\\.([a-z]|_)+\\R\\s*(ADD CONSTRAINT)\\s*([a-z]|_)+\\s*PRIMARY KEY".r` didn’t work either. – zmerr Sep 06 '21 at 09:16
  • See [this demo](https://rextester.com/NBQ55700). Your problem is that you read the file line by line (see `for (line <- fSource.getLines)`). You need to grab the contents as a single string to be able to match across line breaks. – Wiktor Stribiżew Sep 06 '21 at 09:23
  • @WiktorStribiżew I am convinced this is a bug: `"([a-z]|_)+\\.([a-z]|_)+\\s+A".r` does match, while `"([a-z]|_)+\\.([a-z]|_)+\\s+ADD".r` does not. The difference is just adding the `DD`. – zmerr Sep 06 '21 at 09:23
  • @WiktorStribiżew Thank you very much. Sorry. that was a very dumb mistake. – zmerr Sep 06 '21 at 09:24
  • @WiktorStribiżew but what does explain the `DD` phenomenon? – zmerr Sep 06 '21 at 09:25
  • Sorry, that is not possible. If `ADD` matches, `A` must match, too. This is common logic. You are using `findAllIn`, this searches for partial matches, thus it just can't do what you say. Either both match, or none. – Wiktor Stribiżew Sep 06 '21 at 09:27
  • @WiktorStribiżew I mean `A` matches, but `ADD` does not. Logically what you are stating is correct. that’s why I think this behavior can only be caused by a bug. – zmerr Sep 06 '21 at 09:33
  • It might not be a bug. Your pattern is convoluted, `([a-z]|_)+` must be written as `[a-z_]+`. Try replacing them and check. – Wiktor Stribiżew Sep 06 '21 at 09:39
  • @WiktorStribiżew Thanks. I checked with `"""[a-z_]+(\.[a-z_]+)\s*A""".r` and `"""[a-z_]+(\.[a-z_]+)\s*ADD""".r` same thing with `readLine`. Yet, when I just call `.mkString` on the `fSource` and pass it to the `findAllIn`, everything works as expected. – zmerr Sep 06 '21 at 09:45

1 Answer

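Your problem is that you read the file line by line (see the `for (line <- fSource.getLines)` loop). Each line is matched on its own, so a pattern that needs the table name from one line and ADD CONSTRAINT from the next can never match.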

You need to grab the contents as a single string to be able to match across line breaks.

val source = Source.fromFile(fname)
val fSource = try source.mkString finally source.close() // read everything, then release the file
val matchExps = primaryKeyConstraintNameCatchingRegex.findAllIn(fSource)

Now, fSource will contain the whole text file contents as one string, and matchExps will be an iterator over all matches found in it (consume it with .foreach or .toList).
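To make the difference concrete, here is a minimal, self-contained sketch that runs the same kind of pattern both ways on an in-memory string. The object name MultiLineMatchDemo, the two helper methods, and the sample text (standing in for dump.sql) are made up for illustration. The pattern is a simplified variant of the one in the question, using `[a-z_]+` and `\s+`, not the exact original.

```scala
import scala.util.matching.Regex

object MultiLineMatchDemo {
  // Simplified variant of the question's pattern: \s+ accepts any mix of
  // line breaks (\n or \r\n) and indentation between the two parts.
  val primaryKeyConstraintRegex: Regex =
    """[a-z_]+\.[a-z_]+\s+ADD CONSTRAINT\s+[a-z_]+\s+PRIMARY KEY""".r

  // Hypothetical stand-in for the contents of dump.sql.
  val sampleDump: String =
    "ALTER TABLE ONLY the_schema.the_big_table\n" +
    "    ADD CONSTRAINT the_schema_the_big_table_pkey PRIMARY KEY (the_id);\n"

  // Line-by-line search, as in the question: each line is matched in isolation.
  def matchesPerLine(text: String): List[String] =
    text.split("\n").toList.flatMap(line => primaryKeyConstraintRegex.findAllIn(line).toList)

  // Whole-text search, as in the answer: the pattern can span line breaks.
  def matchesInWholeText(text: String): List[String] =
    primaryKeyConstraintRegex.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(s"per line:   ${matchesPerLine(sampleDump)}")
    println(s"whole text: ${matchesInWholeText(sampleDump)}")
  }
}
```

With this sample, the per-line search finds nothing, because no single line contains both the qualified table name and the ADD CONSTRAINT clause, while the whole-text search returns one match that spans the line break.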

Wiktor Stribiżew