I have found a couple of related threads: "Regular expression - match all words but match unique words only once" and "get unique regex matcher results (without using maps or lists)". There are a few others, but I could not get any of their solutions to solve my issue.

I've been reading up on lookarounds and backreferences, but I'm still missing something.

I need to search through several large code-bases, and find all unique occurrences of data source names or variables for them.

I tried the following regular expressions:

(datasource=\"(.*?)\")(?!.+\1)
(datasource=\"(.*?)\")(?!.*\1)
(datasource=\"(.*?)\")(?!.+\2)
(datasource=\"(.*?)\")(?!.*\2)
(datasource=\"(.*?)(?!.+\1)\")
(datasource=\"(.*?)(?!.*\1)\")
(datasource=\"(.*?)(?!.+\2)\")
(datasource=\"(.*?)(?!.*\2)\")

Sample input:

datasource="someDSN"
datasource="anotherDNS"
datasource = "anotherDNS"
datasource="someDSN"

The code can be complex, but basically it looks something like this:

  <cfquery name="qry_getEvent" datasource="#APPLICATION.firstDSN#">
    SELECT *
    FROM events
    WHERE id = 1
  </cfquery>

  <cfquery name="qry_getPlayers" datasource="#APPLICATION.firstDSN#">
    SELECT *
    FROM players
    WHERE event_id = 1
  </cfquery>

  <cfquery name="qry_getLocation" datasource="secondDSN">
    SELECT *
    FROM locations
    WHERE event_id = 1
  </cfquery>

The result should look something like:

#APPLICATION.firstDSN#
secondDSN

The only semi-solution I've found is to run (datasource=\"([^"]*)\") repeatedly, each time prefixing it with a negative lookahead that excludes the values already found. For example:

(?!datasource="dsnname1"|datasource="dsnname2")(datasource=\"([^"]*)\")

This helped me narrow down all the DSN names in a few minutes, but it would have been much easier if I could just get all the distinct results automatically. Maybe this needs a little Node.js work added to it to streamline the process.
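
A minimal sketch of what such a Node.js step might look like, assuming the source text has already been read into a string (walking a repository and reading files is left out). It matches every `datasource="..."` attribute and collects the captured values in a `Set`, so each distinct value is kept once. The helper name `uniqueDatasources` is hypothetical, and the `\s*` around `=` and the `i` flag are assumptions about how the attribute may be written in the code-base.

```javascript
// Hypothetical helper: returns the distinct datasource values found in `text`,
// in order of first appearance.
function uniqueDatasources(text) {
  // Assumption: whitespace may surround "=", and attribute case may vary.
  const pattern = /datasource\s*=\s*"([^"]*)"/gi;
  const seen = new Set();
  for (const match of text.matchAll(pattern)) {
    seen.add(match[1]); // match[1] is the text between the quotes
  }
  return [...seen];
}

const sample = `
  <cfquery name="qry_getEvent" datasource="#APPLICATION.firstDSN#">
  </cfquery>
  <cfquery name="qry_getPlayers" datasource="#APPLICATION.firstDSN#">
  </cfquery>
  <cfquery name="qry_getLocation" datasource="secondDSN">
  </cfquery>
`;

console.log(uniqueDatasources(sample));
// → [ '#APPLICATION.firstDSN#', 'secondDSN' ]
```

Deduplicating outside the regex keeps the pattern simple and avoids rescanning the rest of the text for every match.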

pixelwiz
  • It must be unique because you're afraid you'll match the pattern multiple times per line? Would it not just be simpler to match and use the first datasource found that meets the `datasource="something"` criteria? – Neil Oct 03 '17 at 13:27
  • is there any reason you can't just put the matches in a set? – Smern Oct 03 '17 at 13:27
  • Look [here](https://regex101.com/r/ozFNPV/1). Do not use it. Using a single regex to search huge amounts of text data like this is not the best idea. Better match all the strings and then get distinct data using programmatic means. – Wiktor Stribiżew Oct 03 '17 at 13:29
  • @WiktorStribiżew I think it should be `(datasource\s*=\s*\"([^"]*)\")(?!.+\2)` since you want to ensure the contents are not duplicated, not the whole `datasource` attribute. The OP also presented a couple strings with spaces before/after the `=`, so I've added that to the expression – ctwheels Oct 03 '17 at 13:35
  • @Neil, I just wanted a quick way to see all the different data sources that are being referenced. I guess I could write this in Node and have it run against Git repos, but that'd be a lot more work than just putting a regex in VS Code or Dreamweaver and doing a search. – pixelwiz Oct 03 '17 at 14:26
  • @Wiktor, thanks for that solution. I understand why it should not be used. But it seems to work fine in that link but when I try it in Espresso or Dreamweaver I still get all the results coming back. Odd. – pixelwiz Oct 03 '17 at 14:26
  • Yes, you do not use `(?s)` – Wiktor Stribiżew Oct 03 '17 at 14:48
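
To illustrate the `(?s)` point, here is a small JavaScript sketch that runs the pattern from ctwheels' comment against the four sample strings, once without and once with the dotAll (`s`) flag, which is JavaScript's counterpart to an inline `(?s)`. The variable names are made up for the example, and whether a given editor's search supports an equivalent option is not something this sketch can confirm.

```javascript
const text = [
  'datasource="someDSN"',
  'datasource="anotherDNS"',
  'datasource = "anotherDNS"',
  'datasource="someDSN"',
].join("\n");

// Without the s flag, "." stops at line breaks, so the lookahead only
// inspects the remainder of the current line and never sees later duplicates.
const perLine = /(datasource\s*=\s*"([^"]*)")(?!.+\2)/g;

// With the s flag, "." also matches line breaks, so the lookahead scans the
// rest of the text and rejects a value that occurs again later.
const wholeText = /(datasource\s*=\s*"([^"]*)")(?!.+\2)/gs;

// Collect capture group 2 (the quoted value) for every match.
const values = (re) => [...text.matchAll(re)].map((m) => m[2]);

console.log(values(perLine));
// → [ 'someDSN', 'anotherDNS', 'anotherDNS', 'someDSN' ]
console.log(values(wholeText));
// → [ 'anotherDNS', 'someDSN' ]
```

Note that this keeps the last occurrence of each value, and that `\2` can match anywhere in the remaining text, so a name that also appears inside a longer later string would be dropped. Each match also rescans the rest of the input, which fits the advice above to deduplicate programmatically on large code-bases.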

0 Answers