2

A Python requirements.txt file is invalid if it pins different versions of the same package, as in the lines below (the file is assumed to be sorted):

agate==1.6.0
agate==1.7.0

I'm trying to write a regex to detect duplicated packages (not duplicated lines, since the versions can differ). My capturing group is ^([^=]+)==.+$. Removing duplicated lines is close to the solution, since it uses a back reference to the previous line, but my back reference would need to cover only the capturing group, not the whole line.

Rafael Borja

2 Answers

3

Detect these strings with

(?sm)^([^=]+)==.*\n\1==

See proof.

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the line
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  ==                       '=='
--------------------------------------------------------------------------------
  .*                       any character (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \n                       '\n' (newline)
--------------------------------------------------------------------------------
  \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  ==                       '=='

Python:

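import re

# ^ anchors at each line start (MULTILINE); with DOTALL, .* may span lines,
# so the repeated name does not have to be on the very next line.
regex = r"^([^=]+)==.*\n\1=="
test_str = "agate==1.6.0\nagate==1.7.0"
contains_dupe = bool(re.search(regex, test_str, re.MULTILINE | re.DOTALL))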
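
To list which packages are duplicated rather than only detect that one exists, something along these lines might work. It is a sketch using a variant of the pattern, not the exact regex above: the character class excludes newlines, DOTALL is dropped so only adjacent lines are compared (relying on the file being sorted), and the second occurrence is checked with a lookahead so runs of three or more pins are still found. The names DUPLICATE_RE and find_duplicated_packages are made up for illustration.

```python
import re

# Variant of the pattern above (an assumption, not the answer's exact regex):
# [^=\n] keeps the captured name on one line, and the lookahead (?=\1==)
# checks the next line without consuming it.
DUPLICATE_RE = re.compile(r"^([^=\n]+)==.*\n(?=\1==)", re.MULTILINE)


def find_duplicated_packages(requirements_text):
    """Return the sorted names of packages pinned on consecutive lines."""
    return sorted({m.group(1) for m in DUPLICATE_RE.finditer(requirements_text)})


if __name__ == "__main__":
    sample = "agate==1.6.0\nagate==1.7.0\nboto3==1.0\n"
    print(find_duplicated_packages(sample))
```

Because the lookahead does not consume the second line, that line can itself start the next match, which is what lets a package pinned three times in a row be caught.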
Ryszard Czech
  • Changing `\n` to `^` saves a step as it doesn't need to backtrack the `\n`. You can also probably save one extra step by removing the last `=`. It's technically redundant since you're using `[^=]`. `^([^=]+)==.*\1^=` – ctwheels Aug 13 '20 at 20:13
  • Thank you for the detailed explanation! – Rafael Borja Aug 13 '20 at 20:18
  • 1
    Typo in my previous comment (can't edit past 5 mins): `^([^=]+)==.*^\1=` – ctwheels Aug 13 '20 at 20:19
-1

The following regex worked on a sorted file using the Notepad++ replace feature:

REGEX: ^(.*)(==[\d\.]+)(\r?\n\1)==[\d\.]+$ REPLACE: $1$2

How it works:

The first part of the regex captures two groups:

  1. ^(.*) : Package name (start of line, then any characters up to the == that begins the second group)
  2. (==[\d\.]+) : Equals signs and package version (any combination of digits and dots after the equals signs)

The second part of the regex matches the same package name on the next line by reusing the first captured group, where

  • (\r?\n\1) : Matches a line break followed by the text captured by the first group (the package name)
  • ==[\d\.]+$ : Matches the equals signs and package version on the second line

The replacement is $1$2, which keeps the first line and drops the second, where

  • $1: First capturing group, with the package name
  • $2: Second capturing group, with the equals signs and package version
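
The same replace can be sketched in Python with re.sub. This is a rough port, not something tested in Notepad++ itself. It assumes "\n" line endings, because Python's $ does not match before "\r". The replacement is written as \1\2, since Python does not use the $1$2 syntax. The names PATTERN and drop_duplicate_pins are made up for illustration.

```python
import re

# Port of the Notepad++ pattern; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^(.*)(==[\d.]+)(\r?\n\1)==[\d.]+$", re.MULTILINE)


def drop_duplicate_pins(text):
    """Keep the first pin in each run of adjacent lines for the same package."""
    previous = None
    # One pass removes only one line per match, so a package pinned three
    # or more times needs repeated passes until nothing changes.
    while previous != text:
        previous = text
        text = PATTERN.sub(r"\1\2", text)
    return text


if __name__ == "__main__":
    print(drop_duplicate_pins("agate==1.6.0\nagate==1.7.0\nboto==2.0"))
```

The loop stands in for pressing Replace All repeatedly: a single pass over three consecutive pins of one package still leaves two of them.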
Rafael Borja