Run Regex using Grep/Sed recursively over files to store capture group

Question

I have a file structure that looks like this:

Folder1
    file1.feature
    file2.feature
    file3.feature
Folder2
    file1.feature
    file2.feature
...etc.

The files are Behat feature files which look like this:

Scenario: I am filling out a form
    Given I am logged in as User
    And I fill in "Name" with "My name"
    Then I fill in "Email" with "myemail@example.com"

I am trying to iterate over each file within the file structure to get matches on my regex:

/I fill in "[^"]+" with "([^"]+)"/gm

The regex looks for I fill in "x" with "y", and I would like to store the capture group "y" from each file where a line in the file matches the expression.

So far I can iterate through the folders and print out the file names in my Bash script like so:

#!/bin/bash

cd behat/features

files="*/*.feature"


for f in $files
do
    echo "${f}"
done

I am currently trying to retrieve the capture group with sed by doing this in my loop:

sed -r 's/^I fill in \"[^)]+\" with \"([^)]+)\"$/\1/'

But I fear that I am going down the wrong track, as this is returning all of the file content throughout all the files.

Try `sed -E -n 's/.*I fill in "[^"]+" with "([^"]+)"/\1/p'`, see [this demo](https://ideone.com/wqAJ4r) — Wiktor Stribiżew, Jun 11 '19 at 11:18
@WiktorStribiżew sweet, that works when I run it on an individual file. Do you know how I could incorporate this into my script, please? — party-ring, Jun 11 '19 at 11:27
`cd behat/features && find . -name '*.feature' -type f -print0 | xargs -0 sed -E -n 's/.*I fill in "[^"]+" with "([^"]+)"/\1/p' > outfile`? — Wiktor Stribiżew, Jun 11 '19 at 11:29
@WiktorStribiżew dude, thank you!! Post it as an answer and I can accept. — party-ring, Jun 11 '19 at 11:33

score 2 · Accepted Answer · answered Jun 11 '19 at 11:35

You may use

cd behat/features && find . -name '*.feature' -type f -print0 | xargs -0 \
  sed -E -n 's/.*I fill in "[^"]+" with "([^"]+)"/\1/p' > outfile

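This command changes to the behat/features directory, finds all files with the .feature extension recursively, and passes them to sed, which prints the capture group #1 value from every line that matches your regex. The -n option suppresses sed's default printing of every input line, and the p flag prints a line only when the substitution succeeded on it, so non-matching lines are dropped. The results are written to outfile.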
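If you would rather keep the values in a Bash variable than in outfile, the same pipeline can feed an array. The following is only a sketch under a few assumptions: it builds a throwaway directory with made-up sample files (the `City`/`Leeds` step is hypothetical) so that it runs anywhere, it needs Bash 4 or later for `mapfile`, and it adds a trailing `.*` to the pattern so that any text after the closing quote is dropped as well.

```shell
#!/bin/bash
# Throwaway sample tree; the layout and contents are illustrative assumptions.
tmp=$(mktemp -d)
mkdir -p "$tmp/Folder1" "$tmp/Folder2"
printf '%s\n' \
  'Scenario: I am filling out a form' \
  '    Given I am logged in as User' \
  '    And I fill in "Name" with "My name"' \
  '    Then I fill in "Email" with "myemail@example.com"' \
  > "$tmp/Folder1/file1.feature"
printf '%s\n' '    And I fill in "City" with "Leeds"' \
  > "$tmp/Folder2/file1.feature"

# Read one captured value per array element.
# The -name pattern is quoted so the shell passes it to find unexpanded;
# sort -z only makes the file order predictable.
mapfile -t values < <(
  find "$tmp" -name '*.feature' -type f -print0 |
    sort -z |
    xargs -0 sed -E -n 's/.*I fill in "[^"]+" with "([^"]+)".*/\1/p'
)

# Print every stored value on its own line, then clean up.
printf '%s\n' "${values[@]}"
rm -rf "$tmp"
```

In your own script you would replace `"$tmp"` with `behat/features` and drop the lines that create and delete the sample files.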

If need be, see "How to do a recursive find/replace of a string with awk or sed?" for more specific approaches to recursive file matching.
