Bash: How to search a string in file with Regex and get the associated value

Question

I have a file that lists patterns (regex) with a corresponding value for each pattern, like this:

path                  group
/hello/get/**         @group1
/hey/get/you          @group2
/hi/get/ping_*.js     @group3
/hello/get/**         @group4

I want to get the group value that corresponds to a path I supply. For example, if I give "/hello/get/book.js", I should get @group1.

How can I do that?

I have tried searching for the regex, but I am not sure how to fetch the corresponding group from the file. Also, grep returns the matching line only when the string matches a line literally, not when it matches the pattern. For example, when I run

grep '/hey/get/you' FILENAME

I get the following output: /hey/get/you @group2

But, if I give the following:

grep '/hello/get/hello.js' FILENAME

it doesn't return anything.

The expected result for the string '/hello/get/hello.js' should be @group1, @group4

These aren't regular expressions, they are shell patterns, and judging by `**`, they're of the "extended" variety. — Benjamin W., Apr 03 '19 at 18:29
Maybe I need to remove the trailing * and make it a single * to match the pattern. — Sri, Apr 03 '19 at 18:59

Gilles Quénot · Answer 1 · 2019-04-03T19:14:20.920

1

These are not regexes but globs; the recursive `**` form is enabled with

shopt -s globstar
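
To see what globstar changes during filename expansion, here is a minimal sketch. It assumes Bash 4 or later and uses a throwaway directory; the layout and the helper name count_matches are made up for illustration and are not part of the answer.

```shell
#!/bin/bash
# Hypothetical demo layout, created in a temporary directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/hello/get/sub"
touch "$demo_dir/hello/get/hello.js" "$demo_dir/hello/get/sub/deep.js"

count_matches() {
    # Count the paths that the glob in $1 expands to, relative to $demo_dir.
    # Runs in a subshell so cd and nullglob do not leak into the caller.
    (
        cd "$demo_dir" || exit 1
        shopt -s nullglob
        matches=( $1 )    # intentionally unquoted: let the shell expand the glob
        echo "${#matches[@]}"
    )
}

shopt -u globstar
n_without=$(count_matches 'hello/get/**')   # ** behaves like a single *

shopt -s globstar
n_with=$(count_matches 'hello/get/**')      # ** also descends into sub/

printf 'globstar off: %s paths, globstar on: %s paths\n' "$n_without" "$n_with"
rm -rf "$demo_dir"
```

With globstar off, `**` only sees the direct children of hello/get; with it on, the expansion also reaches files in nested directories, so the second count is larger.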

An implementation that uses these extended globs to find the file /tmp/test/hello/get/hello.js:

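# -I already runs one command per input line; combining it with -n1 makes GNU xargs ignore one of the two
awk -F/ 'BEGIN{OFS="/"}NR>1{$(NF)=""; print}' /tmp/file |
    xargs -I% mkdir -p /tmp/test/%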

tree

$ tree /tmp/test
/tmp/test
├── hello
│   └── get
├── hey
│   └── get
└── hi
    └── get

creating the file

touch /tmp/test/hello/get/hello.js

extended dynamic glob matching

$ awk 'NR>1{print $1, $2}' /tmp/file |
    while read -r r x; do
        # $r is left unquoted on purpose so the shell expands the glob
        stat /tmp/test$r &>/dev/null && echo "$x"
    done

output

@group1
@group4

doc

man 7 glob
globstar

edited Apr 03 '19 at 19:14

answered Apr 03 '19 at 18:59

Do I need to create the file for this? Can it be a single AWK to find the match? The reason is my file has more than 5k lines with these patterns and I can't really create a file. Also, I will be doing this search inside a loop on an array of files. – Sri Apr 03 '19 at 19:10
Yes, you need to create the file and directory structure, because these are not _regexes_ but _globs_ – Gilles Quénot Apr 03 '19 at 19:11
I can't create the directory structure. How about removing the trailing * and searching with it as a regex? – Sri Apr 03 '19 at 19:22
Please ask a new question, you can't change the rules of a question deeply like this. Please, read [MCVE](https://stackoverflow.com/help/mcve) – Gilles Quénot Apr 03 '19 at 20:31

score 1 · Accepted Answer · answered Apr 04 '19 at 19:52

If I understand the question correctly, you want code that will read a list of pattern-group pairs from a file (say 'pattern_group_list.txt'), input a string (say from the command line), and print a string containing a comma-separated list of the groups corresponding to the patterns in the file that match it. If that is the case, try this code:

#! /bin/bash

readonly kPATTERN_GROUP_FILE=pattern_group_list.txt

input=$1

{
    read -r pattern group || exit 0    # Skip the first line (header)
    result=
    while read -r pattern group ; do
        [[ $input == $pattern ]] && result+=${result:+,}$group
    done
} <"$kPATTERN_GROUP_FILE"

printf '%s\n' "$result"

The code is not completely Shellcheck-clean because $pattern is not quoted in [[ $input == $pattern ]], but quoting it would break the code by preventing glob patterns from being matched.
It prints '@group2' when run with argument '/hey/get/you' and it prints '@group1,@group4' when run with argument '/hello/get/hello.js'.
The code will not work if patterns contain whitespace characters. You would need a different file format to support such patterns.
The last pattern-group pair in the file will be missed if the last line of the file is not terminated by a newline. See "Read last line of file in bash script when reading file line by line" for an explanation of the problem, and how to fix it if it is a concern for you.
If the file is empty, the code exits immediately with a success status (0). You would probably want to do something different in practical code.
The code makes no attempt to print useful error messages for a non-existent or unreadable input file. Practical code would handle such errors.
Bash is generally very slow. If you've got 5k+ patterns to match, don't expect to be able to process large numbers of files in reasonable amounts of time. I'd expect it to be bearable up to maybe 1k files. Beyond that you would really need to use a more efficient programming language.
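
Two of the caveats above (the unterminated last line and the unreadable input file) can be addressed with small changes. The following is only a sketch under assumptions: match_groups is a hypothetical function name, the return status 2 is an arbitrary choice, and the file format is the same whitespace-separated one used above.

```shell
#!/bin/bash
# Sketch only: match_groups is a hypothetical helper, not part of the answer's code.
# Usage: match_groups INPUT_STRING PATTERN_GROUP_FILE
match_groups() {
    local input=$1 file=$2 pattern group result=

    # Report a missing or unreadable file instead of failing silently.
    if [[ ! -r $file ]]; then
        printf 'cannot read pattern file: %s\n' "$file" >&2
        return 2    # arbitrary non-zero status chosen for this sketch
    fi

    {
        read -r pattern group    # skip the header line
        # "|| [[ -n $pattern ]]" also processes a final line that has no
        # trailing newline: read fails there but still fills $pattern.
        while read -r pattern group || [[ -n $pattern ]]; do
            # $pattern stays unquoted on purpose so it is treated as a glob.
            [[ $input == $pattern ]] && result+=${result:+,}$group
        done
    } <"$file"

    printf '%s\n' "$result"
}
```

It could be called as `match_groups /hello/get/hello.js pattern_group_list.txt`. An empty file simply produces an empty line here; whether that should be an error is left to the caller.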

This script works for regex-based patterns but not for extended globs as mentioned by @Gilles Quénot. — Sri, Apr 04 '19 at 20:15
@Sri, `==` in `[[...]]` matches globs, not regexes. If you want it to match extended globs, put `shopt -s extglob` at the start of the code. However, `**` is not an extended glob pattern. It's a special pattern for glob _expansion_. In glob and extended glob _matching_, `*` and `**` are effectively equivalent (though the `**` might slow down the matching). If you can provide an example of an input for which the code above doesn't work I'll try to fix it. — pjh, Apr 04 '19 at 20:21
I'm so sorry. Your code is what I wanted. Thank you so much. This works perfectly. — Sri, Apr 04 '19 at 20:28
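
The point made in the comments, that `*` and `**` behave the same in `[[ ... == ... ]]` matching, can be checked with a short sketch. The helper name glob_matches and the sample paths are assumptions made for this illustration.

```shell
#!/bin/bash
# glob_matches is a hypothetical helper name used only for this demo.
glob_matches() {
    # Succeeds when string $1 matches glob pattern $2.
    # $2 is unquoted on purpose so that [[ ]] treats it as a pattern.
    [[ $1 == $2 ]]
}

for pattern in '/hello/get/*' '/hello/get/**'; do
    for path in /hello/get/hello.js /hello/get/sub/deep.js /hey/get/you; do
        if glob_matches "$path" "$pattern"; then
            printf '%s matches %s\n' "$path" "$pattern"
        else
            printf '%s does not match %s\n' "$path" "$pattern"
        fi
    done
done
```

In this kind of string matching, unlike filename expansion, `*` also matches `/` characters, so both patterns accept the nested path.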
