
I'm writing a bash script in which I use grep with a regular expression to extract an id, which I will then use as a variable.

The goal is to extract all characters after the last /, but the characters ' and } should be ignored.

file.txt:

{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}

command:

cat file.txt | grep -oP "[/]+^"

The current command isn't working.

desired output:

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

Peter
  • This is very classically the [XY Problem](https://xyproblem.info/) (example 1). – Rogue Aug 12 '22 at 20:09
  • Hi @jhnc, I updated the question with expected output – Peter Aug 12 '22 at 20:15
  • So you want to ignore `}` as well? – jhnc Aug 12 '22 at 20:18
  • Yes, you're right. I hadn't noticed the `}`. – Peter Aug 12 '22 at 20:23
  • `"[^/]+(?='})"` or `"(?<=/)[^/]+(?=')"` – jhnc Aug 12 '22 at 20:24
  • It worked! Thanks @jhnc. Would you post it as an answer so I can mark it as accepted, or give some detail on what each expression is doing so I can add it as an answer here? Thanks a lot – Peter Aug 12 '22 at 20:26
  • [Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/) :-). – Ed Morton Aug 12 '22 at 23:13

6 Answers

The regex you gave was: [/]+^

It has a few mistakes:

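  • Your use of ^ at the end seems to imply you think you can ask the software to search backwards. You can't: here ^ is an anchor for the start of the line, so [/]+^ can never match, because ^ would have to succeed right after a slash;
  • [/] matches only the slash character itself; to match any character except a slash you need the negated class [^/].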

Your sample shows what appears to be a malformed JSON object containing a key-value pair, each part enclosed in single quotes. JSON requires double quotes, so perhaps it is not JSON.

If several assumptions are made, it is possible to extract the section of the input that you seem to want:

  • file contains a single line; and
  • key and value are strings surrounded by single quotes; and
  • either:
    • the value part is immediately followed by }; or
    • the name part cannot contain /

You are using the -P (Perl-compatible regular expressions) option to grep, so lookaround operators are available.

(?<=/)[^/]+(?=')
  • lookbehind declares match is preceded by /
  • one or more non-slash (the match)
  • lookahead declares match is followed by '
[^/]+(?='})
  • one or more non-slash (the match)
  • lookahead declares match is followed by ' then }

Note that the match begins as early in the line as possible and with greedy + it is as long as possible.
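A minimal sketch applying both patterns to the sample line, assuming GNU grep built with PCRE support (-P is a GNU grep feature and may be missing from other greps). The variable `line` and the function names are hypothetical. The last lines show how to capture the id in a variable, which is what the question is after.

```shell
# Sample line from file.txt in the question.
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# (?<=/)  lookbehind: the match must be preceded by "/"
# [^/]+   the match itself: one or more non-slash characters
# (?=')   lookahead: the match must be followed by "'"
id_after_slash() {
  printf '%s\n' "$1" | grep -oP "(?<=/)[^/]+(?=')"
}

# [^/]+   one or more non-slash characters
# (?='})  lookahead: the match must be followed by "'" and then "}"
id_before_brace() {
  printf '%s\n' "$1" | grep -oP "[^/]+(?='})"
}

# Capture the result in a variable, as the question intends.
id=$(id_after_slash "$line")
echo "$id"
# Reading from the file directly works the same way:
# id=$(grep -oP "(?<=/)[^/]+(?=')" file.txt)
```

Neither lookaround consumes characters, so the / and the trailing '} stay out of the printed match.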

jhnc

Using any awk:

$ awk -F"[/']" '{print $(NF-1)}' file.txt
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
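To see why $(NF-1) picks out the id, this sketch prints every field that the [/'] separator produces. The helper names are hypothetical; only standard awk features are used. Splitting on both / and ' leaves the trailing } as the last field, so the id is the one just before it.

```shell
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# Print every field, numbered, when splitting on either "/" or "'".
show_fields() {
  printf '%s\n' "$1" | awk -F"[/']" '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

# The answer's one-liner: print the second-to-last field.
last_but_one() {
  printf '%s\n' "$1" | awk -F"[/']" '{ print $(NF-1) }'
}

show_fields "$line"
last_but_one "$line"
```

For the sample line the last two fields are the id (field 12) and } (field 13).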
Ed Morton

Basic parameter expansion.

$: x="$(<file.txt)"            # file contents in x
$: x="${x##*/}"                # strip through the last / to get rid of everything before the id
$: x="${x//[^[:alnum:]_=]}"    # strip anything not alphanumeric, _ or = to clean the end
$: echo "$x"
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
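Filtering by a character class drops any character the class does not list, for example a hyphen if the id is URL-safe base64 (the question does not say, so that is an assumption). A sketch of a variant that instead strips the known '} suffix, assuming every line ends in exactly those two characters; the function name is hypothetical:

```shell
# Hypothetical helper: extract the id using only parameter expansion.
extract_id() {
  local x="$1"
  local suffix="'}"     # assumed fixed ending of every line
  x="${x##*/}"          # remove everything up to and including the last /
  x="${x%"$suffix"}"    # remove the trailing '} (quoted, so matched literally)
  printf '%s\n' "$x"
}

line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"
extract_id "$line"
extract_id "{'name': 'a/b/ab-c_d='}"   # made-up input containing a hyphen
# To read from the file instead: extract_id "$(<file.txt)"
```

Because the suffix is removed literally, any character inside the id survives.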
Paul Hodges

If the data structure is always like that and you can use jq, translate the single quotes to double quotes, take the name property, split it on / and keep the last element:

tr "'" '"' < file | jq -r '.name | split("/") | last'

Output

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
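The tr step is what lets jq parse the input at all: it swaps each single quote for a double quote, turning the single-quoted object into valid JSON. A quick check without jq (to_json is a hypothetical helper name), assuming, as this answer does, that no key or value contains a quote of its own:

```shell
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# Replace every ' with " so the object becomes valid JSON for jq.
to_json() {
  printf '%s\n' "$1" | tr "'" '"'
}

to_json "$line"
```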
The fourth bird

With jq you could try the following code. First change all occurrences of ' to " with the tr command to make the input valid JSON (as per your shown samples), then use jq's sub function to get the required output.

jq -r '.[] | sub(".*/";"")' <(tr "'" '"' < Input_file)

Or, if you want to look specifically for the name element, try the following:

 jq -r '.name | sub(".*/";"")' <(tr "'" '"' < Input_file)
RavinderSingh13
echo "${inputdata}" |
mawk ++NF OFS= FS='^.+/|[}\47]+$'
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
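Since this one-liner comes without explanation, here is the same command with comments, wrapped in a hypothetical helper and run with plain awk. It relies on ++NF making awk rebuild $0 with OFS, which gawk and mawk both do; other awks are not considered here.

```shell
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# FS is a regex with two alternatives (\47 is the octal escape for '):
#   ^.+/      everything from the start of the line through the last /
#   [}\47]+$  the trailing run of } and ' characters
# Both separators sit at the edges of the line, so the fields are
# "", the id, "" and the id is the only non-empty one.
# ++NF appends one more empty field, which makes awk rebuild $0 by
# joining the fields with OFS (empty here), leaving just the id.
# The new NF is non-zero, so the pattern is true and $0 is printed.
extract_id_awk() {
  printf '%s\n' "$1" | awk ++NF OFS= FS='^.+/|[}\47]+$'
}

extract_id_awk "$line"
```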
RARE Kpop Manifesto
  • Code is a lot more helpful when it is accompanied by an explanation. Stack Overflow is about learning, not providing snippets to blindly copy and paste. Please [edit] your answer and explain how it answers the specific question being asked. See [answer]. – ChrisGPT was on strike Aug 14 '22 at 12:12
  • @Chris : like u said - learning - learning is a 2-way street that requires both sides to be active participants of the conversation instead of being a 1-way street. My answer is very much the same approach as the other ones, in a more simplified manner. Funny you didn't criticize Ed Morton's solution that also came with zero amount of explanation. – RARE Kpop Manifesto Aug 15 '22 at 06:13
  • This answer was in the "low quality answers" review queue, which is where I saw it. I stand by my comment: it isn't very clear, and it could benefit from some explanation. Your misuse of Markdown is also confusing. What's with the blockquote? Note that at least one other user agrees with me, based on my upvoted comment. I suggest you take feedback with an open mind instead of being defensive. – ChrisGPT was on strike Aug 15 '22 at 10:29
  • @Chris : and have you actually thought about why Stack Network sites have an unfortunate propensity of attracting leeches the way lampposts attract moths? – RARE Kpop Manifesto Aug 15 '22 at 10:39
  • I don't understand why you're being so hostile. [You](https://stackoverflow.com/a/73357622/354577) [clearly](https://stackoverflow.com/a/73347796/354577) [know](https://stackoverflow.com/a/73344475/354577) [how](https://stackoverflow.com/a/73344397/354577) [to](https://stackoverflow.com/a/73327093/354577) [explain](https://stackoverflow.com/a/73319722/354577) [answers](https://stackoverflow.com/a/73318420/354577). I'm just asking you to do the same here. Again, this answer is in the "low quality answers" queue. I'm just trying to help. – ChrisGPT was on strike Aug 15 '22 at 10:45
  • @Chris : if u think it's low quality then so be it. – RARE Kpop Manifesto Aug 15 '22 at 10:49