I've searched for a solution here and here without luck. I also found a thread discussing a case similar to mine, but it does not solve the problem I'm facing, so I decided to ask a question here.

How can I extract a certain word (the value of a parameter) from a Python script using a bash script? For example, I have a Python script containing the following code:

from datetime import datetime, timedelta
from airflow import DAG
...


args = {
    ...
}

# A DAG for my_bigquery_pipeline -> this line should not be included in bash searching.
with DAG(dag_id='my_bigquery_pipeline', default_args=args,
         schedule_interval='00 21 * * *') as dag:

From the script above I want to get the word my_bigquery_pipeline, taken only from the line that is not commented out. Before asking here, I tried the following:

sed -n '/^.*dag_id\s\+\/\(\w\+\).*$/s//\1/p' bigquery_pipeline.py
// and
sed "s/dag_id//2g" bigquery_pipeline.py
// and
egrep -oP '(?<=dag_id=/)\w+' bigquery_pipeline.py

Unfortunately, none of these methods work for me. Any help would be appreciated, thanks!

Inian
Imam Digmi

1 Answer

egrep is equivalent to grep -E, so it conflicts with the -P switch.
If you have GNU grep, you can do this instead:

grep -oP '(?<=dag_id=.)\w+' bigquery_pipeline.py

or, more precisely:

grep -oP '(?<=dag_id=\x27)\w+' bigquery_pipeline.py

Here 0x27 is the ASCII code of the single quote (').
You can also switch the outer quotes to double quotes, like this:

grep -oP "(?<=dag_id=')\w+" bigquery_pipeline.py

or, to tolerate the spacing and quoting variations your .py code may use:

grep -oP 'dag_id\s*=\s*[\x27\x22]\K\w+' bigquery_pipeline.py

This also matches dag_id = "my_bigquery_pipeline" and prints my_bigquery_pipeline.

And a sed solution:
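
To check the quote-tolerant pattern against both spellings, here is a minimal sketch. It assumes GNU grep, since -P and \K are not available in BSD grep; the helper name extract_dag_id and the sample file contents are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every dag_id value found in the given file.
# Assumes GNU grep (needs -P for PCRE and \K).
extract_dag_id() {
    # \x27 is ' and \x22 is "; \K drops everything matched so far from the output.
    grep -oP 'dag_id\s*=\s*[\x27\x22]\K\w+' "$1"
}

# Made-up sample covering both quoting styles and optional spaces around '='.
sample=$(mktemp)
cat > "$sample" <<'EOF'
with DAG(dag_id='my_bigquery_pipeline', default_args=args) as dag:
with DAG(dag_id = "other_pipeline", default_args=args) as dag:
EOF

extract_dag_id "$sample"
rm -f "$sample"
```

Each matching line contributes one line of output, so the result can be captured with `$(...)` in a larger bash script.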

sed -n '/^.*dag_id *= *[[:punct:]]\([[:alnum:]_]*\).*/s//\1/p' bigquery_pipeline.py
my_bigquery_pipeline

To skip commented lines, require that no # appears before dag_id on the line:

grep -oP '^[^#]*dag_id\s*=\s*[\x27\x22]\K\w+' bigquery_pipeline.py

or

sed -n '/^[^#]*dag_id *= *[[:punct:]]\([[:alnum:]_]*\).*/s//\1/p' bigquery_pipeline.py
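
As a rough illustration of how the sed version treats comments, the sketch below runs it on a made-up file that has a commented line, an indented commented line, and a real one. The function name dag_id_uncommented and the file contents are assumptions for the example only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print dag_id values from lines with no '#' before dag_id.
dag_id_uncommented() {
    # ^[^#]*  : everything before dag_id must be free of '#'
    # s//\1/p : the empty pattern reuses the address regex; -n plus p prints matches only
    sed -n '/^[^#]*dag_id *= *[[:punct:]]\([[:alnum:]_]*\).*/s//\1/p' "$1"
}

# Made-up sample: two commented lines and one live line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# DAG(dag_id='bebigquery', default_args=args,
    # with DAG(dag_id='indented_comment', default_args=args,
with DAG(dag_id='belajar_bigquery', default_args=args,
EOF

dag_id_uncommented "$sample"   # prints: belajar_bigquery
rm -f "$sample"
```

Note that this is a line-based heuristic: a # inside a string literal that precedes dag_id on the same line would also cause the line to be skipped.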

And a perl solution that treats dag_id= as optional and also ignores commented lines:

perl -nle 'print $& while m{^[^#]*with DAG\((dag_id\s*=\s*)?[\x27\x22]\K\w+}g' bigquery_pipeline.py
Til
  • Thanks @Tiw, it works like a charm! But how can I tell whether the line is commented out? Let's say I have 2 lines of code like this: ```# DAG(dag_id='bebigquery', default_args=args,``` and ```with DAG(dag_id='belajar_bigquery', default_args=args,```. How can I get only the second one? – Imam Digmi Feb 26 '19 at 06:50
  • Yes! This is the right solution for me: ```sed -n '/^[^#]*dag_id *= *[[:punct:]]\([[:alnum:]_]*\).*/s//\1/p' bigquery_pipeline.py``` Thank you @Tiw, you're my hero :D – Imam Digmi Feb 26 '19 at 06:55
  • Hi @Tiw, I have a new problem (sorry to bother you). The keyword `dag_id=` is actually optional, and in some cases it is not included. How can I make that part of the pattern optional? I know I can do this with ```sed -n '/^[^#]*DAG *( *[[:punct:]]\([[:alnum:]_]*\).*/s//\1/p' bigquery_pipeline.py```, but is there a simpler way? Thanks – Imam Digmi Feb 26 '19 at 07:25
  • @ImamDigmi `grep -oP '^[^#]*with DAG\((dag_id\s*=\s*)?[\x27\x22]\K\w+' bigquery_pipeline.py` – Til Feb 26 '19 at 07:34
  • Thank you very much for your help @Tiw. I'm trying to find an alternative to `-P`, because the grep command on OSX doesn't support that option. – Imam Digmi Feb 26 '19 at 07:45
  • @ImamDigmi `brew install grep --with-default-names` , from [this question](https://stackoverflow.com/questions/33231370/installed-gnu-grep-on-osx-but-cant-use). – Til Feb 26 '19 at 08:01
  • Yes, OSX has perl preinstalled. Thanks Tiw, this is the final solution for me, many thanks! – Imam Digmi Feb 26 '19 at 11:39