I have a library, mylib, and I want to log its current git hash when I run an Airflow Worker via the PythonOperator. I know several methods to get the latest git hash; the main issue is that I don't know where the directory will be, and I'll likely be running from outside that directory.
The workers themselves can vary (Docker, EC2, GKE, Kubernetes), but the source Python library mylib would always be installed via a pip install git+https://${GITHUB_TOKEN}@github.com/user/mylib.git@{version} command. Is there a generic way I can get the git hash for mylib on any Airflow worker, given that the directory where mylib is installed will change across my Airflow Workers?

pyCthon
Unless you know where the Git repository is or have saved the commit hash ID somewhere, no, you can't do that. Your best bet is *probably* to make release versions that store the version information you wish to report in your logs. – torek Dec 30 '22 at 02:36
1 Answer
When you install a package from git, you can get the git hash by calling pip freeze, for example:
$ pip install git+https://github.com/jkbr/httpie.git#egg=httpie
$ pip freeze | grep httpie
> httpie @ git+https://github.com/jkbr/httpie.git@621042a0486ceb3afaf47a013c4f2eee4edc1a1d
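The string after the last @ is the commit hash, which matches the commit on GitHub.
So you can call the command in Python and parse the result to get the hash:
import subprocess
# For a git install, pip freeze prints "name @ git+https://...@<commit>"; keep the text after the last "@".
commit_hash = subprocess.getoutput("pip freeze | grep <your package name>").split("@")[-1]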
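
As a possible alternative that avoids shelling out, the same commit can usually be read from the package metadata, since recent versions of pip record a direct_url.json file (PEP 610) for packages installed from a VCS URL. This is a minimal sketch, not part of the original answer: the function names are hypothetical, and it assumes Python 3.8+ and a pip version that writes direct_url.json. It does not depend on the working directory or on where mylib was installed.

```python
import json
from importlib import metadata
from typing import Optional


def commit_from_direct_url(raw: Optional[str]) -> Optional[str]:
    """Extract vcs_info.commit_id from the text of a direct_url.json file."""
    if not raw:
        return None
    info = json.loads(raw)
    # Only VCS installs carry a "vcs_info" section; other installs return None.
    return info.get("vcs_info", {}).get("commit_id")


def installed_git_commit(package: str) -> Optional[str]:
    """Return the commit pip recorded for `package`, or None if unavailable."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the file is absent (e.g. a PyPI install).
    return commit_from_direct_url(dist.read_text("direct_url.json"))
```

Inside a PythonOperator callable, something like installed_git_commit("mylib") could then be logged. It returns None when the package is missing or was not installed from git, so the caller should handle that case.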

Hussein Awala
I'll try to make my question more clear because this doesn't answer anything – pyCthon Dec 30 '22 at 01:02