
I wrote a bash script for data analysis that works perfectly fine when I run it from the terminal. To automate the process further, I wanted to use a Python script that runs the bash script (using subprocess.call), changes the working directory, reruns the script, and so on. This also worked fine on my MacBook. However, I need to do the analysis on a Linux machine, and that is where the problem occurred. Again, running the script from the terminal worked fine but once I tried doing this with my python script it fails to find the relevant functions for the analysis. The functions are stored inside the anaconda3/bin folder. (Other functions such as "pip" are not found either.)

Of course, I could add the full path to every function in the bash script, but this seems very inefficient to me. So my question is: is there a better way of telling Python where to look for the functions? And can you explain why running the script from the terminal works, but running it through subprocess.call does not?

Here is the python script:

import subprocess
import os

path_list = ["Path1",
             "Path2"
             ]

for path in path_list:
    os.chdir(path)
    subprocess.call("Users/.../bash_script", shell=True)
  • Is `anaconda3/bin` part of the shell `$PATH` spawned by python? You could try a simple test like `subprocess.call('echo $PATH', shell=True)` and see if it's there. – Aelarion Apr 30 '21 at 04:13
  • Thank you for your response! Indeed, anaconda3/bin is not part of the shell path when I run subprocess.call. Do you have any idea how I could add it there? For now I have fixed the issue by adding the anaconda initialize function from the bashrc script to my data analysis script. It does not feel like the ideal solution, but at least it works. – prometheus Apr 30 '21 at 08:45
  • Realistically I think your usage of the conda initialize function is a better approach. Anaconda creates that separation intentionally requiring the "init" function, and does not add its `bin` directory to your `PATH` by default because of possible naming conflicts. An alternative approach might be creating another user on the machine specifically configured for your automation (e.g. has all privileges revoked aside from what you need to do in the script), and then you could export `anaconda/bin` to that new user's `PATH`. Then it would just be a matter of `su`ing to that user in your script. – Aelarion Apr 30 '21 at 12:28
  • Sorry for the double comment (going over character limit). The Unix & Linux site might be a better place to ask some of these questions, since you're specifically dealing with some Linux challenges (not necessarily a "problem" per se, but definitely an interesting configuration challenge): https://unix.stackexchange.com – Aelarion Apr 30 '21 at 12:31

1 Answer

I'm just posting my series of comments as an answer since I think this at least constitutes a reasonable answer for anyone running into a similar issue (your question could definitely be common enough to index from search engine results).

Issue:

...running the script from the terminal worked fine but once I tried doing this with my python script it fails to find the relevant functions for the analysis

In general, you can troubleshoot this kind of problem with:

import subprocess
subprocess.call('echo $PATH', shell=True)

If the directory that contains the relevant binaries/scripts/etc. is not in the output, then you are facing a PATH issue in the shell created by subprocess.call.
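The same check can be wrapped in a small helper, shown here as a minimal sketch. The function names `path_seen_by_subprocess` and `report` are hypothetical, and the directory fragment passed in (for example `"anaconda3/bin"`) is whatever you expect to find on the PATH.

```python
import os
import shutil
import subprocess


def path_seen_by_subprocess():
    """Return the PATH entries visible inside a shell started by subprocess."""
    result = subprocess.run(
        "echo $PATH",
        shell=True,  # same kind of shell that subprocess.call(..., shell=True) starts
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(os.pathsep)


def report(command_name, expected_dir_fragment):
    """Print and return whether expected_dir_fragment appears in the child's PATH."""
    entries = path_seen_by_subprocess()
    on_path = any(expected_dir_fragment in entry for entry in entries)
    print(f"{expected_dir_fragment!r} on PATH: {on_path}")
    # shutil.which looks the command up on the PATH of this Python process
    print(f"{command_name!r} resolves to: {shutil.which(command_name)}")
    return on_path
```

Calling `report("pip", "anaconda3/bin")` from the automation script and again from an interactive Python session started in the terminal should make the difference between the two environments visible.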

The exact problem as confirmed by the OP in comments is that anaconda3/bin is not part of your PATH. Your script works in a regular terminal session because of the Anaconda initialization function that gets added to your .bashrc when installing.

Part of an answer that is very helpful here: Python - Activate conda env through shell script

The problem with your script, though, lies in the fact that the .bashrc is not sourced by the subshell that runs shell scripts (see this answer for more info). This means that even though your non-login interactive shell sees the conda commands, your non-interactive script subshells won't - no matter how many times you call conda init.

Solution 1: Manually use the Anaconda sourcing function in your script

As the OP mentioned in the comments, their workaround was to copy the initialization function that Anaconda added to their .bashrc into the script they are trying to run. Although this may not feel like a great solution, it is a "good enough" workaround. Unfortunately, I don't use Anaconda on Linux, so I don't have an exact snippet of what this looks like. See the next section for a possibly "cleaner" solution.

Solution 2: Use bash -i to run your script

As mentioned in the same answer linked above, you might be able to use:

bash -i Users/.../bash_script

This will tell bash to run in interactive mode, which then properly sources your .bashrc file when creating the shell. As a result, Anaconda and related functions should work properly.
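From the Python side, this could look roughly like the sketch below. The helper names are hypothetical, and the script path is passed in by the caller rather than hard-coded. Passing an argument list instead of `shell=True` is my own choice here, made so that bash is invoked directly.

```python
import subprocess


def interactive_bash_command(script):
    """Build the argument list for running script under an interactive bash."""
    return ["bash", "-i", script]


def run_interactively(script, work_dir):
    """Run script with bash -i inside work_dir and return its exit code."""
    # cwd= sets the child's working directory, so no os.chdir is needed
    completed = subprocess.run(interactive_bash_command(script), cwd=work_dir)
    return completed.returncode
```

Be aware that an interactive bash started without a terminal attached may print job-control warnings to stderr. In my understanding these are harmless, but they can clutter logs.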

Solution 3: Manually add anaconda3/bin to PATH

You can check out this answer to decide if this is something you want to do. Keep in mind they are speaking about a Windows OS but most of the same applies to Linux.

When you add the directory to your PATH, you are specifically telling your system to always look in that directory when executing commands by name, e.g. ping or which. This can cause unexpected behavior if there are conflicts (e.g. a command with the same name exists in both /usr/bin and .../anaconda3/bin), which is why Anaconda does not add its bin folder to your PATH by default.

This is not necessarily "dangerous" per se, it's just not an ideal solution for most people. However, you are the boss of your own system. If you decide this works for your particular workflow, you can just add the export to your script:

export PATH="path/to/anaconda3/bin:$PATH"

This will set the PATH for use in the current shell and sub-processes.
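A variation on this solution is to set the PATH from the Python side instead, by passing a modified environment to the child process. This is only a sketch: the function names are hypothetical, the Anaconda directory is supplied by the caller, and running the script as `["bash", script]` is an assumption about how it should be invoked.

```python
import os
import subprocess


def env_with_prepended_path(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir at the front of PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("PATH", "")
    env["PATH"] = extra_dir + os.pathsep + current if current else extra_dir
    return env


def run_script_in_dirs(script, work_dirs, extra_dir):
    """Run script once per directory, with extra_dir on the child's PATH."""
    env = env_with_prepended_path(extra_dir)
    for work_dir in work_dirs:
        # cwd= sets the child's working directory without os.chdir in the parent
        subprocess.run(["bash", script], cwd=work_dir, env=env, check=True)
```

Because only the child's environment is changed, the PATH of your login shell and of the Python process itself stays as it was, which limits the naming-conflict concern to the script being run.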

Solution 4: Manually source the conda script (possibly outdated)

As mentioned in this answer, you can also opt to manually source the conda.sh script (keep in mind your conda.sh might be in another directory):

source /opt/anaconda/etc/profile.d/conda.sh

This will essentially run that shell script and add the included functionality to the current shell (e.g. the one spawned by subprocess.call).

Keep in mind this answer is quite a bit older (~2013) and may not apply anymore, depending on how much conda has changed over the years.
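If you would rather not edit the bash script, the sourcing step can be done from Python, as in the hedged sketch below. The function names are hypothetical, and the location of conda.sh is supplied by the caller. Two details matter: with `shell=True`, subprocess uses /bin/sh on POSIX systems, which on some distributions (for example Debian and Ubuntu) is dash and does not provide `source`, so bash is invoked explicitly. Also, the analysis script is sourced into the same shell rather than started as a child process, because shell functions are not inherited by child processes unless they are exported.

```python
import shlex
import subprocess


def source_then_run(conda_sh, script):
    """Build a bash command line that sources conda_sh, then sources script."""
    return f"source {shlex.quote(conda_sh)} && source {shlex.quote(script)}"


def run_with_conda_sh(conda_sh, script, work_dir):
    """Run the combined command under bash (not /bin/sh) inside work_dir."""
    command = source_then_run(conda_sh, script)
    return subprocess.run(["bash", "-c", command], cwd=work_dir).returncode
```

Sourcing a script is not identical to executing it (for instance, `$0` differs), so treat this as a starting point rather than a drop-in replacement.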


Notes

As I mentioned in the comments, you may want to post some related questions on https://unix.stackexchange.com/. You have an interesting configuration challenge that may be better suited to answers specifically about Linux, since your issue stems directly from Linux shell behavior.

Aelarion
  • Wow, thank you very much for your comprehensive reply! That was really helpful! – prometheus May 03 '21 at 06:29
  • I would love to upvote your solution but apparently, I don't have enough "reputation" for doing so. Sorry for this. – prometheus May 03 '21 at 06:31
  • @prometheus not a problem, glad to help! If this solved your question you can accept the answer. That will help if other people stumble across this question in searches. – Aelarion May 03 '21 at 14:41