
I've never created nor used a cron job before, but from numerous questions and answers on SO I've gathered that the process is fairly simple and involves something like the following:

  1. Create bash file with shell commands
  2. Edit crontab

I've found lots of questions and answers on SO regarding cron jobs, but not a single one of them actually explains the syntax. I've tried looking online for a reliable explanation too, but to no avail. I did find this page, however, which explains the time and date portion of crontab statements very clearly.

Here's my understanding so far:

1. Create bash script, which can be placed anywhere.

#!/bin/bash
cd /home/user/public_html/scrapy/projects/myproject/spiders
scrapy crawl mycrawler
  • What is the significance of the #!/bin/bash statement?

  • Why is it commented out?

  • Is using a shell script as a proxy even necessary to run Python scripts?

2. Edit crontab via the crontab -e command

I've seen so many different recommendations for this part, so I'm going to list a few examples from a few different answers.


Example #1

PATH=/usr/bin
* 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name
  • Is embedding commands directly in crontab -e considered good practice?

Example #2

*/5 * * * * /usr/local/bin/python /home/Documents/SCRAPE_PYTHON/SCRAPE.py &>> /home/Desktop/log.txt

  • What is the significance of the first path, /usr/local/bin/python, in this context?

He states in his answer that &>> /home/Desktop/log.txt is the file to which errors and other output will be appended.

  • Is that what the &>> does?

  • Is that universal for every single Linux environment?


Example #3

*/2 * * * * /home/user/shell_scripts/cj-scrapy.sh

  • How come the above code does not include two paths?

  • Is it a potential security vulnerability to place shell scripts in the /home/user/shell_scripts directory?

  • Is there a specific directory where shell scripts like this are commonly stored?


Example #4

The cPanel Cron Job Wizard recommends the following syntax:

/usr/local/bin/php /home/user/public_html/path/to/cron/script


Why all of the discrepancies between crontab recommendations?

I understand the syntax of the time and date portion of crontab, but can somebody please explain the proper syntax for the rest of it?

oldboy
  • Can somebody please explain why you're voting to close the question? – oldboy Jul 02 '18 at 02:34
  • The close reason is "Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once." This is 10 questions in one post, plus a very broad "can someone explain shell syntax", neither of which works well with StackOverflow's format. Split it up into single questions (and preferably Google each one first, since things like "what is #!/usr/bin/bash" are easily answered) – that other guy Jul 02 '18 at 02:40
  • Too many different questions here. It's not possible to provide one clear and concise answer. Flagged for reason "too broad". – Miguel Ortiz Jul 02 '18 at 02:48
  • @MiguelOrtiz Thanks for the explanation, though I've seen many questions like this elsewhere on SO? – oldboy Jul 02 '18 at 02:51

3 Answers


Many questions here, but:

A cron job (or cron schedule) is a specific set of execution instructions specifying the day, the time and the command to execute. A crontab can contain multiple entries, one per line, and each entry can chain several commands.

What is the significance of the #!/bin/bash statement?

It is a shebang. If a script at path/to/script starts with the shebang line #!/bin/bash and is executable, then running path/to/script instructs the program loader to run the program /bin/bash and pass it path/to/script as an argument.
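A minimal sketch of that mechanism, assuming bash lives at /bin/bash and the temporary directory allows execution (the file name hello.sh is hypothetical): the script is made executable and run by its path, and because the loader runs /bin/bash with that path as the argument, $0 inside the script is the path.

```shell
#!/bin/bash
# Create a throwaway script whose first line is a shebang (hypothetical location).
script="$(mktemp -d)/hello.sh"
cat > "$script" <<'EOF'
#!/bin/bash
# The kernel effectively runs: /bin/bash /path/to/hello.sh
echo "interpreter got: $0"
EOF

chmod +x "$script"   # the shebang is only honored when the file is executed directly
"$script"            # prints "interpreter got: " followed by the script's full path
```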

Why is it commented out?

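It is not really commented out. A shebang is the character sequence consisting of the number sign and exclamation mark (#!) at the very beginning of a script. Because # starts a comment in bash (and in Python), the interpreter skips that line; it is the program loader that reads it to choose the interpreter.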
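To see that the interpreter really treats the shebang as an ordinary comment, here is a small sketch with a hypothetical temporary file whose first line names an interpreter that does not exist: running it through bash explicitly still works, while executing it directly fails because the loader cannot find that interpreter.

```shell
#!/bin/bash
script="$(mktemp -d)/demo.sh"   # hypothetical location
cat > "$script" <<'EOF'
#!/no/such/interpreter
echo "bash ignored the first line"
EOF
chmod +x "$script"

# Run through bash explicitly: bash sees line 1 as a comment and skips it.
bash "$script"

# Execute directly: the loader tries /no/such/interpreter and fails.
if "$script" 2>/dev/null; then
  echo "unexpected success"
else
  echo "direct execution failed with status $?"
fi
```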

Is using a shell script as a proxy even necessary to run Python scripts?

In relation to the crontab? No. A crontab entry can run a command directly, including the Python interpreter with your script:

* * * * * /usr/bin/python script.py

Edit the crontab with crontab -e? Simple answer, yes. Here is a very quick reference:

crontab -e    Edit crontab file, or create one if it doesn’t already exist.
crontab -l    List your cron jobs (display the crontab file contents).
crontab -r    Remove your crontab file.
crontab -v    Display the last time you edited your crontab file. (This option is only available on a few systems.)
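If you prefer not to open an editor, a common pattern is to pipe the output of crontab -l, plus a new line, back into crontab -. Below is a sketch of that idea; the helper name add_cron_entry is hypothetical, and it only prints the new crontab text (skipping the entry if an identical line already exists), so you can inspect the result before installing it.

```shell
#!/bin/bash
# add_cron_entry (hypothetical helper): read an existing crontab on stdin and
# print it back with ENTRY appended, unless an identical line is already there.
add_cron_entry() {
  local entry="$1" existing
  existing="$(cat)"
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -x: match whole lines only, -F: treat the entry as a fixed string, not a regex
  if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}

# Usage (this replaces your crontab; crontab -l fails harmlessly if none exists yet):
#   crontab -l 2>/dev/null | add_cron_entry '*/5 * * * * /path/to/job.sh' | crontab -
```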

Example 2: you are telling cron to execute a Python script. Cron needs to know where the Python binary is (/usr/local/bin/python), which is required to execute the Python script sitting at /home/Documents/SCRAPE_PYTHON/SCRAPE.py. The &>> appends the command's output, including errors, to a log file.
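Because cron runs with a minimal PATH, it is common to look up the interpreter's absolute path once in your interactive shell and paste it into the crontab entry. A small sketch using the POSIX command -v builtin; the helper name abs_path_of is hypothetical, and on your system the interpreter may be called python3 rather than python and may live somewhere other than /usr/local/bin.

```shell
#!/bin/bash
# abs_path_of (hypothetical helper): print the absolute path the current shell
# would use for a command, or fail if it is not found or is not a file on disk.
abs_path_of() {
  local p
  p="$(command -v "$1")" || return 1
  case "$p" in
    /*) printf '%s\n' "$p" ;;   # an executable file: this is what goes into the crontab
    *)  return 1 ;;             # a builtin or function: no path to paste
  esac
}

abs_path_of sh
abs_path_of python3 || echo "python3 not found in PATH"
```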

Jesse

I try to provide context for all of your questions and examples below, but ultimately the question is:

  1. How frequently do you want to execute your command?
  2. What command do you need to execute?

Generally a crontab entry is a time directive, followed by a shell command:

* * * * * shell command
^ ^ ^ ^ ^|^^^^^^^^^^^^^
| | | | |||||||||||||||
   time  |shell command
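To make that split concrete, here is a sketch that breaks a crontab line into its five time fields and the remaining command with bash's read builtin, using Example #2 from the question as the sample line. This is a simplification: it ignores comment lines, environment lines such as PATH=..., and the @reboot-style shortcuts some cron implementations accept.

```shell
#!/bin/bash
line='*/5 * * * * /usr/local/bin/python /home/Documents/SCRAPE_PYTHON/SCRAPE.py &>> /home/Desktop/log.txt'

# read splits on whitespace and the last variable (cmd) receives the rest of the line.
# -r keeps backslashes; reading into variables does not glob-expand the "*" fields.
read -r minute hour dom month dow cmd <<< "$line"

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
  "$minute" "$hour" "$dom" "$month" "$dow"
printf 'command=%s\n' "$cmd"
```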

In UNIX, #! (or shebang) indicates which program should be used to interpret the script that follows. So #!/usr/local/bin/python means "execute the following script with python (aka /usr/local/bin/python)", just as #!/bin/bash indicates that the following script should be executed with the bash shell. This looks like a comment because it is a comment: # starts a comment in both Python and bash, so the interpreter does not act on it, but it has meaning to UNIX when the file is executed directly (similar to a preprocessor directive).

This shebang answers your question:

Is using a shell script as a proxy even necessary to run Python scripts?

The answer is no. As long as the Python script starts with a shebang and is executable (chmod +x), the shebang makes this wrapper completely unnecessary.
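A sketch of running a Python script directly, with no shell wrapper, assuming python3 is installed and on PATH; the directory and the script body are hypothetical. It uses #!/usr/bin/env python3, which looks the interpreter up on PATH (under cron that is cron's minimal PATH); a hard-coded absolute path such as #!/usr/local/bin/python works just as well if it exists on your system.

```shell
#!/bin/bash
dir="$(mktemp -d)"   # hypothetical location for the demo
cat > "$dir/mycrawler.py" <<'EOF'
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# The shebang must be the very first line, above the coding comment.
import sys
print("running under", sys.executable)
EOF
chmod +x "$dir/mycrawler.py"

# No wrapper and no explicit interpreter: the shebang selects python3.
"$dir/mycrawler.py"
```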

Now to get to your examples:

Example 1:

PATH=/usr/bin
* 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name

Breaking this down: * 5 * * * indicates that this command should be run on every minute (first *) of hour 5, i.e. from 05:00 to 05:59 (5), on every day of the month (next *), in every month (next *), on every day of the week (last *). That is 60 runs a day, which is almost certainly not what you want from a time perspective. The rest of the line is executed as a command string, so you are changing directory to project_folder/project_name and then executing scrapy. Overall the crontab bits here are not what you want, and the relative path on the cd (resolved against the directory cron starts the job in, normally your home directory) indicates that this command is also probably not correct.
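To check how * 5 * * * expands, here is a deliberately simplified matcher for a single cron field (field_matches is a hypothetical helper). It only understands *, */N and a plain number; real cron also accepts ranges, lists and names. Counting the minutes of one day that match both the minute field * and the hour field 5 confirms the 60 runs.

```shell
#!/bin/bash
# field_matches FIELD VALUE (hypothetical, simplified):
# supports "*", "*/N" and a single number; ranges (1-5), lists (1,3) and names are NOT handled.
# "*/N" is treated as "multiples of N", which matches cron only for fields starting at 0
# (minute, hour, day of week).
field_matches() {
  local field="$1" value="$2"
  case "$field" in
    '*')   return 0 ;;
    '*/'*) (( value % ${field#*/} == 0 )) ;;
    *)     (( value == field )) ;;
  esac
}

# Within one day only the minute and hour fields matter for "* 5 * * *".
runs=0
for hour in {0..23}; do
  for minute in {0..59}; do
    if field_matches '*' "$minute" && field_matches '5' "$hour"; then
      runs=$((runs + 1))
    fi
  done
done
echo "'* 5 * * *' runs $runs times per day"   # prints: '* 5 * * *' runs 60 times per day
```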

Is embedding commands directly in crontab -e considered good practice?

It is exactly what a crontab is: time directives followed by a command.

Example #2

*/5 * * * * /usr/local/bin/python /home/Documents/SCRAPE_PYTHON/SCRAPE.py &>> /home/Desktop/log.txt

This command will run every 5 minutes, every hour of every day of every month on every day of the week. If SCRAPE.py starts with #!/usr/local/bin/python and is executable, the explicit /usr/local/bin/python here is redundant with that shebang, so it is quite unnecessary.

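The &>> will append the output (>>) of the command on both stdout and stderr (&) to the file /home/Desktop/log.txt. This logging is good, the every-5-minutes schedule might be good, but the python bits are not necessary. To answer your second question: not universally. &>> is bash syntax, and cron runs each command with /bin/sh unless the crontab sets SHELL=/bin/bash. On systems where /bin/sh is not bash (Debian and Ubuntu use dash), &>> is not understood the same way, so the portable form is >> /home/Desktop/log.txt 2>&1.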
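A quick way to see the portable alternative in action: the sketch below runs commands under plain /bin/sh (what cron uses unless SHELL is set) and appends both stdout and stderr to a hypothetical temporary log with >> file 2>&1. Order matters: 2>&1 must come after >> file, because it points stderr at wherever stdout points at that moment.

```shell
#!/bin/bash
log="$(mktemp)"   # hypothetical log file

# The redirection is parsed by /bin/sh itself; ">> file 2>&1" is POSIX, unlike "&>>".
/bin/sh -c '{ echo "to stdout"; echo "to stderr" >&2; } >> "$1" 2>&1' sh "$log"
# A second run appends instead of truncating.
/bin/sh -c 'echo "second run" >> "$1" 2>&1' sh "$log"

cat "$log"
```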

Example #3

*/2 * * * * /home/user/shell_scripts/cj-scrapy.sh

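This executes the program /home/user/shell_scripts/cj-scrapy.sh (presumably a shell script) every 2 minutes, every hour of every day of the month, every month, on every day of the week. There is only one path because the script is executed directly; its own shebang line tells the system which interpreter to use. This script presumably runs your python script.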
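If you do keep a wrapper like cj-scrapy.sh, a slightly more defensive version of the question's script might look like the sketch below. The default directory comes from the question; the SPIDER_DIR override and the run_spider function name are hypothetical, and it assumes scrapy is on the PATH the job runs with (under cron, set PATH in the crontab or use an absolute path).

```shell
#!/bin/bash
# Sketch of a cron wrapper in the spirit of cj-scrapy.sh; SPIDER_DIR (hypothetical) may override the default.
SPIDER_DIR="${SPIDER_DIR:-/home/user/public_html/scrapy/projects/myproject/spiders}"

# The body runs in a subshell ( ... ), so the cd does not leak into the caller.
run_spider() (
  # Stop rather than run scrapy in the wrong directory if cd fails.
  cd "$SPIDER_DIR" || { echo "cannot cd to $SPIDER_DIR" >&2; exit 1; }
  # Fail loudly when scrapy is not on the (minimal) PATH the job runs with.
  command -v scrapy >/dev/null || { echo "scrapy not found in PATH=$PATH" >&2; exit 127; }
  scrapy crawl mycrawler
)

# The real cj-scrapy.sh would end with a plain call:
#   run_spider
```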

Example 4

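This has nothing to do with Python: it is only the command part of a cron job, running a PHP script with the PHP interpreter at /usr/local/bin/php (the same interpreter-plus-script pattern as Example #2). The cPanel wizard presumably asks for the time fields separately.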

Matthew Story
  • I thought i understood the `shebang` part until you added the "The shebang makes this wrapper completely unnecessary." Can you explain that? – oldboy Jul 02 '18 at 02:52
  • You don't need to wrap a python script in a shell script because the python script has the shebang that says "i'm python". So you don't need a shell script (which is the presumptive scripting language for non-binary executable files) to dispatch to a python file. It's the whole point of the `shebang` ... otherwise UNIX would assume that all non-binary executables were shell. – Matthew Story Jul 02 '18 at 02:55
  • Hm... but my Python script doesn't have the shebang statement? In order to do all of this from `crontab` would the following command work? `*/2 * * * * /home/user/shell_scripts/cj-scrapy.sh` if I'm pointing to the shell script included in my question above? `*/2 * * * * /usr/bin/python /home/user/public_html/projects/spiders/mycrawler.py scrapy run crawler` if I'm running everything without the shell script? How exactly do I include that `scrapy run crawler` part in a `crontab` statement?! – oldboy Jul 02 '18 at 03:00
  • @Anthony add the shebang as the first line of `mycrawler.py` – Matthew Story Jul 02 '18 at 03:01
  • Ahhh, above `# -*- coding: utf-8 -*-`? It would seem more practical to me to keep it separate from my Python script. I'd have to add it to every script I'd want to run on a cron job. – oldboy Jul 02 '18 at 03:02
  • @Anthony yeah ... at the very top, and yes, you need to add it to every python script :). – Matthew Story Jul 02 '18 at 03:03
  • How exactly do you include the `scrapy run crawler` command in a `crontab` command statement if I don't use a shell script? – oldboy Jul 02 '18 at 03:04
  • `scrapy run crawler` is actually a shell command, so `*/2 * * * * scrapy run crawler` is a valid cron command. You want to `cd` so it dumps the results to the right directory, but you would not need a wrapper for that. – Matthew Story Jul 02 '18 at 03:05
  • Ok, so something like `*/2 * * * * cd /home/user/public_html/projects/myproject/spiders scrapy run crawler` would be entirely sufficient without the shell script? – oldboy Jul 02 '18 at 03:06
  • This is what your example #1 is doing: `*/2 * * * * cd /home/user/public_html/projects/myproject/spiders && scrapy run crawler`. This is sufficient without any wrapper. You could then add the `&>> log.txt` to the end (from example #2) and log the output to a file. – Matthew Story Jul 02 '18 at 03:08
  • love that carpet-bomb downvote. – Matthew Story Jul 02 '18 at 03:12
  • `*/2 * * * * cd /home/uders/public_html/projects/myproject/spiders/logs-spider && scrapy crawl indiegogo &>> /home/user/public_html/projects/spiders/logs-cron-job/mycrawler.txt` is not working :/ – oldboy Jul 02 '18 at 03:49

Let me answer all of your questions one by one!

1.) What is the significance of the #!/usr/local/bin/python statement in this context?

  • This is called a shebang, and it specifies the interpreter; here it points to the Python interpreter. You can leave it out, but then you have to specify the interpreter when running your script (e.g. /usr/local/bin/python script.py).

Similarly, #!/bin/bash points to the bash interpreter. The shebang must be the very first line of the script.

2) Is creating a shell file necessary to run Python/Scrapy scripts, et cetera?

  • No, you can enter commands directly in the crontab.

3) Can you actually export PATH and execute other commands (e.g. cd ... && scrapy crawl mycrawler) directly in crontab -e?

  • Yes, and setting PATH is good practice. Cron runs jobs with a minimal PATH, so without it you have to give the full path of every command you execute, like /usr/bin/find instead of just find. Note that a crontab is not a shell script, so instead of export you put a line such as PATH=/usr/local/bin:/usr/bin:/bin above your entries.
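When a job works in your terminal but not from cron, a common trick is to reproduce cron's stripped-down environment with env -i and only a few variables set. The sketch below assumes cron's default PATH is /usr/bin:/bin, which is typical but varies by cron implementation and distribution; the helper name run_like_cron is hypothetical.

```shell
#!/bin/bash
# run_like_cron (hypothetical helper): run a command line roughly the way cron would,
# with an empty environment except HOME, LOGNAME, SHELL and a minimal PATH, under /bin/sh.
run_like_cron() {
  env -i HOME="$HOME" LOGNAME="${LOGNAME:-$(id -un)}" SHELL=/bin/sh \
    PATH=/usr/bin:/bin /bin/sh -c "$1"
}

run_like_cron 'echo "PATH is $PATH"'   # prints: PATH is /usr/bin:/bin
run_like_cron 'command -v scrapy || echo "scrapy not found: use its full path or set PATH in the crontab"'
```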

4) What is the significance of the first path, /usr/local/bin/python?

  • As I said earlier, it is the path to your Python interpreter.

5) In his answer, he states that &>> /home/Desktop/log.txt is the file to which errors and other output will be appended. Is that what the &>> does? Is that universal for every single Linux distro?

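  • &>> appends both stdout and stderr to the file. It is bash syntax, though, and cron runs commands with /bin/sh unless the crontab sets SHELL=/bin/bash, so it is not portable across all distros (on Debian and Ubuntu, /bin/sh is dash). The portable form is >> /home/Desktop/log.txt 2>&1.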

6) Is there a specific location on servers where shell scripts like this are commonly stored?

  • It's up to you where you store your scripts, but it's recommended that you keep them in a safe place, such as an /opt/<user>/scripts folder.

Optional

https://crontab.guru/ - here you can learn more about the crontab time syntax; everything else is basic Linux.