
I am running a script that extracts data from a JSON file, which is updated every 1 to 2 minutes. The script runs the extraction procedure, sleeps for 1 minute, and then runs the extraction again, in an infinite loop.

It worked fine for more than a month, then one day it stopped suddenly without any error message. I restarted it and it worked fine again. However, after some days it stopped again for no apparent reason.

I have no idea what the problem is, so all I can provide is my script. Below is the Python file I wrote.

from requests.auth import HTTPBasicAuth
import sys
import requests
import re
import time
import datetime
import json

from CSVFileGen1 import csv_files_generator1
from CSVFileGen2 import csv_files_generator2
from CSVFileGen3 import csv_files_generator3
from CSVFileGen4 import csv_files_generator4

def passpara():
    current_time = datetime.datetime.now()
    current_time_string = current_time.strftime('%Y-%m-%d %H:%M:%S')
    sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
    FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
    FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
    FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
    FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
    try:
        r1 = requests.get('https://www...=JSON')
        json_text_no_lines1 = r1.text
        csv_files_generator1(current_time, json_text_no_lines1, FileLocation1)
    except requests.exceptions.RequestException as e:
        print 'request1 error'
        print e
    try:
        r2 = requests.get('https://www...=JSON')
        json_text_no_lines2 = r2.text
        csv_files_generator2(current_time, json_text_no_lines2, FileLocation2)
    except requests.exceptions.RequestException as e:
        print 'request2 error'
        print e
    try:
        r3 = requests.get('https://www...=JSON')
        json_text_no_lines3 = r3.text
        csv_files_generator3(current_time, json_text_no_lines3, FileLocation3)
    except requests.exceptions.RequestException as e:
        print 'request3 error'
        print e
    try:
        r4 = requests.get('https://www...JSON')
        json_text_no_lines4 = r4.text
        csv_files_generator4(current_time, json_text_no_lines4, FileLocation4)
    except requests.exceptions.RequestException as e:
        print 'request4 error'
        print e
    print current_time_string + ' Data Operated. '

while True:
    passpara()
    time.sleep(60)

Here is the CSVFileGen1 that the first script calls. This script parses the json file and saves the information to a csv file.

import json
import datetime
import time
import os.path
import sys
from datetime import datetime
from dateutil import tz


def meter_per_second_2_mile_per_hour(input_meter_per_second):
    return input_meter_per_second * 2.23694

def csv_files_generator1(input_datetime, input_string, target_directory):
    try:
        real_json = json.loads(input_string)
        #get updatetime string
        updatetime_epoch = real_json['updateTime']
        update_time = datetime.fromtimestamp(updatetime_epoch/1000)
        updatetime_string = update_time.strftime('%Y%m%d%H%M%S')
        file_name = update_time.strftime('%Y%m%d%H%M')
        dir_name = update_time.strftime('%Y%m%d')
        if not os.path.exists(target_directory + '\\' + dir_name):
            os.makedirs(target_directory + '\\' + dir_name)
        if not os.path.isfile(target_directory + '\\' + dir_name + '\\' + file_name):
            ......#some detailed information I delete it for simplicity
    except ValueError, e:
        print e
tdube
Yuandong

2 Answers

1

At first glance, I suspect the cause is sys.path growing without bound (as litelite mentioned), because the script appends to it on every call to passpara(). You can safely move this block of code outside the function so that it runs only once and sys.path is appended to a single time:

sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'

So, your code would look like:

sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
while True:
    passpara()
    time.sleep(60)
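
If the append has to stay inside code that runs repeatedly, a membership check is another way to keep sys.path from growing. The following is a minimal sketch written for Python 3; the helper name add_to_sys_path and the constant TOOL_DIR are made up for illustration and are not part of the original script.

```python
import sys


def add_to_sys_path(directory):
    """Append directory to sys.path only if it is not already listed."""
    if directory not in sys.path:
        sys.path.append(directory)


# Path taken from the question; TOOL_DIR is a hypothetical name.
TOOL_DIR = 'C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool'

# Calling the helper repeatedly still leaves a single entry in sys.path.
for _ in range(3):
    add_to_sys_path(TOOL_DIR)
```

With this guard, it no longer matters how often the function is called: the directory appears in sys.path at most once.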

When I tried a program that infinitely appends to sys.path, my RAM was being used very heavily. You may want to look into the memory usage of your script as the Python script may be hanging since it doesn't have enough memory. After a few minutes of running this script, my Chrome window crashed due to Python using around 10 GB RAM (used all available RAM).

Please note that I did not have a time.sleep(). The results obtained after running it without any pauses for a few minutes might reflect those found when running it every 60 seconds for a month.

My program is as follows:

import sys
while True:
    sys.path.append("C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool")

Interesting note: simply incrementing a variable in a while loop does not rapidly use large amounts of RAM, mainly because the variable is overwritten on each iteration and takes up no extra memory. In your case, sys.path is a list, and appending to it indefinitely causes extra RAM to be used. Example program that only increments a counter:

count = 0
while True:
    count += 1

On the other hand, appending to a list heavily uses RAM, which is to be expected:

count = []
while True:
    count.append(1)
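
To get a rough sense of scale for the original script, which appends once per 60-second cycle, the list growth over a month can be simulated directly. This is only a sketch for Python 3 under stated assumptions: a 30-day month, one append per minute, and CPython behaviour in which the same string object is appended each time, so sys.getsizeof reports the list's own storage (the references) rather than copies of the string. The variable names are illustrative.

```python
import sys

APPENDS_PER_DAY = 24 * 60   # one append per 60-second cycle
DAYS = 30                   # assumed length of "a month"
total_appends = APPENDS_PER_DAY * DAYS

# Path taken from the question.
path_entry = 'C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool'

# Build a list holding the same string object total_appends times,
# which mimics what repeated sys.path.append calls would produce.
simulated_path = [path_entry] * total_appends

# Size of the list object itself (its array of references), in bytes.
list_bytes = sys.getsizeof(simulated_path)
print(total_appends, 'entries,', list_bytes, 'bytes for the list itself')
```

Comparing the printed size with the memory the process actually uses can help judge whether sys.path growth alone accounts for the behaviour.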
Advait
  • Right, I ran my code for a while and noticed that including sys.path.append in the while loop requires extra memory, and the usage seems to increase slowly. I never thought this would be a problem because the computer running this code has 128 GB of RAM, and most of the time memory usage is around 10 to 20 GB. But since the code has been running for a month, this could be the cause. Thanks! – Yuandong Aug 22 '17 at 20:41
  • I just realized that I have another script, written in the same style (sys.path.append included in the while loop), that parses the same JSON file for a different purpose and runs on the same computer. While the previous one stopped two months ago, this one has kept running until now. So I am thinking that RAM is not the problem. – Yuandong Aug 22 '17 at 20:57
  • Hmmm, that's weird. Without any logs/error messages it's pretty hard to hypothesize the cause of the problem. Could you redirect the script's stdout (standard output) and stderr (standard error) to a file (see https://stackoverflow.com/questions/4675728/redirect-stdout-to-a-file-in-python for details)? A Python script generally shouldn't exit silently. However, I agree with cddt's suggestion of using a scheduler for this task (this may solve the issue you are currently seeing). – Advait Aug 23 '17 at 14:24
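
One way to act on the suggestion to capture output in a file is to log every unexpected exception with its traceback. The original script only catches requests.exceptions.RequestException and ValueError, so any other exception would end the loop, and its traceback is lost if nobody is watching the console. The sketch below is written for Python 3 and uses only the standard logging module; the log file location, the logger name, and the helper run_cycle_safely are hypothetical.

```python
import logging
import os
import tempfile

# Hypothetical log location; any writable path would do.
LOG_FILE = os.path.join(tempfile.gettempdir(), 'extractor_errors.log')

logger = logging.getLogger('extractor')
logger.setLevel(logging.INFO)
handler = logging.FileHandler(LOG_FILE)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)


def run_cycle_safely(cycle):
    """Run one extraction cycle; log the full traceback if it raises."""
    try:
        cycle()
        return True
    except Exception:
        # logger.exception records the message plus the current traceback.
        logger.exception('Extraction cycle failed')
        return False


# Intended use in the original loop (passpara is the asker's function):
# while True:
#     run_cycle_safely(passpara)
#     time.sleep(60)
```

This does not catch a hang or a process that is killed from outside, but it does leave a record when the loop would otherwise die on an uncaught exception.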
0

I believe that your question has already been answered regarding the reason behind why your script may fail, so I won't duplicate that answer.

However, I will provide an alternative solution. Instead of having your script run for days on end, remove the infinite loop and set it up to run every minute with Task Scheduler (Windows) or cron (Linux). This has a couple of immediate benefits:

  1. memory is cleared after each run;
  2. recovery from an unexpected error can happen in 60 seconds, rather than when you see the script has stopped running.
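
A run-once version of the script could be structured roughly as follows. This is a sketch for Python 3, not the asker's actual code: extract_once is a placeholder standing in for the body of passpara(), and main returns an exit code so the scheduler can tell success from failure.

```python
import logging
import sys


def extract_once():
    """Placeholder for a single extraction pass (the body of passpara())."""
    pass


def main(run=extract_once):
    """Run one pass and return a process exit code (0 = ok, 1 = failed)."""
    try:
        run()
    except Exception:
        # Log the traceback to stderr; the scheduler can redirect it to a file.
        logging.exception('Extraction run failed')
        return 1
    return 0


# Entry point when run by the scheduler (left as a comment so the sketch
# has no side effects when imported):
# if __name__ == '__main__':
#     sys.exit(main())
```

Because there is no while loop and no time.sleep(60), each run starts in a fresh process, and a failed run is simply followed by the next scheduled one. On Linux, a cron schedule of `* * * * *` runs a command every minute.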
cddt