I'm creating a directory where users can drop CSV files. I want to write a Python script that checks that folder every day at a given time (let's say noon) and picks up the latest file placed there, provided it is no more than a day old. I'm not sure whether that's possible.

This is the chunk of code I would like to run if the app finds a new file in the desired directory:

def better_Match(results, best_percent = "Will need to get the match %"):
    result = {}
    result_list = [{item.name:item.text for item in result} for result in results]
    if result_list:
        score_list = [float(item['score']) for item in result_list]
        match_index = max(enumerate(score_list),key=lambda x: x[1])[0]
        logger.debug('MRCs:{}, Chosen MRC:{}'.format(score_list,score_list[match_index]))
        logger.debug(result_list[match_index])
        above_threshold = float(result_list[match_index]['score']) >= float(best_percent)
        if above_threshold:
            result = result_list[match_index]
    return result

def clean_plate_code(platecode):
    return str(platecode).lstrip('0').zfill(5)[:5]

def re_ch(file_path, orig_data, return_columns = ['ex_opbin']):
    list_of_chunk_files = list(file_path.glob('*.csv'))
    cb_ch = [pd.read_csv(f, sep=None, dtype=object, engine='python') for f in tqdm(list_of_chunk_files, desc='Combining ch', unit='chunk')]
    cb_ch = pd.concat(cb_ch)
    shared_columns = [column_name.replace('req_','') for column_name in cb_ch.columns if column_name.startswith('req_')]
    cb_ch.columns = cb_ch.columns.str.replace("req_", "")
    return_columns = return_columns + shared_columns
    cb_ch = cb_ch[return_columns]
    for column in shared_columns:
        cb_ch[column] = cb_ch[column].astype(str)
        orig_data[column] = orig_data[column].astype(str)
    final= orig_data.merge(cb_ch, how='left', on=shared_columns)
    return final

2 Answers

To run a script at a certain time:

You can use cron on Linux. On Windows, you can use Task Scheduler.

Here is an example of getting the latest file in a directory:

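import os

files = os.listdir(output_folder)
files = [os.path.join(output_folder, file) for file in files]
files = [file for file in files if os.path.isfile(file)]
# Raises ValueError if the folder is empty; on Unix, getctime is the last metadata change, not creation time
latest_file = max(files, key=os.path.getctime)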
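
The question also asks that the file be no more than a day old. A minimal sketch of that check, assuming modification time is an acceptable stand-in for "when the file was placed there"; the function name `latest_recent_csv` and its parameters are hypothetical, not part of the original answer:

```python
import time
from pathlib import Path

ONE_DAY_SECONDS = 24 * 60 * 60


def latest_recent_csv(folder, max_age_seconds=ONE_DAY_SECONDS, now=None):
    """Return the newest *.csv in `folder` if it was modified within
    `max_age_seconds`, otherwise None. (Hypothetical helper.)"""
    now = time.time() if now is None else now
    # Only consider regular files with a .csv extension
    candidates = [p for p in Path(folder).glob('*.csv') if p.is_file()]
    if not candidates:
        return None
    # Newest by modification time (assumption: mtime reflects the drop time)
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    if now - newest.stat().st_mtime > max_age_seconds:
        return None
    return newest
```

Calling `latest_recent_csv(output_folder)` from the scheduled job would then return either a path to process or `None` when nothing new has arrived.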
DD_N0p
  • this does not answer the part of OP's question where they want it to run at a particular time every day – M Z Jul 24 '20 at 17:51

This will do the job!

import os
import time
import threading
import pandas as pd

DIR_PATH = 'DIR_PATH_HERE'

def create_csv_file():
    # Create files.csv, which lists all the current files.
    # This only runs the first time, when files.csv does not exist yet.
    if not os.path.exists('files.csv'):
        list_of_files = os.listdir(DIR_PATH)
        list_of_files.append('files.csv')
        pd.DataFrame({'files': list_of_files}).to_csv('files.csv', index=False)


def check_for_new_files():
    create_csv_file()
    files = pd.read_csv('files.csv')
    list_of_files = os.listdir(DIR_PATH)
    # Compare names rather than counts, so a file that replaces another is still detected
    new_files = set(list_of_files) - set(files.files)
    if new_files:
        print('New file added')
        # Do what you want with the new files here
        # Save your Excel output with the name sample.xlsx
        # Add it to the list of files and deduplicate, so sample.xlsx is not listed twice on the next run

        list_of_files.append('sample.xlsx')
        list_of_files = list(set(list_of_files))

        # Save the current list of files again
        pd.DataFrame({'files': list_of_files}).to_csv('files.csv', index=False)
        print('Finished for the day!')



ticker = threading.Event()
# Run the program every 86400 seconds = 24h
while not ticker.wait(86400):
    check_for_new_files()

It uses threading.Event to wait 86,400 seconds (24 hours) between checks, so the first check happens 24 hours after the script starts. It saves the names of all current files to files.csv in the directory the script runs from, then each day checks for files that are not yet listed in the CSV and adds them to files.csv.
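
A fixed 24-hour wait runs relative to when the script was started, not at noon as the question asks. One way to adapt the loop is to compute the delay until the next occurrence of the target time. This is a minimal sketch, not part of the original answer: `seconds_until` is a hypothetical helper, and it uses naive local datetimes, so it assumes daylight-saving shifts can be ignored.

```python
import datetime as dt


def seconds_until(hour, minute=0, now=None):
    """Seconds from `now` until the next local hour:minute. (Hypothetical helper.)"""
    now = dt.datetime.now() if now is None else now
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's target time has already passed (or is right now), use tomorrow's
    if target <= now:
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


# Possible use with the loop above (assumes `ticker` and
# `check_for_new_files` are defined as in the answer):
#
#     while not ticker.wait(seconds_until(12)):
#         check_for_new_files()
```

Recomputing the delay on every iteration keeps the check anchored to noon even if `check_for_new_files()` itself takes a while to finish.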

JaniniRami
  • Thanks for this. I have some of the code, but I don't think I asked the second part of the question correctly. With the new CSV file that is in the selected directory, I would be opening it and running a different script to dump it into a database (I have the code written for this). Does that make more sense? – UndefinedKid01 Jul 24 '20 at 18:28
  • Please provide the code you wrote so that others and I can understand more clearly what function you want to add to it. – JaniniRami Jul 24 '20 at 18:35
  • Yeah I can do that. I'm running it as an executable that I'm working on changing to a web API. – UndefinedKid01 Jul 24 '20 at 19:03