I have a folder on my computer that contains ~8,500 .csv files, each named after a stock ticker. Each .csv file has a 'timestamp' column and a 'users_holding' column. The 'timestamp' column is set up as a datetime index, and it contains hourly entries for each day, e.g. 2019-12-01 01:50, 2020-01-01 02:55... 2020-01-01 01:45, etc. Each timestamp has a corresponding integer representing the number of users holding at that time.

I want to create a for loop that iterates through all of the .csv files and tallies up the total users holding across all files, using the latest entry of each day, starting on February 1st, 2020 (2020-02-01) and running through the last day present in the files. The folder updates daily, so I can't hard-code an end date.
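To make it concrete, for a single ticker's DataFrame I think the per-day value I'm after looks something like the line below (I'm assuming resample('D').last() really does return the last reading of each calendar day; daily_last is just a placeholder name):

# df has a datetime index ('timestamp') and a 'users_holding' column;
# sort first so "last" within each day is genuinely the latest timestamp of that day
daily_last = df.sort_index().loc['2020-02-01':, 'users_holding'].resample('D').last()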
This is the for loop I have set up to establish each ticker as a dataframe:
import glob
import pandas as pd

path = 'C:\\Users\\N****\\Desktop\\r******\\t**\\p*********\\'
all_files = glob.glob(path + "*.csv")

for filename in all_files:
    # read each ticker's file with 'timestamp' parsed as a datetime index
    df = pd.read_csv(filename, header=0, parse_dates=['timestamp'], index_col='timestamp')
If anyone could show me how to write the for loop that finds the latest entry for each date in each file and sums those values across all files for each day, that would be amazing.
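For reference, continuing from the glob setup above, this is roughly the shape I'm imagining for the tally, though I'm not sure whether adding Series together with fill_value=0 is the right way to combine the per-ticker results (daily_totals is a placeholder name):

daily_totals = None  # running per-day total across all tickers

for filename in all_files:
    df = pd.read_csv(filename, header=0, parse_dates=['timestamp'], index_col='timestamp')
    # last reading of each day from 2020-02-01 onward for this ticker
    daily_last = df.sort_index().loc['2020-02-01':, 'users_holding'].resample('D').last()
    # add this ticker's daily values to the running total, aligning on date;
    # fill_value=0 covers days where a ticker has no data
    daily_totals = daily_last if daily_totals is None else daily_totals.add(daily_last, fill_value=0)

print(daily_totals)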
Thank you!