I want to load a CSV file from my Downloads folder into a pandas DataFrame. Because a file with the same name is already in the folder, each new download gets a number appended to its name. For example, if 'transactions (44).csv' is in the folder, the next download is named 'transactions (45).csv'.

I've looked into using the glob or os library to open the most recent file in my Downloads folder, but I was unable to produce a solution. I think I need some way to connect to the Downloads path, find all CSV files with the string 'transactions' in the name, and grab the one with the highest number in the filename.

list(csv.reader(open(path + '/transactions (45).csv')))

I'm hoping for something like this: path + '/%transactions%' + 'max()' + '.csv'. I know the final answer will be completely different, but I hope this makes sense.

2 Answers

Assuming the files are named "transactions (number).csv", try the following:

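import os
import numpy as np
import pandas as pd

# 'Downloads/' is resolved relative to the current working directory
files = os.listdir('Downloads/')
# keep only the numbered copies, e.g. 'transactions (45).csv'
tranfiles = [f for f in files if f.startswith('transactions (') and f.endswith(').csv')]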

Now, your target file is as below:

target_file=tranfiles[np.argmax([int(t.split('(')[1].split(')')[0]) for t in tranfiles])]

Read that desired file as below:

df=pd.read_csv('Downloads/'+target_file)
Parth

One option is to use regular expressions to extract the numerically largest file ID and then construct a new file name:

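import re
import glob
# the pattern is relative to the current directory; prefix it with the folder path if needed
last_id = max(int(re.findall(r" \(([0-9]+)\)\.csv$", x)[0])
              for x in glob.glob("transactions ([0-9]*).csv"))
name = f'transactions ({last_id}).csv'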

Alternatively, find the newest file directly by its modification time.
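A minimal sketch of the modification-time approach, using only the standard library. The helper name newest_transactions_file and the ~/Downloads location are assumptions for illustration, not part of the original answer; adjust both to your setup.

```python
from pathlib import Path


def newest_transactions_file(folder):
    """Return the most recently modified 'transactions*.csv' in folder, or None."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    candidates = list(folder.glob("transactions*.csv"))
    if not candidates:
        return None
    # st_mtime is the last-modification timestamp; the largest value is the newest file
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Assumed location of the downloads folder
newest = newest_transactions_file(Path.home() / "Downloads")
print(newest)
```

If a file is found, the returned path can be passed straight to pd.read_csv(). Note that this picks the file that was written last, which is not necessarily the one with the highest number if older files have been modified since.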

Note that you should not use a CSV reader to read CSV files in Pandas. Use pd.read_csv() instead.

DYZ
  • I've attempted to run this code, and in the debugger I can see that the loop goes through all the CSV files in my path folder, but when it exits, I do not have a 'last_id'. I think the issue is that the loop isn't looking at the number (maybe the regular expression isn't isolating only the numbers?). I instead used the link you provided and I will go with this solution for now. – Paul Brown Sep 23 '19 at 05:00
  • Your comment makes no sense. If the code runs and does not crash, the second line must create the variable `last_id`. – DYZ Sep 23 '19 at 05:03