This code seems to work well: it takes a list of files, compresses them into a format that pandas can read, and combines them in one location.
Edit - modified the code to add only new files (based on whether the file already exists in the tar).
import os
import tarfile
from glob import glob

os.chdir(r'C:\Users\Documents\FTP')
saveloc = r'\\fnp\myDownloads\\'
compression = "w:bz2"
extension = '.tar.bz2'

filename = 'Global_Performance'
filetype = 'performance_*.csv'
tarname = saveloc + filename + extension
files = glob(filetype)
tar = tarfile.open(tarname, compression)
for file in files:
    # membership test against the archive's member names,
    # not a substring test against the path string
    if file not in tar.getnames():
        tar.add(file)
tar.close()

filename = 'Global_Status'
filetype = 'status_*.csv'
tarname = saveloc + filename + extension
files = glob(filetype)
tar = tarfile.open(tarname, compression)
for file in files:
    if file not in tar.getnames():
        tar.add(file)
tar.close()
- Is there a way for pandas to read from that tar file? Can I specify a member I know exists within the archive, or perhaps concat all of the member files into one read? (First sketch after this list.)
- Being able to add new files is nice, but I assume every file name has to be read to determine whether it already exists. Is there a way to modify this code to add only the latest files, based on a creation date or something similar? Can this be sped up to compress and read only the newest files, or only those within a time range (say 30 days, instead of scanning a directory that goes back to 2010)? (Second sketch after this list.)
- As you can see above, I read each file type in the directory (matched by filename pattern) and add it to a separate tar. Is there a way to optimize this instead of pasting the same block over and over (there are 10+ file types I do this for)? (Third sketch after this list.)
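For the first question, this is roughly what I had in mind, since pd.read_csv accepts file-like objects and tarfile.extractfile returns one (the member name below is just a placeholder for a file I know is in the archive):

import tarfile
import pandas as pd

tarname = r'\\fnp\myDownloads\Global_Performance.tar.bz2'
with tarfile.open(tarname, 'r:bz2') as tar:
    # read a single member I know exists (placeholder name)
    single = pd.read_csv(tar.extractfile('performance_20100101.csv'))
    # or concat every CSV member into one DataFrame
    frames = [pd.read_csv(tar.extractfile(m))
              for m in tar.getmembers() if m.name.endswith('.csv')]
    combined = pd.concat(frames, ignore_index=True)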
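For the second question, I was picturing filtering the glob results by timestamp before touching the archive at all, something like this (using modification time; os.path.getctime would give creation time on Windows):

import os
import time
from glob import glob

cutoff = time.time() - 30 * 24 * 60 * 60   # 30 days ago
recent = [f for f in glob('performance_*.csv')
          if os.path.getmtime(f) >= cutoff]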
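For the third question, would a mapping of archive names to glob patterns, plus one loop, be the right way to collapse the repeated blocks?

import tarfile
from glob import glob

saveloc = r'\\fnp\myDownloads\\'
compression = 'w:bz2'
extension = '.tar.bz2'

jobs = {
    'Global_Performance': 'performance_*.csv',
    'Global_Status': 'status_*.csv',
    # ... the other 10+ name/pattern pairs
}

for filename, filetype in jobs.items():
    tarname = saveloc + filename + extension
    with tarfile.open(tarname, compression) as tar:
        for file in glob(filetype):
            tar.add(file)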
Edit - this code runs very slowly. My intention is to find only the newest files that are not already in the tar, then compress and add them to the existing archive. Based on the time it is taking, I suspect it is still compressing all of the files and replacing them. Can someone help me make this a more efficient process?
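If I am reading the tarfile docs right, mode "w:bz2" recreates the archive from scratch on every run, and append mode ('a') is only supported for uncompressed tars, which would explain the slowness. Would switching to a plain .tar and appending only the names that are missing, roughly like this, be a sensible direction (pandas could still read the members as in the first sketch)?

import os
import tarfile
from glob import glob

tarname = r'\\fnp\myDownloads\Global_Performance.tar'   # note: uncompressed

# collect the names already archived, if the tar exists
existing = set()
if os.path.exists(tarname):
    with tarfile.open(tarname, 'r') as tar:
        existing = set(tar.getnames())

# mode 'a' appends without rewriting the whole archive
with tarfile.open(tarname, 'a') as tar:
    for file in glob('performance_*.csv'):
        if file not in existing:
            tar.add(file)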