
I have a folder on my computer where several files are saved. How can I automatically load only the biggest file (in terms of size in KB)?

Right now I could use:

# Sort the folder in Windows so the biggest file is on top, then:
import pandas as pd
df = pd.read_csv(r'C:\...\FileABC.csv')  # when I know FileABC is listed at the top

Is there a way to do that automatically in Python? Then I could skip the manual sorting in Windows.

PV8
    how do you define `big` here? number of rows, number of columns, size occupied by the file?? – tidakdiinginkan May 04 '20 at 08:12
  • in terms of MB size – PV8 May 04 '20 at 08:12
    Check this [link](https://stackoverflow.com/questions/6591931/getting-file-size-in-python) - you can obtain file size using the `os` module. `os.stat('filename').st_size` should give you the file size in bytes. `os.listdir('dirname')` should give you a list of all files within a given directory `dirname` – tidakdiinginkan May 04 '20 at 08:14
    Hint: `os.stat` gives the size of a file in its `st_size` member. – Serge Ballesta May 04 '20 at 08:16
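
Putting the two hints from the comments together, a minimal sketch (the folder path is a placeholder, and it assumes the directory contains only regular files):

import os
import pandas as pd

folder = r'C:\some\folder'  # placeholder: your directory

# os.listdir() returns bare names; join them with the folder,
# then let max() pick the entry with the largest size in bytes.
paths = [os.path.join(folder, name) for name in os.listdir(folder)]
largest = max(paths, key=os.path.getsize)

df = pd.read_csv(largest)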

3 Answers


Try this:

import os
import pandas as pd

basedir = 'C:/Users/viupadhy/Desktop/Stackoverflow'
names = os.listdir(basedir)
# Build full paths and pair each one with its size in bytes.
paths = [os.path.join(basedir, name) for name in names]
sizes = [(path, os.stat(path).st_size) for path in paths]
# max() over the (path, size) pairs, keyed on the size.
file = max(sizes, key=lambda x: x[1])
print(file)

df = pd.read_csv(file[0])  # file[0] holds the path of the biggest file
df

Output: a screenshot of the printed `(path, size)` tuple and the resulting DataFrame (image not preserved).
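
As an aside, the same lookup can be written with `pathlib` (a sketch, not part of the original answer; it assumes the same folder):

from pathlib import Path

import pandas as pd

base = Path('C:/Users/viupadhy/Desktop/Stackoverflow')
# Keep regular files only and key max() on the size reported by stat().
largest = max((p for p in base.iterdir() if p.is_file()),
              key=lambda p: p.stat().st_size)
df = pd.read_csv(largest)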

Vishal Upadhyay

A simple way to do this:

import os


def find_largest_file(path):
    largest = None
    max_size = 0
    for filename in os.listdir(path):
        full_path = os.path.join(path, filename)  # listdir() returns bare names, so join with the directory
        if os.path.isfile(full_path):
            size = os.path.getsize(full_path)
            if size > max_size:
                largest = filename
                max_size = size
    return largest


print(find_largest_file(path))
# ... whatever largest file you have in `path`.

This can be further improved by filtering for just the `.csv` extension and the like.
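
A minimal sketch of that filtering (the extension check is the only change to the function above):

import os


def find_largest_csv(path):
    largest = None
    max_size = 0
    for filename in os.listdir(path):
        full_path = os.path.join(path, filename)
        # Skip directories and anything that is not a .csv file.
        if os.path.isfile(full_path) and filename.lower().endswith('.csv'):
            size = os.path.getsize(full_path)
            if size > max_size:
                largest = filename
                max_size = size
    return largest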

norok2

You can use something like:

import os
import pandas as pd


folder_path = "C:\\programs\\"

file_list = os.listdir(folder_path)
# Start by assuming the first directory entry is the biggest.
biggest_file = os.path.join(folder_path, file_list[0])

for file in file_list:
    file_location = os.path.join(folder_path, file)
    size = os.path.getsize(file_location)

    if size > os.path.getsize(biggest_file):
        biggest_file = file_location

df = pd.read_csv(biggest_file)
  • Wouldn't it be more efficient to keep track of the file size along the way? This would spare you an extra `os.path.getsize()` at each iteration – norok2 May 04 '20 at 09:38
  • Yes, this can be improved; I just wrote a quick solution and it worked when I tried it. Thank you for your advice. – Bahadır Çetin May 04 '20 at 11:08
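
Following up on that exchange, a sketch (same placeholder folder) that remembers the running maximum size, so `os.path.getsize()` is called only once per file:

import os
import pandas as pd

folder_path = "C:\\programs\\"

biggest_file = None
biggest_size = -1  # track the size so getsize() runs once per file

for file in os.listdir(folder_path):
    file_location = os.path.join(folder_path, file)
    size = os.path.getsize(file_location)
    if size > biggest_size:
        biggest_file = file_location
        biggest_size = size

df = pd.read_csv(biggest_file)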