I'm new to Python and I'm trying to find all the files in a folder that are over a certain size and export the data (file path and size) to a .json file.

What I have so far:

import os       
import json
import sys
import io

testPath = str(sys.argv[1])
testSize = int(sys.argv[2])

try:
    to_unicode = unicode
except NameError:
    to_unicode = str

filesList = []
x = 1
j = "1"
data = {}

for path, subdirs, files in os.walk(testPath):
    for name in files:
        filesList.append(os.path.join(path, name))

for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['unit'] = 'B'
        data['path' + j] = str(i)
        data['size' + j] = str(fileSize)
        x = x + 1
        j = str(x)


with io.open('Files.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(data,
                      indent=4, sort_keys=True,
                      separators=(',', ': '), ensure_ascii=False)
    outfile.write(to_unicode(str_))

The problem is that the output is:

{
    "path1": "C:\\Folder\\diager.xml",
    "path2": "C:\\Folder\\diag.xml",
    "path3": "C:\\Folder\\setup.log",
    "path4": "C:\\Folder\\ESD\\log.txt",
    "size1": "1908",
    "size2": "4071",
    "size3": "5822",
    "size4": "788",
    "unit": "B"
}

But it needs to be something like this:

{
"unit": "B",
"files": [{"path": "C:\\Folder\\file1.txt", "size": "10"}, {"path": "C:\\Folder\\file2.bin", "size": "400"}]
}

I added the j variable because otherwise each file would overwrite the previous one, and I would end up with something like this:

{
    "path": "C:\\Folder\\diager.xml",
    "size": "1908",
    "unit": "B"
}

I have no idea how to proceed... Help?

2 Answers

You can do something like this:

files = []
for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        files.append({'path': str(i), 'size': fileSize})

data['unit'] = 'B'
data['files'] = files

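This way, you build a list of dictionaries, one per file with its path and size, and add it to the data dict afterwards. Wrap fileSize in str() if you need the size as a string, as in your example.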
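
Putting the pieces together, a minimal sketch of the whole script might look like the following. It assumes Python 3; the function names collect_large_files and write_report are illustrative and do not come from the question, and sizes are kept as strings to match the requested output.

```python
import json
import os


def collect_large_files(root, min_size):
    """Return a dict listing files under root that are at least min_size bytes."""
    files = []
    for path, subdirs, names in os.walk(root):
        for name in names:
            full_path = os.path.join(path, name)
            size = os.path.getsize(full_path)
            if size >= min_size:
                # One dict per file; size stored as a string, as in the question.
                files.append({'path': full_path, 'size': str(size)})
    return {'unit': 'B', 'files': files}


def write_report(data, out_path='Files.json'):
    """Write the collected data as indented JSON (hypothetical helper)."""
    with open(out_path, 'w', encoding='utf8') as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
```

For example, write_report(collect_large_files(testPath, testSize)) would write Files.json to the current directory.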

amuttsch

Initialize your data dictionary with:

data = {"unit": "B", "files": []}

You can then replace your main loop:

for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['unit'] = 'B'
        data['path' + j] = str(i)
        data['size' + j] = str(fileSize)
        x = x + 1
        j = str(x)

with

for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['files'].append({"path": str(i), "size": str(fileSize)})

Note that you no longer need your x and j variables.

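Edit: To control the order of the fields, you can see this question. In particular, according to this nice answer, if you are using Python 3.6 or later, you can import OrderedDict (from collections import OrderedDict) and replace data = {"unit": "B", "files": []} with data = OrderedDict(unit="B", files=[]). This relies on keyword arguments keeping their order, which is guaranteed from Python 3.6 on. Note that json.dumps still sorts the keys alphabetically as long as sort_keys=True is passed, so set it to False as well.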
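
As a quick illustration of how sort_keys interacts with key order, here is a small sketch. The sample entry is made up for the example and is not the asker's real data; the OrderedDict is built from a list of pairs so the order does not depend on the Python version.

```python
import json
from collections import OrderedDict

# Hypothetical sample data; "unit" is inserted before "files".
data = OrderedDict([
    ("unit", "B"),
    ("files", [{"path": "C:\\Folder\\file1.txt", "size": "10"}]),
])

# sort_keys=False keeps the insertion order of the OrderedDict.
unsorted_json = json.dumps(data, sort_keys=False)

# sort_keys=True reorders keys alphabetically, so "files" comes first.
sorted_json = json.dumps(data, sort_keys=True)

print(unsorted_json)
print(sorted_json)
```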

gchelfi
  • In order to control the order of the fields, you can see [this question](https://stackoverflow.com/q/10844064/4865672) – gchelfi Jun 20 '17 at 17:15
  • Works like a charm! Also, I just set sort_keys to False instead of True and now it's not printing alphabetically. Thanks! – Claudiu Dragan Jun 20 '17 at 17:15