
The Input

I'm receiving input from an external JSON source that contains paths, like this:

datalake-dev/facial_recognition/
datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic0.jpg
datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic1.jpg
datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic10.png
datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic11.jpg
datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic12.png
datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic13.jpg
datalake-dev/facial_recognition/landing/input-images/
datalake-dev/facial_recognition/landing/input-images/this_is_a_dir.png

The Help

From this, I need to pass it on in an API / JSON / dictionary format for further processing. So far I've been through one, two, three and four threads, but none of them got me to a solution.

The Required Output

From the paths, I need to get a dictionary / JSON structure in the following form:

{

    "curation":{
        "google-search-images":[
            {
                "name":"pic0"
            },
            {
                "name":"pic1"
            }
        ]
    },
    "derived":{
        "recognition-matches":[
            {
                "name":"img2"
            }
        ],
        "errors":[
            {
                "name":"foo"
            }
        ]
    }

}

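In the paths above, the names curation, google-search-images and this_is_a_dir.png are all directories (this_is_a_dir.png is left out of the output because its name looks like an image file). I need something that recursively nests them into a dictionary according to the depth of each path, that is, the components obtained by splitting on '/'.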
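One possible way to build such a structure, as a rough sketch rather than a definitive answer. The function name `build_tree`, the `skip` parameter (how many leading components such as datalake-dev/facial_recognition to drop) and the `IMAGE_EXTS` set are all hypothetical. It assumes that a key ending in '/' is a directory, that any other key ends in a file name, that directories whose names carry an image-like extension are dropped, and that a directory holds either files or sub-directories but not both.

```python
import json
import posixpath

# Assumption: these extensions count as "image-like".
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tiff"}


def build_tree(keys, skip=2):
    """Nest '/'-separated keys into dicts; files become {"name": stem} entries.

    `skip` is a hypothetical parameter: the number of leading path
    components (here the bucket-like prefix) left out of the result.
    """
    tree = {}
    for key in keys:
        # Split on '/', ignore empty pieces, drop the leading prefix.
        parts = [p for p in key.split("/") if p][skip:]
        if not parts:
            continue
        # A trailing '/' marks a directory-only key; otherwise the last
        # component is treated as a file (this cannot be verified offline).
        filename = None if key.endswith("/") else parts.pop()
        # Assumption: directories that look like images are left out.
        dirs = [d for d in parts
                if posixpath.splitext(d)[1].lower() not in IMAGE_EXTS]
        if not dirs:
            continue
        node = tree
        for d in dirs[:-1]:
            node = node.setdefault(d, {})      # intermediate directories
        files = node.setdefault(dirs[-1], [])  # innermost directory holds a list
        if filename is not None:
            files.append({"name": posixpath.splitext(filename)[0]})
    return tree


keys = [
    "datalake-dev/facial_recognition/",
    "datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic0.jpg",
    "datalake-dev/facial_recognition/curation/google-search-images/this_is_a_dir.png/pic1.jpg",
]
print(json.dumps(build_tree(keys), indent=4))
```

For these three keys the result is a "curation" dictionary containing "google-search-images" with the entries pic0 and pic1. Note that a key such as landing/input-images/this_is_a_dir.png, which has no trailing '/', would be recorded as a file named this_is_a_dir under this heuristic, since nothing in the string alone says it is a directory.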

My Trial

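all_paths = []  # every directory path seen so far, as a list of components
api = {}        # filled in below

# 'result' is the parsed JSON response; each entry in 'Contents' has a 'Key' path.
for contents in result['Contents']: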

    directory_or_file_list = contents['Key'].split('/')  # To identify if the path is pointing as file / directory
    path = contents['Key']

    splitted_path = path.split('/')
    # ['datalake-dev', 'facial_recognition', 'landing', 'input-images', 'this_is_a_dir.png', 'pic0.jpg']

    if '' in splitted_path:
        splitted_path.pop()
        all_paths.append(splitted_path)
        # The object 'api' holds the dictionary expected.
        api[splitted_path[0]] = splitted_path[1]
        # api[splitted_path[0]] = {splitted_path[1] : {splitted_path[2] : [append_all_elements_under_this]} }

    if directory_or_file_list[-1].split('.')[-1] in ['jpg', 'jpeg', 'png', 'tiff']:
        print(path)

    else:
        print(path)

Note: Perhaps there is a way to hard-code this, but then I wouldn't post it here if that were the case. Also, there is no chance of using os.walk(); been there, done that. This isn't an OS file system.

Any help beyond my code is welcome!

T3J45
  • It is hard to tell exactly what you want. I don't see the logic that takes you from the list of paths to the dictionary you describe below; is this supposed to be the output for that input? If not, please include an input with its expected output and describe in some detail how exactly it should be formed. You indicate that you need "something that recursively puts them into dictionary based on length of these paths"; what does that mean exactly? "Based on length" how? – jdehesa Nov 16 '18 at 17:59
  • I would suggest using `os.path.split()` as opposed to `string.split('/')`. Upon your first split you can check if the tail is a file with `os.path.splitext()`. Neither of these actually tries to access a file on disk; they're just string operations designed specifically for paths. For the dictionary, there's nothing special you need to do, though. Just get all the components of your path and do `dictionary[path0][path1][path2] = path3` – ahota Nov 16 '18 at 18:06
  • Why isn't `this_is_a_dir.png` included in your JSON output? – slider Nov 16 '18 at 18:09
  • @jdehesa I'm sorry to be so complicated with this. Let me clear the mess here. The input is coming from a json request, which has a 'key' attribute with every element from Contents (which you can see in for loop). Those keys are nothing but path to file as shown in *The Input* and from that input I need *The Required output* . By length of path, I meant splitted path by '/'. Since I don't have any access to server storage, I have to struggle without os package. – T3J45 Nov 16 '18 at 18:09
  • @ahota I agree that these do not access the files on disk, but effectually that's why the request is being made, so that the requestor can see it. Besides, os.path.split() can work irrespective of actual os path? I haven't tried it, you might want to reconsider? – T3J45 Nov 16 '18 at 18:13
  • @slider good question, well because that's not needed. Creates ambiguity down the line, hence. – T3J45 Nov 16 '18 at 18:14
  • What is the criteria for omitting such pieces from the path? – slider Nov 16 '18 at 18:14
  • @slider the name contains an image-like extension such as .png, .jpg and so forth – T3J45 Nov 16 '18 at 18:15
  • @T3J45 `os.path.split()` will work on any string, whether it's actually a path or not. But it's helpful to use since it's already optimized for paths and files. Can you also post what your code is outputting? It's hard to tell what `api` actually looks like from the code you posted. – ahota Nov 16 '18 at 18:19
  • @ahota alright, I'll give it a try. I'll post the o/p once done. – T3J45 Nov 16 '18 at 18:20
  • @ahota Hey, the os.path.split(obj) has resulted like this `('datalake-dev/facial_recognition/landing/input-images', 'this_is_a_dir.png')` which is not what I wanted though. However I can use this to separate actual files, but as you can see this is separating the directory. – T3J45 Nov 17 '18 at 04:28
  • @T3J45, yeah you will have to do it repeatedly until you have parsed the full path. The reason I suggested it is because you can combine it with other methods from `os` which are string-only and not worry about edge cases. Ultimately your dictionary will just need to create keys for directories. You can maybe use `defaultdict` to simplify that. – ahota Nov 17 '18 at 04:33
  • @ahota Sounds good. Can you help with that 'this_is_a_dir.png' thing? Even though it is a directory, it treats as if it's a file. – T3J45 Nov 17 '18 at 04:37
  • @ahota also, is it possible to use os.walk() anyway? that seems to work good on machine. If I can set somewhere those paths, may be it can traverse? – T3J45 Nov 17 '18 at 04:42
  • My suggestion would be that if it's not the last element of the path, then you know for sure it's a directory (well, pretty sure - it could be a malformed path). If it _is_ the last element of the path, there's no way to know if it's actually a directory or not since you're not on the filesystem. – ahota Nov 17 '18 at 04:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/183807/discussion-between-ahota-and-t3j45). – ahota Nov 17 '18 at 04:43
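
The suggestion in the comments (split repeatedly with pure string operations, and treat only the last component as ambiguous) might look roughly like the sketch below. It uses `posixpath`, the POSIX flavour of `os.path`, so that '/' is the separator on any platform; the helper names `split_all` and `classify` are hypothetical, and the rule that a trailing '/' marks a directory is an assumption about the input.

```python
import posixpath


def split_all(key):
    """Break a '/'-separated key into components by repeated posixpath.split."""
    parts = []
    rest = key.strip("/")  # no leading/trailing '/', so the loop always terminates
    while rest:
        rest, tail = posixpath.split(rest)
        parts.append(tail)
    parts.reverse()
    return parts


def classify(key):
    """Return (directories, leaf); leaf is None when the key ends in '/'.

    Every component except the last is certainly a directory. The last one
    is only assumed to be a file: the string alone cannot confirm it.
    """
    parts = split_all(key)
    if not parts or key.endswith("/"):
        return parts, None
    return parts[:-1], parts[-1]


dirs, leaf = classify(
    "datalake-dev/facial_recognition/landing/input-images/this_is_a_dir.png"
)
print(dirs)  # the four leading components
print(leaf)  # this_is_a_dir.png, reported as a leaf although it is a directory
```

This makes the limitation explicit: none of these calls touch a file system, so a final component such as this_is_a_dir.png can only be recognised as a directory when the key carries a trailing '/' or when it also appears as a non-final component of another key.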

0 Answers