4

I have the Python code below, in which I am attempting to access a folder called downloaded that contains multiple JSON files.

Each JSON file contains a key named keyword, whose value I need to extract and add to the list named keywordList.

I've tried adding the filenames to fileList (which works OK), but I cannot seem to loop through fileList and extract the associated keyword.

Any help much appreciated, thanks!

import os

os.chdir('/Users/Me/Api/downloaded')

fileList = []
keywordList = []

for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)

for file in filenames:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        keywordList.append(currentFile['keyword'])

print(keywordList)
SDROB

5 Answers

4

Your question mentioned JSON. So I have addressed that. Let me know if this helps.

import glob
import json
import os
from pprint import pprint

keywordList = []
path = '/Users/Me/Api/downloaded'
for filename in glob.glob(os.path.join(path, '*.json')):  # only process .json files in the folder
    with open(filename, encoding='utf-8', mode='r') as currentFile:
        data = json.load(currentFile)  # parse the file contents into a Python object
        keyword = data["keytolookup"]  # "keytolookup" is a placeholder for the key you need
        if keyword not in keywordList:  # skip duplicates
            keywordList.append(keyword)
pprint(keywordList)

EDIT note: The for loop in my original response was:
for filename in os.listdir(path)
The OP mentioned that the glob version, which I had given as an alternative, worked better, so the answer now uses it.

JGFMK
  • Thanks @JGFMK this works, but I had to use the addition of the code you wrote for 'or to only process JSON files:' Without this addition, I was getting a file not found error for all of the JSONS. I'm not sure why this is, if you have any idea then please let me know to help my understanding. Thanks! – SDROB Feb 22 '19 at 12:13
  • 1
    That's interesting - since I cobbled it together from another similar stack overflow question regarding processing all the entries in a directory. I anticipated your folder may contain things that weren't JSON and added the extra bit. Here was the [link](https://stackoverflow.com/a/18262324/495157) I sourced it from. There is [this](https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory) handy similar thread too – JGFMK Feb 22 '19 at 12:38
  • I will tweak answer to do that then. – JGFMK Feb 22 '19 at 12:52
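On the file-not-found error raised in the comments above: a likely explanation, assuming the script was not run from inside the folder, is that os.listdir returns bare file names while glob.glob returns paths that include the folder you passed in. The sketch below uses a temporary scratch folder and a made-up file name (a.json) purely for illustration.

```python
import glob
import os
import tempfile

# Hypothetical scratch folder standing in for '/Users/Me/Api/downloaded'
with tempfile.TemporaryDirectory() as path:
    with open(os.path.join(path, 'a.json'), 'w', encoding='utf-8') as f:
        f.write('{"keyword": "example"}')

    # os.listdir gives names only, with no folder component
    bare_names = os.listdir(path)

    # glob.glob gives paths that already include the folder
    full_paths = glob.glob(os.path.join(path, '*.json'))

    # A bare name only opens if the current working directory is `path`;
    # joining it with the folder makes it usable from anywhere.
    joined = [os.path.join(path, name) for name in bare_names]
```

So with os.listdir you would need open(os.path.join(path, filename)) rather than open(filename), unless you os.chdir into the folder first.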
1

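You are adding the filenames to the fileList array, but in the second for loop you are iterating over filenames instead of fileList.

import json
import os

fileList = []
keywordList = []

# os.walk yields (root, dirs, files) tuples, so collect the full path of each file
for root, dirs, files in os.walk('/Users/Me/Api/downloaded'):
    for name in files:
        fileList.append(os.path.join(root, name))

# iterate over fileList, not the leftover loop variable from the first loop
for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        # parse the JSON first; a file handle cannot be indexed by key
        keywordList.append(json.load(currentFile)['keyword'])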
0

open() returns a handle to the open file, not its contents. You still need to loop over the contents of the file. Iterating over the handle yields one line at a time, split at the line ending (\n). You then have to check each line for the keyword.

Replace the second for loop with:

for file in fileList:  # assumes fileList holds the paths of the files to read
    with open(file, encoding='utf-8', mode='r') as currentFile:
        for line in currentFile:
            if 'keyword' in line:
                # store the matching line rather than the literal string 'keyword'
                keywordList.append(line.strip())

Also, have a look at the Python JSON module. Recursive iteration over json/dicts is answered here.
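To make the pointer to recursive iteration concrete, here is a minimal sketch of one way to do it, assuming the keyword may sit at any depth inside the parsed JSON. The helper name find_values and the sample document are made up for illustration and are not taken from the question.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the value itself may contain further matches
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)
    # Strings, numbers, booleans and None have nothing to descend into


# Made-up sample document, not taken from the question
sample = json.loads(
    '{"keyword": "a", "items": [{"keyword": "b"}, {"other": {"keyword": "c"}}]}'
)
found = list(find_values(sample, 'keyword'))
```

If the keyword is always a top-level key, a plain data['keyword'] lookup after json.load is enough and this helper is not needed.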

zan
0

Shouldn't the line for file in filenames: be for file in fileList:?

Also I think this is the correct way to use os.walk()

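import json
import os

fileList = []
keywordList = []

# os.walk yields a (root, dirs, files) tuple for each directory it visits
for root, dirs, files in os.walk('/Users/Me/Api/downloaded', topdown=False):
    for name in files:
        fileList.append(os.path.join(root, name))

for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        # parse the JSON first; the file handle itself cannot be indexed by key
        keywordList.append(json.load(currentFile)['keyword'])

print(keywordList)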
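As a quick check of what os.walk actually hands back, the sketch below builds a small temporary tree (the names a.json, sub and b.json are made up) and shows that each item is a three-part tuple rather than a file name, which is why the tuple has to be unpacked before the files can be opened.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: one file at the top level, one in a subfolder
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('a.json', os.path.join('sub', 'b.json')):
        with open(os.path.join(top, rel), 'w', encoding='utf-8') as f:
            f.write('{}')

    # Each entry is a (root, dirs, files) tuple, one per directory visited
    entries = list(os.walk(top))

    # Joining root and name gives a path that open() can use
    paths = sorted(
        os.path.join(root, name)
        for root, dirs, files in entries
        for name in files
    )
    relative = [os.path.relpath(p, top) for p in paths]
```

Appending the whole tuple to fileList, as in the question, therefore stores tuples, not paths.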
VietHTran
0

You are using currentFile as if it were a parsed JSON object, but it is only a file handle. I have added the missing step: parsing the file's contents into a Python object with json.load.

import json
import os

fileList = []
keywordList = []

# os.walk yields (root, dirs, files) tuples, so build the full path of each file
for root, dirs, files in os.walk('/Users/Me/Api/downloaded'):
    for name in files:
        fileList.append(os.path.join(root, name))

for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        data = json.load(currentFile)  # parses the file into a Python object (a dict for a JSON object)
        keywordList.append(data['keyword'])

print(keywordList)
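If some files in the folder might be malformed or might lack the key, a slightly more defensive variant could look like the sketch below. The function name collect_keywords and the choice to skip bad files silently are my assumptions, not something the question asks for, and it only looks at .json files directly inside the folder.

```python
import json
from pathlib import Path


def collect_keywords(folder, key='keyword'):
    """Return the value stored under `key` in each .json file directly inside `folder`."""
    keywords = []
    for path in sorted(Path(folder).glob('*.json')):
        try:
            with path.open(encoding='utf-8') as handle:
                data = json.load(handle)
        except json.JSONDecodeError:
            continue  # skip files that are not valid JSON
        # Only top-level JSON objects that contain the key contribute a value
        if isinstance(data, dict) and key in data:
            keywords.append(data[key])
    return keywords
```

Usage would be along the lines of keywordList = collect_keywords('/Users/Me/Api/downloaded').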