
So I've been trying to mess around with JSON for the first time, which I'm pulling from the kanka.io API. I'm trying to remove any elements between 'entry' and either 'section' or 'entry_parsed' so I can determine whether an ID pertains to a character or an attribute, and append only the character names to a list.

I've shortened the list I turned the JSON into, for the sake of testing in Python Tutor's live programming mode.

# Request data from URL
response = requests.request("GET", url, headers=headers, data=payload)
# Open data
rtext=response.text
# Clean data
punct = ['{','}','[',']','\"',':',',']
rt = ""
for item in rtext:
    if item in punct:
        rt+=str(' ')
    else:
        rt+=str(item)
# Itemize string of text
rsplit = rt.split()
#rsplit = [
#'id', '260405', 'name', 'Frank', 'Burns', 'entry', 'null', 'entry_parsed', 'traits', 
#'id', '260406', 'name', 'Henry', 'Blake', 'entry', 'null', 'entry_parsed', 'null', 'image', 'null', 
#'id', '260407', 'name', 'Margret', 'Houlihan', 'entry', 'null', 'entry_parsed', 'null', 'image', 'true', 'is_private', 'true',  
#'id', '260408', 'name', 'John', 'MacInyre', 'entry', '\\n<p>Graduate', 'of', 'Darthmouth.<\\/p>\\n<p>\\u00a0<\\/p>\\n', 'entry_parsed',
#'id', '260409', 'name', 'Walter', 'O\'Reilly', 'entry', 'null', 'entry_parsed', 'null', 'image', 'image_full', 'https',
#'id', '260410', 'name', 'Benjiam', 'Franklin', 'Pierce', 'entry', 'null', 'entry_parsed', 'null', 'image', 'image_full', 'https', 
#'id', '165148', 'name', 'Eyes', 'entry', 'Blue', 'section', 'appearance', 'is_private', 'false', 'default_order', '1', 
#'id', '260411', 'name', 'Francis', 'Mulcahy', 'entry', 'null', 'entry_parsed', 'null',
#]

#########
# NAMES #
#########
# Append character names into list
this1=0
# Cycle through all the words
while this1 < len(rsplit):
  next1 = this1+1
  last1 = this1-1
# Stop at the first element after 'name'
  if rsplit[last1] == "name":
# Read and concatenate elements until the element 'entry'
    while rsplit[next1] != "entry":  
      nextword = rsplit[next1]
      rsplit[this1]+='_'+nextword
# Remove redundant elements by replacing next with last
      rsplit[next1]=rsplit[this1]
      rsplit.remove(rsplit[this1]) 

# Remove words in between entry and (entry_parsed or section)
    if rsplit[this1] == "entry":
      while rsplit[next1] != ("entry_parsed" or "section"):
        rsplit.remove(rsplit[descWord])
    print(rsplit[this1:next1+4])
    
  this1+=1

What I would want it to print from the print line is

['Frank_Burns', 'entry', 'entry_parsed', 'traits']
['Henry_Blake', 'entry', 'entry_parsed', 'null']
['Margret_Houlihan', 'entry', 'entry_parsed', 'null']
['John_MacInyre', 'entry', 'entry_parsed']
["Walter_O'Reilly", 'entry', 'entry_parsed', 'null']
['Benjiam_Franklin_Pierce', 'entry', 'entry_parsed', 'null']
['Eyes', 'entry', 'section', 'appearance']
['Francis_Mulcahy', 'entry', 'entry_parsed', 'null']

I've tried different variations where the index used after 'entry' is this1, last1, or next1, and none of them actually remove the element between 'entry' and 'entry_parsed' or 'section'. I've also tried

if rsplit[this1] == "entry":
      while not rsplit[next1] == "entry_parsed" or "section":

and it still keeps printing out 'null' or 'Blue', etc.
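
For reference, here is how that loop condition evaluates on its own (the token variable is made up just for illustration):

# ("entry_parsed" or "section") evaluates to just "entry_parsed", so the comparison never checks for "section"
token = "section"
print(token != ("entry_parsed" or "section"))    # True
print(token not in ("entry_parsed", "section"))  # False, a membership test covers both words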

  • How exactly did you obtain `rsplit`? Looks like it once was a dictionary and you managed to convert it to a list somehow and now you have difficulty extracting values from the list which would have been very easy when using the dictionary? – mkrieger1 Mar 07 '21 at 19:37
  • This is what I used to actually populate rsplit in my code. `# Request data from URL response = requests.request("GET", url, headers=headers, data=payload) # Open data rtext=response.text # Clean data punct = ['{','}','[',']','\"',':',','] rt = "" for item in rtext: if item in punct: rt+=str(' ') else: rt+=str(item) # Itemize string of text rsplit = rt.split()` – reivermello Mar 07 '21 at 19:39
  • Yeah. Instead of replacing JSON syntax with spaces and splitting the string into a list, parse the response as JSON. – mkrieger1 Mar 07 '21 at 19:40
  • Can you please edit the question and show what you want to get as a result instead of what it currently prints? – mkrieger1 Mar 07 '21 at 19:41
  • *"so I can determine if an ID pertains to a character or an attribute and append only the character names to a list"* - based on which entries in the list do you determine whether an ID pertains to a character or an attribute? – mkrieger1 Mar 07 '21 at 19:49
  • Are you aware that you can get a list of only characters to begin with? https://kanka.io/en/docs/1.0/characters#all-characters – mkrieger1 Mar 07 '21 at 19:53
  • Yeah, that's what I've been using but within it are name and id fields for both characters and their traits. But I guess I just need to go back and take a better look at dictionary methods and functions. – reivermello Mar 07 '21 at 22:14
  • See https://stackoverflow.com/questions/6386308/http-requests-and-json-parsing-in-python – mkrieger1 Mar 07 '21 at 22:58
  • _"based on which entries in the list do you determine whether an ID pertains to a character or an attribute?"_ Characters have 'entry_parsed' after a variable number of indexes after 'entry' while attributes have 'section' after an variable number of indexes after 'entry'. – reivermello Mar 08 '21 at 13:25

2 Answers


Based on the information in the comments, you want to do the following:

  1. make a request to the kanka.io API
  2. parse the response as JSON, expecting a list of dictionaries
  3. select those dictionaries which have a key 'entry_parsed'
  4. create a list of the 'name' values for all selected dictionaries

Therefore you should keep¹ only the first line of your code (making the request), scrap the rest, and use this instead:

# 1. Request data from URL
response = requests.get(url, headers=headers, data=payload)

# 2. parse as JSON
data = response.json()

# 3. + 4. list of 'name' values for all dicts having 'entry_parsed'
names = [d['name'] for d in data if 'entry_parsed' in d]
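
For the shortened sample data in the question, this would give roughly the following (a sketch of the expected result; attribute records such as 'Eyes' carry 'section' but no 'entry_parsed', so they are skipped):

print(names)
# ['Frank Burns', 'Henry Blake', 'Margret Houlihan', 'John MacInyre',
#  "Walter O'Reilly", 'Benjiam Franklin Pierce', 'Francis Mulcahy']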

¹ Instead of using requests.request('GET', ...), you can just use requests.get(...).

mkrieger1

I was able to extract just the name value (as well as values for 'entity_id' and 'tags') using the following code.

# Request data from URL
response = requests.get(url, headers=headers)

rj = response.json()

# using .items() allowed me to keep the key/value tuples together so I could use the keys to get their values

ri = rj.items()

name=[]
enid=[]
tags=[]

for i in ri:
  for j in i:
    for k in j:
      # since there is string metadata at the beginning and end of ri, keep only the dictionaries that contain the values I need
      if type(k) == dict:
        name.append(k['name'])
        enid.append(k['entity_id'])
        tags.append(k['tags'])

Since the data remained a dictionary instead of a string, I didn't need to use 'entry_parsed' or 'section' to identify characters versus attributes, because the 'id' and 'name' of attributes were values under the 'traits' key.
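
A flatter sketch of the same idea, reusing the rj parsed above and assuming the character dictionaries sit in a list under a top-level 'data' key (that key name is an assumption here; adjust it to whatever the parsed response actually contains):

# a second pass over the same parsed response, without walking every nested value
name=[]
enid=[]
tags=[]

for character in rj.get('data', []):
  name.append(character['name'])
  enid.append(character['entity_id'])
  tags.append(character['tags'])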

Many thanks to @mkrieger1 for pointing me in the right direction!