
So I've been trying to mess around with JSON for the first time, which I'm pulling from the kanka.io API. I'm trying to remove any elements between 'entry' and either 'section' or 'entry_parsed' so I can determine whether an ID pertains to a character or an attribute, and append only the character names to a list.

I've shortened the list I turned the JSON into, for the sake of testing in Python Tutor's live programming mode.

# Request data from URL
response = requests.request("GET", url, headers=headers, data=payload)
# Open data
rtext=response.text
# Clean data
punct = ['{','}','[',']','\"',':',',']
rt = ""
for item in rtext:
    if item in punct:
        rt+=str(' ')
    else:
        rt+=str(item)
# Itemize string of text
rsplit = rt.split()
#rsplit = [
#'id', '260405', 'name', 'Frank', 'Burns', 'entry', 'null', 'entry_parsed', 'traits', 
#'id', '260406', 'name', 'Henry', 'Blake', 'entry', 'null', 'entry_parsed', 'null', 'image', 'null', 
#'id', '260407', 'name', 'Margret', 'Houlihan', 'entry', 'null', 'entry_parsed', 'null', 'image', 'true', 'is_private', 'true',  
#'id', '260408', 'name', 'John', 'MacInyre', 'entry', '\\n<p>Graduate', 'of', 'Darthmouth.<\\/p>\\n<p>\\u00a0<\\/p>\\n', 'entry_parsed',
#'id', '260409', 'name', 'Walter', 'O\'Reilly', 'entry', 'null', 'entry_parsed', 'null', 'image', 'image_full', 'https',
#'id', '260410', 'name', 'Benjiam', 'Franklin', 'Pierce', 'entry', 'null', 'entry_parsed', 'null', 'image', 'image_full', 'https', 
#'id', '165148', 'name', 'Eyes', 'entry', 'Blue', 'section', 'appearance', 'is_private', 'false', 'default_order', '1', 
#'id', '260411', 'name', 'Francis', 'Mulcahy', 'entry', 'null', 'entry_parsed', 'null',
#]

#########
# NAMES #
#########
# Append character names into list
this1=0
# Cycle through all the words
while this1 < len(rsplit):
  next1 = this1+1
  last1 = this1-1
# Stop at the first element after 'name'
  if rsplit[last1] == "name":
# Read and concatenate elements until the element 'entry'
    while rsplit[next1] != "entry":  
      nextword = rsplit[next1]
      rsplit[this1]+='_'+nextword
# Remove redundant elements by replacing next with last
      rsplit[next1]=rsplit[this1]
      rsplit.remove(rsplit[this1]) 

# Remove words in between entry and (entry_parsed or section)
    if rsplit[this1] == "entry":
      while rsplit[next1] != ("entry_parsed" or "section"):
        rsplit.remove(rsplit[descWord])
    print(rsplit[this1:next1+4])
    
  this1+=1

What I would want it to print from the print line is

['Frank_Burns', 'entry', 'entry_parsed', 'traits']
['Henry_Blake', 'entry', 'entry_parsed', 'null']
['Margret_Houlihan', 'entry', 'entry_parsed', 'null']
['John_MacInyre', 'entry', 'entry_parsed']
["Walter_O'Reilly", 'entry', 'entry_parsed', 'null']
['Benjiam_Franklin_Pierce', 'entry', 'entry_parsed', 'null']
['Eyes', 'entry', 'section', 'appearance']
['Francis_Mulcahy', 'entry', 'entry_parsed', 'null']

I've tried different variations where the index used after 'entry' is this1, last1, or next1, and none of them actually remove the element between 'entry' and 'entry_parsed' or 'section'. I've also tried

if rsplit[this1] == "entry":
      while not rsplit[next1] == "entry_parsed" or "section":

and it still keeps printing out 'null' or 'Blue', etc.
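
For reference, here is how that loop condition evaluates on its own (the token variable is made up just for illustration):

# ("entry_parsed" or "section") evaluates to just "entry_parsed", so the comparison never checks for "section"
token = "section"
print(token != ("entry_parsed" or "section"))    # True
print(token not in ("entry_parsed", "section"))  # False, a membership test covers both words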

  • How exactly did you obtain `rsplit`? Looks like it once was a dictionary and you managed to convert it to a list somehow and now you have difficulty extracting values from the list which would have been very easy when using the dictionary? – mkrieger1 Mar 07 '21 at 19:37
  • This is what I used to actually populate rsplit in my code. `# Request data from URL response = requests.request("GET", url, headers=headers, data=payload) # Open data rtext=response.text # Clean data punct = ['{','}','[',']','\"',':',','] rt = "" for item in rtext: if item in punct: rt+=str(' ') else: rt+=str(item) # Itemize string of text rsplit = rt.split()` – reivermello Mar 07 '21 at 19:39
  • Yeah. Instead of replacing JSON syntax with spaces and splitting the string into a list, parse the response as JSON. – mkrieger1 Mar 07 '21 at 19:40
  • Can you please edit the question and show what you want to get as a result instead of what it currently prints? – mkrieger1 Mar 07 '21 at 19:41
  • *"so I can determine if an ID pertains to a character or an attribute and append only the character names to a list"* - based on which entries in the list do you determine whether an ID pertains to a character or an attribute? – mkrieger1 Mar 07 '21 at 19:49
  • Are you aware that you can get a list of only characters to begin with? https://kanka.io/en/docs/1.0/characters#all-characters – mkrieger1 Mar 07 '21 at 19:53
  • Yeah, that's what I've been using but within it are name and id fields for both characters and their traits. But I guess I just need to go back and take a better look at dictionary methods and functions. – reivermello Mar 07 '21 at 22:14
  • See https://stackoverflow.com/questions/6386308/http-requests-and-json-parsing-in-python – mkrieger1 Mar 07 '21 at 22:58
  • _"based on which entries in the list do you determine whether an ID pertains to a character or an attribute?"_ Characters have 'entry_parsed' after a variable number of indexes after 'entry' while attributes have 'section' after an variable number of indexes after 'entry'. – reivermello Mar 08 '21 at 13:25

2 Answers


Based on the information in the comments, you want to do the following:

  1. make a request to the kanka.io API
  2. parse the response as JSON, expecting a list of dictionaries
  3. select those dictionaries which have a key 'entry_parsed'
  4. create a list of the 'name' values for all selected dictionaries

Therefore you should keep¹ only the first line of your code (making the request), scrap the rest, and use this instead:

# 1. Request data from URL
response = requests.get(url, headers=headers, data=payload)

# 2. parse as JSON
data = response.json()

# 3. + 4. list of 'name' values for all dicts having 'entry_parsed'
names = [d['name'] for d in data if 'entry_parsed' in d]
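
For the shortened sample data in the question, this would give roughly the following (a sketch of the expected result; attribute records such as 'Eyes' carry 'section' but no 'entry_parsed', so they are skipped):

print(names)
# ['Frank Burns', 'Henry Blake', 'Margret Houlihan', 'John MacInyre',
#  "Walter O'Reilly", 'Benjiam Franklin Pierce', 'Francis Mulcahy']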

¹ Instead of using requests.request('GET', ...), you can just use requests.get(...).

mkrieger1

I was able to extract just the name value (as well as values for 'entity_id' and 'tags') using the following code.

# Request data from URL
response = requests.get(url, headers=headers)

rj = response.json()

# using .items() allowed me to keep the key/value tuples together so I could use the keys to get their values

ri = rj.items()

name=[]
enid=[]
tags=[]

for i in ri:
  for j in i:
    for k in j:
      # since there is string metadata at the beginning and end of ri, keep only the dictionaries that contain the values I need
      if type(k) == dict:
        name.append(k['name'])
        enid.append(k['entity_id'])
        tags.append(k['tags'])

Since the data remained a dictionary instead of a string, I didn't need to use 'entry_parsed' or 'section' to identify characters versus attributes, because the 'id' and 'name' of attributes were values under the 'traits' key.
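
A flatter sketch of the same idea, reusing the rj parsed above and assuming the character dictionaries sit in a list under a top-level 'data' key (that key name is an assumption here; adjust it to whatever the parsed response actually contains):

# a second pass over the same parsed response, without walking every nested value
name=[]
enid=[]
tags=[]

for character in rj.get('data', []):
  name.append(character['name'])
  enid.append(character['entity_id'])
  tags.append(character['tags'])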

Many thanks to @mkrieger1 for pointing me in the right direction!