I have been stuck on a problem since this morning. I have tried many things, but nothing works.

Here is my problem: I have a list in Python, and this list contains dictionaries, like this:

"links": [{"url": "http://catherineingram.com/biography.html", "type": {"key": "/type/link"}, "title": "Biography"}, {"url": "http://www.youtube.com/watch?v=4lJK9cfXP3c", "type": {"key": "/type/link"}, "title": "Interview on Consciousness TV"}, {"url": "http://www.huffingtonpost.com/catherine-ingram/", "type": {"key": "/type/link"}, "title": "Blog on Huffington Post"}]

My goal is to extract the three "url" values and store them in my database, and likewise to extract the three "title" values and store them in my database.

Here is what I tried:

for record in csv.DictReader(open(INPUT_FILE, 'r'), fieldnames=COLUMNS, delimiter='\t'):
    j = json.loads(record['json'])

    if 'links' in j:
        for n in j['links']:
            lien.append(n)
        print(lien)
        dico = {"url": lien[(0)]}
        print(dico)

    else:
        links= ''

Here, the input file contains the links shown above.

So I would like to know how I can get only "url" and "title" from my links.

The output of the code shown above is:

[{'url': 'http://catherineingram.com/biography.html', 'type': {'key': '/type/link'}, 'title': 'Biography'}, {'url': 'http://www.youtube.com/watch?v=4lJK9cfXP3c', 'type': {'key': '/type/link'}, 'title': 'Interview on Consciousness TV'}, {'url': 'http://www.huffingtonpost.com/catherine-ingram/', 'type': {'key': '/type/link'}, 'title': 'Blog on Huffington Post'}]
{'url': {'url': 'http://catherineingram.com/biography.html', 'type': {'key': '/type/link'}, 'title': 'Biography'}}

And I would like to have only these elements:

'url': 'http://catherineingram.com/biography.html'
'title': 'Biography'
'url': 'http://www.youtube.com/watch?v=4lJK9cfXP3c'
'title': 'Interview on Consciousness TV'
'url': 'http://www.huffingtonpost.com/catherine-ingram/'
'title': 'Blog on Huffington Post'
Ch3steR
raph

4 Answers

import csv
import json

result = []
with open(INPUT_FILE, 'r') as f:
    for record in csv.DictReader(f, fieldnames=COLUMNS, delimiter='\t'):
        j = json.loads(record['json'])

        if 'links' in j:
            for link in j['links']:
                # Append the url and title of each link, one after the other
                result.append(link['url'])
                result.append(link['title'])
        else:
            links = ''

You just have to access the keys of the current list element.

The output you want is a bit unusual: do you want a dictionary or a plain list?

If you want to rebuild a dictionary for each link, use this code:

result = []
with open(INPUT_FILE, 'r') as f:
    for record in csv.DictReader(f, fieldnames=COLUMNS, delimiter='\t'):
        j = json.loads(record['json'])

        if 'links' in j:
            for link in j['links']:
                # Keep only the url and title of each link
                result.append({'url': link['url'], 'title': link['title']})
        else:
            links = ''
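
As a self-contained illustration of the same idea, here is a minimal sketch that runs without the CSV file. The dictionary `j` stands in for the result of `json.loads(record['json'])`, and the helper name `extract_links` is hypothetical, not something from the question. Using `.get()` is an assumption that a missing key should yield `None` rather than raise a `KeyError`.

```python
# Stand-in for json.loads(record['json']); data copied from the question.
j = {
    "links": [
        {"url": "http://catherineingram.com/biography.html",
         "type": {"key": "/type/link"}, "title": "Biography"},
        {"url": "http://www.youtube.com/watch?v=4lJK9cfXP3c",
         "type": {"key": "/type/link"}, "title": "Interview on Consciousness TV"},
    ]
}


def extract_links(record):
    """Return a list of {'url', 'title'} dicts, or [] if there is no 'links' key."""
    return [
        {"url": link.get("url"), "title": link.get("title")}
        for link in record.get("links", [])
    ]


links = extract_links(j)
print(links)
```

Returning an empty list when "links" is absent keeps the result type consistent, which may be easier to handle later than the empty string used above.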
Zenocode

I don't know if I really understand your problem, but if you just want the url and title, you can drop the "type" key and keep the rest:

for item in j['links']:
    item.pop('type', None)
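
Note that `pop` modifies each dictionary in place. If the original data should stay untouched, a dict comprehension can build filtered copies instead. This is only a sketch: the `links` list below is sample data from the question, and `cleaned` is a name chosen for illustration.

```python
# Sample data copied from the question.
links = [
    {"url": "http://catherineingram.com/biography.html",
     "type": {"key": "/type/link"}, "title": "Biography"},
    {"url": "http://www.huffingtonpost.com/catherine-ingram/",
     "type": {"key": "/type/link"}, "title": "Blog on Huffington Post"},
]

# Build new dictionaries without the "type" key; the originals are not changed.
cleaned = [
    {key: value for key, value in item.items() if key != "type"}
    for item in links
]

print(cleaned)
```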
Haytek

The elements in your desired dictionary have duplicate keys. Python dictionaries don't support this, and although there are ways around it, they won't give you exactly the result you asked for.

It would instead give the following:

{
    "url": [
        "http://catherineingram.com/biography.html",
        "http://www.youtube.com/watch?v=4lJK9cfXP3c",
        "http://www.huffingtonpost.com/catherine-ingram/",
    ],
    "title": ["Biography", "Interview on Consciousness TV", "Blog on Huffington Post"],
}
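
One way to build a dictionary of that shape is to collect the values of each key into a list. The following is a minimal sketch using the sample data from the question; the variable name `grouped` is chosen for illustration.

```python
# Sample data copied from the question.
links = [
    {"url": "http://catherineingram.com/biography.html",
     "type": {"key": "/type/link"}, "title": "Biography"},
    {"url": "http://www.youtube.com/watch?v=4lJK9cfXP3c",
     "type": {"key": "/type/link"}, "title": "Interview on Consciousness TV"},
    {"url": "http://www.huffingtonpost.com/catherine-ingram/",
     "type": {"key": "/type/link"}, "title": "Blog on Huffington Post"},
]

# One list per key, filled in the same order as the links appear.
grouped = {"url": [], "title": []}
for link in links:
    grouped["url"].append(link["url"])
    grouped["title"].append(link["title"])

print(grouped)
```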

However, if you wanted to print the results, then it would be straightforward:

# Original results
results = [{'url': 'http://catherineingram.com/biography.html',
  'type': {'key': '/type/link'},
  'title': 'Biography'},
 {'url': 'http://www.youtube.com/watch?v=4lJK9cfXP3c',
  'type': {'key': '/type/link'},
  'title': 'Interview on Consciousness TV'},
 {'url': 'http://www.huffingtonpost.com/catherine-ingram/',
  'type': {'key': '/type/link'},
  'title': 'Blog on Huffington Post'}]

# Each item in the list is a dictionary
for link in results:  # avoid naming a variable "dict": it shadows the built-in type
    for key in link:
        # We only want urls and titles
        if key != 'type':
            print(f"{key}: {link[key]}")

# OUTPUT
# url: http://catherineingram.com/biography.html
# title: Biography
# url: http://www.youtube.com/watch?v=4lJK9cfXP3c
# title: Interview on Consciousness TV
# url: http://www.huffingtonpost.com/catherine-ingram/
# title: Blog on Huffington Post
Haren S

@harens Thanks for your answer. I tried your method and got this:

[{'url': 'http://catherineingram.com/biography.html', 'type': {'key': '/type/link'}, 'title': 'Biography'}, {'url': 'http://www.youtube.com/watch?v=4lJK9cfXP3c', 'type': {'key': '/type/link'}, 'title': 'Interview on Consciousness TV'}, {'url': 'http://www.huffingtonpost.com/catherine-ingram/', 'type': {'key': '/type/link'}, 'title': 'Blog on Huffington Post'}]

So I still get the "type" key, and when I try to put it in my database, nothing happens:

@AntoineFrau

I also tried your method and got this:

[{'url': 'http://catherineingram.com/biography.html', 'title': 'Biography'}, {'url': 'http://www.youtube.com/watch?v=4lJK9cfXP3c', 'title': 'Interview on Consciousness TV'}, {'url': 'http://www.huffingtonpost.com/catherine-ingram/', 'title': 'Blog on Huffington Post'}]

So it works perfectly, but my problem is that when I try to put it in my database like this:

    result = []

    if 'links' in j:
        for link in j['links']:
            result.append({'url': link['url'], 'title': link['title']})
        print(result)
        links = result
    else:
        links = ''

    # print(n)
    # links_url.append(n['url'])
    # links_title.append(n['title'])

    c.execute('INSERT INTO AUTHORS VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)',
          [record['key'],
           j.get('name'),
           j.get('eastern_order'),
           j.get('personal_name'),
           j.get('enumeration'),
           j.get('title'),
           bio,
           alternate_names,
           uris,
           j.get('location'),
           j.get('birth_date'),
           j.get('death_date'),
           j.get('date'),
           j.get('wikipedia'),
           links
          ])
db.commit()

It gives me an error about links:

 line 63, in <module>
    c.execute('INSERT INTO AUTHORS VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)',
sqlite3.InterfaceError: Error binding parameter 14 - probably unsupported type.

I am trying to store the url and title in the "links" field of my database.

Thanks again for your answers!
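
A likely explanation, offered as a sketch rather than a confirmed diagnosis: parameter 14, counting from zero, is the fifteenth value, which is `links`. Here `links` is a Python list, and `sqlite3` cannot bind a list to a `?` placeholder. One common workaround is to serialize the list to a JSON string with `json.dumps` before inserting it, and to parse it back with `json.loads` when reading. The table `authors_demo`, its two columns, and the key value below are hypothetical and only stand in for the real 15-column AUTHORS table.

```python
import json
import sqlite3

# Sample value of "links", as produced by the loop in the question.
links = [
    {"url": "http://catherineingram.com/biography.html", "title": "Biography"},
    {"url": "http://www.youtube.com/watch?v=4lJK9cfXP3c",
     "title": "Interview on Consciousness TV"},
]

db = sqlite3.connect(":memory:")  # in-memory database, for illustration only
c = db.cursor()
# Hypothetical two-column table standing in for AUTHORS.
c.execute("CREATE TABLE authors_demo (key TEXT, links TEXT)")

# A list cannot be bound directly, so store it as a JSON string instead.
c.execute(
    "INSERT INTO authors_demo VALUES (?, ?)",
    ("/authors/demo", json.dumps(links)),
)
db.commit()

# Reading it back: the column holds text, which json.loads turns into a list again.
stored = c.execute("SELECT links FROM authors_demo").fetchone()[0]
restored = json.loads(stored)
print(restored)
```

If the urls and titles need to be queried individually, a separate table with one row per link may be a better fit than a single JSON column; that is a schema decision the question does not settle.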

raph