
With the following HTTP request:


import requests
import csv

url = 'http://www.culturaitalia.it/oaiProviderCI/OAIHandler?verb=ListRecords&metadataPrefix=pico&set=collezione_pansa_villa_frigerj'

e = requests.get(url)

data = e.text

print(data)

I get the following XML as output:

<record><header><identifier>oai:culturaitalia.it:oai:culturaitalia.it:museiditalia-work_46880</identifier><datestamp>2018-08-29T17:56:41Z</datestamp><setSpec>museiditalia_opere</setSpec><setSpec>opere_museid</setSpec><setSpec>Beni_culturali</setSpec><setSpec>collezione_pansa_villa_frigerj</setSpec></header><metadata>
<pico:record xmlns:pico="http://purl.org/pico/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:iccd="http://purl.org/pico/iccd/2.00/" xmlns:oad="http://purl.org/pico/iccd/2.00/oa-d-n/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:smi="http://purl.org/pico/iccd/2.00/s-mi/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:bdm="http://purl.org/pico/iccd/2.00/bdm/" xmlns:mets="http://www.loc.gov/METS/" xmlns:f="http://purl.org/pico/iccd/2.00/f/" xmlns:vra="http://www.vraweb.org/vracore4.htm" xmlns:iccd3="http://purl.org/pico/iccd/3.00/" xmlns:mix="http://www.loc.gov/mix/v20" xmlns:nu="http://purl.org/pico/iccd/3.00/nu/" xmlns:premis="info:lc/xmlns/premis-v2" xsi:schemaLocation="http://purl.org/pico/1.0/               http://www.culturaitalia.it/pico/schemas/1.0/pico.xsd                     http://purl.org/pico/iccd/2.00/         http://www.culturaitalia.it/pico/schemas/iccd/2.00/iccd.xsd                     http://purl.org/pico/iccd/2.00/oa-d-n/  http://www.culturaitalia.it/pico/schemas/iccd/2.00/oa-d-n.xsd                     http://purl.org/pico/iccd/2.00/s-mi/    http://www.culturaitalia.it/pico/schemas/iccd/2.00/s-mi.xsd                     http://purl.org/pico/iccd/2.00/bdm/     http://www.culturaitalia.it/pico/schemas/iccd/2.00/bdm.xsd                     http://purl.org/pico/iccd/2.00/f/       http://www.culturaitalia.it/pico/schemas/iccd/2.00/f.xsd                     http://purl.org/pico/iccd/3.00/         http://www.culturaitalia.it/pico/schemas/iccd/3.00/iccd.xsd                     http://purl.org/pico/iccd/3.00/nu/      http://www.culturaitalia.it/pico/schemas/iccd/3.00/nu.xsd">
  <dc:identifier>work_46880</dc:identifier>
  <dc:title>BROCCHETTA MINIATURISTICA</dc:title>
  <dc:subject xsi:type="pico:Thesaurus">http://culturaitalia.it/pico/thesaurus/4.1#reperti_archeologici</dc:subject>
  <dc:description xml:lang="it">BROCCHETTA MONOANSATA. ANSA A DOPPIO BASTONCELLO ARCUATO CHE SI SALDA SULCOLLO AL DI SOTTO DEL LABBRO ESPANSO. CORPO BACCELLATO CON INCISIONE AD XSOTTO L'ANSA, BASSO PIEDE TRONCOCONICO. VERNICE MALCOTTA CON AVVAMPATURESUL PIEDE.</dc:description>
  <dcterms:spatial>Museo Archeologico Nazionale d'Abruzzo, Villa Frigerj, CHIETI (CH) - ITALIA - sala collezione Pansa - vetrina 1, inv. 3130</dcterms:spatial>
  <dcterms:spatial xsi:type="pico:ISTAT">name=CHIETI; year=2001; code=069022</dcterms:spatial>
  <dcterms:created>SEC. III A.C.</dcterms:created>
  <dcterms:created xsi:type="dcterms:Period">start=299; end=250</dcterms:created>
  <dc:type xsi:type="mdi:Type">Opere</dc:type>
  <dc:type xml:lang="it">BROCCHETTA MINIATURISTICA</dc:type>
  <dc:type xsi:type="dcterms:DCMIType">PhysicalObject</dc:type>
  <dcterms:isPartOf xsi:type="dcterms:URI">oai:culturaitalia.it:museiditalia-coll_445</dcterms:isPartOf>
  <dc:rights xml:lang="it"/>
  <dcterms:rightsHolder xml:lang="it">PROPRIETA' STATO, Ministero per i Beni e le Attività Culturali</dcterms:rightsHolder>
  <dcterms:isReferencedBy xml:lang="it">Scheda ICCD RA: 13-00008576</dcterms:isReferencedBy>
  <pico:materialAndTechnique xml:lang="it">ARGILLA</pico:materialAndTechnique>
  <dcterms:extent>altezza: cm 9.4</dcterms:extent>
  <dcterms:extent>diametro: cm 6.9</dcterms:extent>
  <pico:preview xsi:type="dcterms:URI">http://194.242.241.163/fedora/objects/work:46880/datastreams/MM135934/content</pico:preview>
  <dcterms:isReferencedBy xsi:type="pico:Anchor">title=visualizza il file Mets; URL=fedora/objects/work:46880/datastreams/export/content</dcterms:isReferencedBy>
</pico:record>
</metadata></record>

How can I write the output of my HTTP request to a CSV file? Maybe using Pandas?

Regards

Pelide

2 Answers


I advise you to use the JSON format, which is easier to deal with in Python and lets you work with the data however you want. That said, look at this post; it may be helpful for you.

Sido4odus

You can parse some of the data with regular expressions.

import re
import pandas as pd

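# `sample` is assumed to hold the XML text from the question (the `data` string).
# I like to "tokenize" text, if possible: here, one token per non-empty line.
tokens = [i.strip() for i in sample.split('\n') if len(i) > 0]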

# Create a regular expression pattern for tag values and text values.
# Note: the ?P<name> syntax creates named groups, which is how we can identify each matching section.
full_pat = r"<(?P<tag>[a-z0-9:\"\.:\= ]+)>(?P<text>[\w\d ]+)<?/?"

# Compile the pattern once so it can be reused for every line.
# The re.I flag makes matching case-insensitive.
p = re.compile(full_pat, flags=re.I)

results_dict = dict()
for i, v in enumerate(tokens):
    res = p.search(v)
    try:
        # Append a dictionary with our tag and text values to our results dictionary.
        results_dict[i] = dict(tag=res.group('tag'), text=res.group('text'))
    except AttributeError:
        pass

Output of results_dict:

{0: {'tag': 'identifier', 'text': 'oai'},
 2: {'tag': 'dc:identifier', 'text': 'work_46880'},
 3: {'tag': 'dc:title', 'text': 'BROCCHETTA MINIATURISTICA'},
 4: {'tag': 'dc:subject xsi:type="pico:Thesaurus"', 'text': 'http'},
 5: {'tag': 'dc:description xml:lang="it"', 'text': 'BROCCHETTA MONOANSATA'},
 6: {'tag': 'dcterms:spatial', 'text': 'Museo Archeologico Nazionale d'},
 7: {'tag': 'dcterms:spatial xsi:type="pico:ISTAT"', 'text': 'name'},
 8: {'tag': 'dcterms:created', 'text': 'SEC'},
 9: {'tag': 'dcterms:created xsi:type="dcterms:Period"', 'text': 'start'},
 10: {'tag': 'dc:type xsi:type="mdi:Type"', 'text': 'Opere'},
 11: {'tag': 'dc:type xml:lang="it"', 'text': 'BROCCHETTA MINIATURISTICA'},
 12: {'tag': 'dc:type xsi:type="dcterms:DCMIType"', 'text': 'PhysicalObject'},
 13: {'tag': 'dcterms:isPartOf xsi:type="dcterms:URI"', 'text': 'oai'},
 15: {'tag': 'dcterms:rightsHolder xml:lang="it"', 'text': 'PROPRIETA'},
 16: {'tag': 'dcterms:isReferencedBy xml:lang="it"', 'text': 'Scheda ICCD RA'},
 17: {'tag': 'pico:materialAndTechnique xml:lang="it"', 'text': 'ARGILLA'},
 18: {'tag': 'dcterms:extent', 'text': 'altezza'},
 19: {'tag': 'dcterms:extent', 'text': 'diametro'},
 20: {'tag': 'pico:preview xsi:type="dcterms:URI"', 'text': 'http'},
 21: {'tag': 'dcterms:isReferencedBy xsi:type="pico:Anchor"', 'text': 'title'}}
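
Note that the text values above stop at the first character that is not a letter, digit, underscore, or space, so values such as 'oai' and 'http' are cut short. As an aside, here is a minimal sketch of an alternative that keeps the full text, assuming the standard library's xml.etree.ElementTree and csv modules are acceptable. The `sample` string is a shortened copy of the record from the question, and the names `rows` and `csv_text` are only illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the record from the question (assumption: only a few
# fields and the namespaces they need are kept, to keep the sketch small).
sample = """<record><metadata>
<pico:record xmlns:pico="http://purl.org/pico/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:identifier>work_46880</dc:identifier>
  <dc:title>BROCCHETTA MINIATURISTICA</dc:title>
  <dcterms:created>SEC. III A.C.</dcterms:created>
  <dcterms:extent>altezza: cm 9.4</dcterms:extent>
</pico:record>
</metadata></record>"""

root = ET.fromstring(sample)

rows = []
for element in root.iter():
    text = (element.text or "").strip()
    if text:
        # element.tag looks like "{namespace-uri}local-name"; keep the local name only.
        rows.append({"tag": element.tag.split("}")[-1], "text": text})

# Write the rows as CSV into an in-memory buffer; a real file opened with
# open(path, "w", newline="") could be used in its place.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["tag", "text"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The rest of this answer continues with the regex-based results_dict.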

Convert the dictionary to a Pandas DataFrame and use the .to_csv() method to write a CSV file (I'll let you figure that part out). Note: each key of results_dict should become a row, so we pass orient='index' instead of the default value, 'columns'.

df = pd.DataFrame.from_dict(results_dict, orient='index')
print(df)

Output:

                                              tag                            text
0                                      identifier                             oai
2                                   dc:identifier                      work_46880
3                                        dc:title       BROCCHETTA MINIATURISTICA
4            dc:subject xsi:type="pico:Thesaurus"                            http
5                    dc:description xml:lang="it"           BROCCHETTA MONOANSATA
6                                 dcterms:spatial  Museo Archeologico Nazionale d
7           dcterms:spatial xsi:type="pico:ISTAT"                            name
8                                 dcterms:created                             SEC
9       dcterms:created xsi:type="dcterms:Period"                           start
10                    dc:type xsi:type="mdi:Type"                           Opere
11                          dc:type xml:lang="it"       BROCCHETTA MINIATURISTICA
12            dc:type xsi:type="dcterms:DCMIType"                  PhysicalObject
13        dcterms:isPartOf xsi:type="dcterms:URI"                             oai
15             dcterms:rightsHolder xml:lang="it"                       PROPRIETA
16           dcterms:isReferencedBy xml:lang="it"                  Scheda ICCD RA
17        pico:materialAndTechnique xml:lang="it"                         ARGILLA
18                                 dcterms:extent                         altezza
19                                 dcterms:extent                        diametro
20            pico:preview xsi:type="dcterms:URI"                            http
21  dcterms:isReferencedBy xsi:type="pico:Anchor"                           title
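
For the .to_csv() step, a minimal sketch could look like the following. It assumes pandas is installed, rebuilds a two-row results_dict copied from the output above so that it runs on its own, and writes to an in-memory buffer; the name `csv_text` is only illustrative.

```python
import io

import pandas as pd

# Two entries copied from results_dict above, to keep the sketch self-contained.
results_dict = {
    2: {"tag": "dc:identifier", "text": "work_46880"},
    3: {"tag": "dc:title", "text": "BROCCHETTA MINIATURISTICA"},
}

df = pd.DataFrame.from_dict(results_dict, orient="index")

# to_csv accepts a file path or a file-like object; a StringIO buffer is used
# here so nothing is written to disk. index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead, pass a path as the first argument, for example df.to_csv("records.csv", index=False), where "records.csv" is a hypothetical file name.
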
Mark Moretto