
I'm trying to convert a curl command, which sends a PDF file to a GROBID server for parsing, into a Python requests call.

Basically, if I run the GROBID server as follows,

./gradlew run 

I can use the following curl command to get the parsed XML output for an academic paper, example.pdf:

curl -v --form input=@example.pdf localhost:8070/api/processHeaderDocument

However, I don't know how to convert this command into Python. Here is my attempt with requests:

import requests

GROBID_URL = 'http://localhost:8070'
url = '%s/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
xml = requests.post(url, files=[pdf]).text
Wolfgang Fahl
titipata

2 Answers


I got the answer. Basically, I missed api in the GROBID_URL, and the input files should be a dictionary instead of a list.

import requests

GROBID_URL = 'http://localhost:8070'
url = '%s/api/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
with open(pdf, 'rb') as f:
    xml = requests.post(url, files={'input': f}).text
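The response is TEI XML, so the title (or other header fields) can be pulled out with the standard library. A minimal sketch; the sample TEI string below is illustrative, not actual GROBID output, though real responses use the same TEI namespace:

```python
import xml.etree.ElementTree as ET

# TEI documents live in this namespace; lookups must qualify it.
TEI_NS = {'tei': 'http://www.tei-c.org/ns/1.0'}

def extract_title(tei_xml):
    """Return the paper title from a TEI header, or None if absent."""
    root = ET.fromstring(tei_xml)
    node = root.find('.//tei:titleStmt/tei:title', TEI_NS)
    return node.text if node is not None else None

# Illustrative stand-in for the XML returned by the server.
sample = '''<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Paper</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>'''

print(extract_title(sample))  # An Example Paper
```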
titipata

Here is an example bash script from http://ceur-ws.bitplan.com/index.php/Grobid. Please note that there is also a ready-to-use Python client available: https://github.com/kermitt2/grobid_client_python

#!/bin/bash
# WF 2020-08-04
# call grobid service with paper from ceur-ws
v=2644
p=44
vol=Vol-$v
pdf=paper$p.pdf
if [ ! -f "$pdf" ]
then
  wget "http://ceur-ws.org/$vol/$pdf"
else
  echo "paper $p from volume $v already downloaded"
fi
curl -v --form input=@"./$pdf" "http://grobid.bitplan.com/api/processFulltextDocument" > "$p.tei"
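The same download-then-post flow can also be sketched in pure Python with requests. The volume/paper URL scheme and the grobid.bitplan.com endpoint are taken from the script above; treat this as an untested sketch rather than a finished client:

```python
import os
import requests

def ceur_pdf_url(volume, paper):
    """Build the ceur-ws.org download URL used by the bash script above."""
    return 'http://ceur-ws.org/Vol-%d/paper%d.pdf' % (volume, paper)

def fetch_and_parse(volume, paper,
                    grobid='http://grobid.bitplan.com/api/processFulltextDocument'):
    """Download a CEUR-WS paper if needed, then send it to GROBID."""
    pdf = 'paper%d.pdf' % paper
    if not os.path.isfile(pdf):
        # Fetch the paper only if it is not already present locally.
        r = requests.get(ceur_pdf_url(volume, paper))
        r.raise_for_status()
        with open(pdf, 'wb') as f:
            f.write(r.content)
    else:
        print('paper %d from volume %d already downloaded' % (paper, volume))
    with open(pdf, 'rb') as f:
        tei = requests.post(grobid, files={'input': f})
    tei.raise_for_status()
    return tei.text  # the TEI XML, as written to $p.tei in the bash version
```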
Wolfgang Fahl