
I have written a simple application using Flask. Its main objective is to expose CLD2 (a language detector) through POST and GET methods. It works well for English, but for other languages such as Urdu or Arabic it gives invalid results.

The script is as follows:

# http://127.0.0.1:5000/cld2?text="Your input text string"
# Output (same format as we used in CC):
#"585&URDU-99-1155"


from flask import Flask,abort,jsonify,request
from flask_restful import Resource, Api, reqparse
import cld2
from bs4 import BeautifulSoup
import sys
import urllib2, urllib
import re

reload(sys)
sys.setdefaultencoding('utf8')


app = Flask(__name__)
api = Api(app)


class HelloWorld(Resource):

    def cld2_states(self, txt):
        txt = txt.encode("utf8")

        isReliable, textBytesFound, details = cld2.detect(txt)
        outstr = str(textBytesFound)
        for item in details:  # Iterate 3 languages
            if item[0] != "Unknown":
                outstr += '&' + item[0] + '-' + str(item[2]) + '-' + str(int(item[3]))
        return outstr

    def get(self):
        parser = reqparse.RequestParser()
        parser.add_argument('text', type=str)
        parser.add_argument('url', type=str)

        _dict =  dict(parser.parse_args())
        if _dict["text"] is not None:
            value = _dict["text"]
            print type(value)
            return self.cld2_states(value)

        return None

    def post(self):
        data = request.get_json(force=True)
        # print data
        predict_request = data['content']  # indexing a one-element list with [1] raises IndexError
        out = self.cld2_states(predict_request)

        return jsonify(score=out)

api.add_resource(HelloWorld, '/cld2')
if __name__ == '__main__':
    app.run(debug=True, port=6161, host='0.0.0.0')

If I send a query via the GET method, it gives correct results, but for the same query the POST method returns just a number. If the text is in English, POST also gives the correct result. My client is a simple Java application that iterates over files and detects their language one by one.
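To make the "just a number" symptom concrete, here is a minimal sketch of the string-building loop from cld2_states, pulled out so it runs without cld2 installed. The helper name format_details and the sample tuples are hypothetical; the tuple layout (name, code, percent, score) is an assumption based on the indices the script uses (item[0], item[2], item[3]).

```python
def format_details(bytes_found, details):
    """Mirror the string-building loop in cld2_states.

    `details` is assumed to be a sequence of
    (language_name, language_code, percent, score) tuples.
    """
    out = str(bytes_found)
    for name, _code, percent, score in details:
        # Entries reported as "Unknown" are skipped, exactly as in the script.
        if name != "Unknown":
            out += "&" + name + "-" + str(percent) + "-" + str(int(score))
    return out


# Hypothetical sample data, not real cld2 output.
detected = format_details(585, [("URDU", "ur", 99, 1155.0), ("Unknown", "un", 0, 0.0)])
undetected = format_details(585, [("Unknown", "un", 0, 0.0)] * 3)

print(detected)    # 585&URDU-99-1155
print(undetected)  # 585
```

So with this loop a bare number is what you get whenever every entry in details is "Unknown", which suggests the text reaching the detector on the POST path is not what the client meant to send.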

Hafiz Muhammad Shafiq

1 Answer


The problem might be with this line:

outstr = str(textBytesFound)

Instead of using str to convert from bytes to str, use str.decode(), like this:

outstr = textBytesFound.decode("utf-8")

(Obviously, if your text is not encoded as UTF-8, you need to tell Python the correct encoding to use.)
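As a rough illustration of why the codec matters, the sketch below encodes a short Urdu string and decodes it twice, once with the right codec and once with the wrong one. It only shows the general bytes-versus-text behaviour and assumes the value being decoded really is a byte string; in the question's script textBytesFound looks like a count (the sample output starts with 585), so whether this applies there is uncertain. The sample string is made up for the example.

```python
# -*- coding: utf-8 -*-
# Four Arabic-script letters, written as escapes so the file encoding does not matter.
text = u"\u0627\u0631\u062f\u0648"

raw = text.encode("utf-8")            # text -> bytes
decoded = raw.decode("utf-8")         # bytes -> text, correct codec
garbled = raw.decode("latin-1")       # bytes -> text, wrong codec (no error, wrong characters)

print(decoded == text)   # True
print(garbled == text)   # False
print(len(text), len(raw), len(garbled))
```

Decoding with the wrong codec does not raise here; it silently produces different characters, which is the kind of input a language detector cannot classify.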

ash