I have developed a scraping script in Mule ESB to get information from a website, but I have problems importing Pandas, NumPy and BeautifulSoup. How can I include these libraries correctly?

To use external Python libraries, I add them to the python.path variable that Mule ESB is launched with:

-Dpython.path=./lib/libPy/pymongo-3.9.0;./lib/libPy/numpy-1.17.3;./lib/libPy/pandas-0.25.2
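To check that these entries actually reach the interpreter, the script can compare them against `sys.path`. The following is only a minimal sketch: the helper names `split_python_path` and `missing_path_entries` are hypothetical, and it assumes the entries are separated by `;` as in the option above.

```python
import os
import sys


def split_python_path(value, sep=";"):
    """Split a python.path style string into its non-empty entries."""
    return [entry for entry in value.split(sep) if entry]


def missing_path_entries(paths):
    """Return the entries in `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


def entries_not_on_sys_path(paths):
    """Return the entries in `paths` that are absent from sys.path."""
    normalized = set(os.path.abspath(p) for p in sys.path)
    return [p for p in paths if os.path.abspath(p) not in normalized]
```

Calling `missing_path_entries(split_python_path(...))` with the same string passed to `-Dpython.path` shows whether any directory is misspelled. Note that a directory being found does not mean the library inside it can be imported.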

My full Python script is at the end of this post and works in Jupyter Notebook. To isolate the error, I used this minimal script instead:

import numpy as np
import pandas as pd

print("Everything done correctly!")

The error that Mule ESB gives me is the following:

ERROR 2019-10-30 10:31:50,171 [[pruebascraping].pruebascrapingFlow.stage1.02] org.mule.exception.DefaultMessagingExceptionStrategy: 
********************************************************************************
Message               : ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there. in <script> at line number 1 (javax.script.ScriptException)
Payload               : foo
Transformer           : ScriptTransformer{this=9880a15, name='ScriptTransformer', ignoreBadInput=false, returnClass=SimpleDataType{type=java.lang.Object, mimeType='*/*', encoding='null'}, sourceTypes=[]}
Payload Type          : java.lang.String
Element               : /pruebascrapingFlow/processors/0 @ pruebascraping:pruebascraping.xml:22 (Python)
Element XML           : <scripting:transformer doc:name="Python">
                        <scripting:script engine="jython">import numpy as np
import pandas as pd

print("Everything done correctly!")</scripting:script>
                        </scripting:transformer>
--------------------------------------------------------------------------------
Root Exception stack trace:
Traceback (most recent call last):
  File "<script>", line 1, in <module>
  File "C:\Users\enriquebs\AnypointStudio\workspace\pruebascraping\lib\libPy\numpy-1.17.3\numpy\__init__.py", line 131, in <module>
    raise ImportError(msg)
ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.

I don't know exactly what to do. Some libraries work but others don't, and I would like to integrate machine learning using Scikit-Learn. Without NumPy and Pandas that becomes difficult, and I can't find any reference for using these tools in Mule ESB.

import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
from pymongo import MongoClient

url = "http://pagina.jccm.es/medioambiente/rvca/Dest/Cuenca.htm"
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html')
tablaDatos = soup.find ("table")

client = MongoClient('pluton.i3a.uclm.es',username='root',password='patata',authSource='admin',authMechanism='SCRAM-SHA-1')
db = client.kikeDevVieja

dataB = pd.read_html(str(tablaDatos))

camposEvento = ["so2","date","no2C","timestampSensor","o3","pm10","pressure","co","batteryVolts","no2",
                                      "idStation","serial","pm1","coC","pm2_5","temperature","humidity","luminosity","o3C",
                                      "batteryCurrent", "timestamp", "batteryLevel"]
eventoSimple = {"idStation" : "Universidad",
        "serial" : "NOSERIALID",
        "humidity" : 0.0,
         "luminosity" : 0.0,
         "pm10" : 0.0,
         "batteryLevel" : 100,
         "co" : 0.0,
         "coC" : 0.0,
         "pressure" : 0.0,
         "no2C" : 0.0,
         "batteryVolts" : 0.0,
         "timestamp" : int(time.time())*1000,
         "batteryCurrent" : 0,
         "pm1" : 0.0,
         "o3C" : 0.0,
         "temperature" : 0.0}

scrapeado = {dataB[3][0][ind][dataB[3][0][ind].find("(")+1:dataB[3][0][ind].find(")")].lower().replace(",", "_") : float(dataB[3][1][ind].replace(",",".")[:-6])
              for ind, x in enumerate(dataB[3][0])
               if dataB[3][0][ind][dataB[3][0][ind].find("(")+1:dataB[3][0][ind].find(")")].lower().replace(",", "_") in camposEvento}
eventoSimple.update(scrapeado)
eventoSimple
  • NumPy and Pandas do not work with Jython. See https://scipy.org/scipylib/faq.html#does-numpy-scipy-work-with-jython-or-c-net, https://stackoverflow.com/q/19455100/407651, https://stackoverflow.com/q/36213908/407651 – mzjn Oct 30 '19 at 13:43
  • You should mention your Mule version. From the error message I gather it is a Mule 3.x version. In any case, Mule uses Jython, and as @mzjn mentioned, NumPy doesn't work with Jython. – aled Oct 30 '19 at 17:39
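
Following the comments above, a script can detect at runtime whether it is running under Jython before it tries to import NumPy or Pandas. This is a minimal sketch, not a fix: the function names `is_jython` and `c_extensions_expected` are hypothetical, and it relies only on `platform.python_implementation()` and `sys.platform`, which report `Jython` and a value starting with `java` on that interpreter.

```python
import platform
import sys


def is_jython(implementation=None, platform_name=None):
    """Return True when the given (or current) interpreter is Jython."""
    if implementation is None:
        implementation = platform.python_implementation()
    if platform_name is None:
        platform_name = sys.platform
    return implementation == "Jython" or platform_name.startswith("java")


def c_extensions_expected(implementation=None, platform_name=None):
    """NumPy and Pandas rely on CPython C extensions, so assume they are
    unavailable whenever the interpreter is Jython."""
    return not is_jython(implementation, platform_name)
```

Inside the Mule script, a guard such as `if c_extensions_expected(): import numpy as np` would at least produce a clear message instead of the misleading "source directory" ImportError.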
