I have developed a scraping script in Mule ESB to extract information from a website, but I have problems importing Pandas, NumPy and BeautifulSoup. How can I include these libraries correctly?
To use external Python libraries, you have to add them to the python.path property passed to Mule ESB at startup:
-Dpython.path=./lib/libPy/pymongo-3.9.0;./lib/libPy/numpy-1.17.3;./lib/libPy/pandas-0.25.2
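To confirm which interpreter Mule actually runs the script with, and whether the python.path entries above reach it, a minimal diagnostic sketch could be run in place of the real script. It assumes the engine is Jython, as the `engine="jython"` attribute in the error below suggests; the helper names and the "libPy" marker are illustrative choices, not part of Mule.

```python
# Diagnostic sketch (hypothetical helpers): report which interpreter runs this
# script and which sys.path entries point into the lib/libPy folder.
from __future__ import print_function
import platform
import sys

def interpreter_info():
    """Describe the running Python implementation."""
    return {
        "implementation": platform.python_implementation(),  # 'Jython' under Jython, usually 'CPython' in Jupyter
        "version": platform.python_version(),
        "platform": sys.platform,  # starts with 'java' under Jython
    }

def libpy_entries(paths, marker="libPy"):
    """Return the path entries that contain the given folder marker."""
    return [p for p in paths if marker in p]

for key, value in sorted(interpreter_info().items()):
    print(key, "=", value)
print("libPy entries on sys.path:", libpy_entries(sys.path))
```

If it reports `Jython` while Jupyter reports `CPython`, the two environments run different interpreters, which could explain why the same imports behave differently.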
My full Python script is at the end of this post and works in Jupyter Notebook. To isolate the error, I used this minimal script:
import numpy as np
import pandas as pd
print("Everything done correctly!")
Mule ESB reports the following error:
ERROR 2019-10-30 10:31:50,171 [[pruebascraping].pruebascrapingFlow.stage1.02] org.mule.exception.DefaultMessagingExceptionStrategy:
********************************************************************************
Message : ImportError: Error importing numpy: you should not try to import numpy from
its source directory; please exit the numpy source tree, and relaunch
your python interpreter from there. in <script> at line number 1 (javax.script.ScriptException)
Payload : foo
Transformer : ScriptTransformer{this=9880a15, name='ScriptTransformer', ignoreBadInput=false, returnClass=SimpleDataType{type=java.lang.Object, mimeType='*/*', encoding='null'}, sourceTypes=[]}
Payload Type : java.lang.String
Element : /pruebascrapingFlow/processors/0 @ pruebascraping:pruebascraping.xml:22 (Python)
Element XML : <scripting:transformer doc:name="Python">
<scripting:script engine="jython">import numpy as np
import pandas as pd
print("Everything done correctly!")</scripting:script>
</scripting:transformer>
--------------------------------------------------------------------------------
Root Exception stack trace:
Traceback (most recent call last):
File "<script>", line 1, in <module>
File "C:\Users\enriquebs\AnypointStudio\workspace\pruebascraping\lib\libPy\numpy-1.17.3\numpy\__init__.py", line 131, in <module>
raise ImportError(msg)
ImportError: Error importing numpy: you should not try to import numpy from
its source directory; please exit the numpy source tree, and relaunch
your python interpreter from there.
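Because only some of the libraries fail, a sketch like the following could be run as the Mule script to attempt each import separately and print the exception, instead of stopping at the first failure. The module list is an assumption based on the libraries named in this post; note that BeautifulSoup is imported as `bs4`.

```python
# Sketch: try each library separately and report which ones import.
from __future__ import print_function
import importlib

# Import names, not distribution names (BeautifulSoup is imported as "bs4").
LIBRARIES = ["pymongo", "requests", "bs4", "numpy", "pandas"]

def check_imports(names):
    """Return {module name: None if it imported, else 'ErrorType: message'}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or anything raised by the module's own __init__
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results

for name, error in sorted(check_imports(LIBRARIES).items()):
    print(name, "OK" if error is None else "FAILED -> " + error)
```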
I don't know what to do: some libraries work and others don't. I would also like to integrate machine learning with scikit-learn, but that is hard if I can't use NumPy and Pandas, and I can't find any reference on using these tools in Mule ESB.
This is the full script, which works in Jupyter Notebook:
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
from pymongo import MongoClient
# Download the page and keep its first <table> element
url = "http://pagina.jccm.es/medioambiente/rvca/Dest/Cuenca.htm"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html')
tablaDatos = soup.find("table")
# Connect to the MongoDB database
client = MongoClient('pluton.i3a.uclm.es', username='root', password='patata', authSource='admin', authMechanism='SCRAM-SHA-1')
db = client.kikeDevVieja
# Parse the table's HTML into a list of DataFrames
dataB = pd.read_html(str(tablaDatos))
# Fields that an event document may contain
camposEvento = ["so2","date","no2C","timestampSensor","o3","pm10","pressure","co","batteryVolts","no2",
"idStation","serial","pm1","coC","pm2_5","temperature","humidity","luminosity","o3C",
"batteryCurrent", "timestamp", "batteryLevel"]
# Default event; the scraped readings overwrite the matching fields
eventoSimple = {"idStation" : "Universidad",
"serial" : "NOSERIALID",
"humidity" : 0.0,
"luminosity" : 0.0,
"pm10" : 0.0,
"batteryLevel" : 100,
"co" : 0.0,
"coC" : 0.0,
"pressure" : 0.0,
"no2C" : 0.0,
"batteryVolts" : 0.0,
"timestamp" : int(time.time())*1000,
"batteryCurrent" : 0,
"pm1" : 0.0,
"o3C" : 0.0,
"temperature" : 0.0}
# Field name = lower-cased text between "(" and ")" with "," -> "_", e.g. "(PM2,5)" -> "pm2_5"
def nombreCampo(etiqueta):
    return etiqueta[etiqueta.find("(")+1:etiqueta.find(")")].lower().replace(",", "_")
# Value: decimal comma -> point, then drop the last 6 characters (the unit)
scrapeado = {nombreCampo(etiqueta): float(valor.replace(",", ".")[:-6])
             for etiqueta, valor in zip(dataB[3][0], dataB[3][1])
             if nombreCampo(etiqueta) in camposEvento}
eventoSimple.update(scrapeado)
eventoSimple
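The label and value handling in the `scrapeado` comprehension can be checked on its own, without the network request or MongoDB. This is a minimal sketch on hypothetical rows standing in for columns 0 and 1 of `dataB[3]`; the labels and the 6-character ` ug/m3` suffix are assumptions about the page's format, chosen to match the `[:-6]` slice.

```python
def field_name(label):
    """Lower-cased text between the first '(' and ')', with ',' replaced by '_'."""
    return label[label.find("(") + 1:label.find(")")].lower().replace(",", "_")

def parse_value(text):
    """Decimal comma -> point, then drop the last 6 characters (assumed unit suffix)."""
    return float(text.replace(",", ".")[:-6])

def scrape_rows(rows, campos):
    """Build {field: value} for the (label, value) rows whose field is in campos."""
    result = {}
    for label, value in rows:
        name = field_name(label)
        if name in campos:
            result[name] = parse_value(value)
    return result

# Hypothetical rows standing in for the columns of dataB[3]
rows = [
    ("Ozono (O3)", "40,2 ug/m3"),
    ("Particulas (PM2,5)", "12,5 ug/m3"),
    ("Viento (VV)", "3,1 m/s"),
]
campos = ["o3", "pm2_5", "pm10"]
print(scrape_rows(rows, campos))
```

Only `o3` and `pm2_5` are kept; the `vv` row is skipped because its field name is not in `campos`, so its value is never parsed.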