I am currently trying to open parquet files using Azure Jupyter Notebooks. I have tried both Python kernels (2 and 3). After the installation of pyarrow I can import the module only if the Python kernel is 2 (not working with Python 3)
Here is what I've done so far (for clarity, I am not mentioning all my various attempts, such as using conda instead of pip, as it also failed):
!pip install --upgrade pip
!pip install -I Cython==0.28.5
!pip install pyarrow
import pandas
import pyarrow
import pyarrow.parquet
#so far, so good
filePath_parquet = "foo.parquet"
table_parquet_raw = pandas.read_parquet(filePath_parquet, engine='pyarrow')
This works well if I'm doing that off-line (using Spyder, Python v.3.7.0). But it fails using an Azure Notebook.
AttributeErrorTraceback (most recent call last)
<ipython-input-54-2739da3f2d20> in <module>()
6
7 #table_parquet_raw = pd.read_parquet(filePath_parquet, engine='pyarrow')
----> 8 table_parquet_raw = pandas.read_parquet(filePath_parquet, engine='pyarrow')
AttributeError: 'module' object has no attribute 'read_parquet'
Any idea please?
Thank you in advance !
EDIT:
Thank you very much for your reply Peter Pan ! I have typed these statements, here is what I got:
1.
print(pandas.__dict__)
=> read_parquet does not appear
2.
print(pandas.__file__)
=> I get:
/home/nbuser/anaconda3_23/lib/python3.4/site-packages/pandas/__init__.py
import sys; print(sys.path) => I get:
['', '/home/nbuser/anaconda3_23/lib/python34.zip', '/home/nbuser/anaconda3_23/lib/python3.4', '/home/nbuser/anaconda3_23/lib/python3.4/plat-linux', '/home/nbuser/anaconda3_23/lib/python3.4/lib-dynload', '/home/nbuser/.local/lib/python3.4/site-packages', '/home/nbuser/anaconda3_23/lib/python3.4/site-packages', '/home/nbuser/anaconda3_23/lib/python3.4/site-packages/Sphinx-1.3.1-py3.4.egg', '/home/nbuser/anaconda3_23/lib/python3.4/site-packages/setuptools-27.2.0-py3.4.egg', '/home/nbuser/anaconda3_23/lib/python3.4/site-packages/IPython/extensions', '/home/nbuser/.ipython']
Do you have any idea please ?
EDIT 2:
Dear @PeterPan, I have typed both !conda update conda
and !conda update pandas
: when checking the Pandas version (pandas.__version__
), it is still 0.19.2
.
I have also tried with !conda update pandas -y -f
, it returns:
`Fetching package metadata ...........
Solving package specifications: .
Package plan for installation in environment /home/nbuser/anaconda3_23:
The following NEW packages will be INSTALLED:
pandas: 0.19.2-np111py34_1`
When typing:
!pip install --upgrade pandas
I get:
Requirement already up-to-date: pandas in /home/nbuser/anaconda3_23/lib/python3.4/site-packages
Requirement already up-to-date: pytz>=2011k in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: numpy>=1.9.0 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: python-dateutil>=2 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: six>=1.5 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from python-dateutil>=2->pandas)
Finally, when typing:
!pip install --upgrade pandas==0.24.0
I get:
Collecting pandas==0.24.0
Could not find a version that satisfies the requirement pandas==0.24.0 (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0)
No matching distribution found for pandas==0.24.0
Therefore, my guess is that the problem comes from the way the packages are managed in Azure. Updating a package (here Pandas), should lead to an update to the latest version available, shouldn't it?