I am trying to use fastparquet to open a file, but I get the error:

RuntimeError: Decompression 'SNAPPY' not available.  Options: ['GZIP', 'UNCOMPRESSED']

I have the following installed and have rebooted my interpreter:

python                    3.6.5                hc3d631a_2  
python-snappy             0.5.2                    py36_0    conda-forge
snappy                    1.1.7                hbae5bb6_3  
fastparquet               0.1.5                    py36_0    conda-forge

Everything downloaded smoothly. I didn't know whether I needed snappy or python-snappy, so I installed one, which didn't fix it, and then the other, still with no success. All the related issues I have found were fixed by installing snappy, but I am still getting this error with both snappy packages installed! Any help would be appreciated.

B. Sharp
  • Any update on this? – TwinPenguins Jul 05 '18 at 11:14
  • I ended up using pyspark to read my files because I never got a response. I am unsure how to fix this, but my project has since moved forward. – B. Sharp Jul 09 '18 at 15:37
  • Didn't work for me either, even with pyspark installed as suggested by @Catbuilts. I circumvented the issue by using GZIP compression to save the Parquet file, then switching to pyarrow engine as that was far faster. – bugfoot Jan 08 '20 at 13:04
  • ```conda install -c conda-forge python-snappy fastparquet snappy``` worked for me. Installing those from conda base channel did not work somehow. – Matthew Son Feb 12 '20 at 03:07
  • Hi just wondering how did you setup pyspark and get the result for this problem? I got the same error when using pandas. – wawawa Dec 02 '20 at 10:40

3 Answers

Run:

pip install python-snappy
pip install pyarrow 

It should do the trick.

I think you lack the pyarrow package.

If pip gives you an error, use conda instead (i.e., conda install python-snappy or, if you still get errors, conda install -c conda-forge python-snappy).
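
Before reinstalling anything, it may help to confirm that the interpreter raising the error is the one the packages were installed into. The following is a minimal sketch using only the standard library; the helper name `modules_available` is made up for illustration, and `snappy` is assumed to be the import name that python-snappy provides.

```python
import importlib.util
import sys


def modules_available(names):
    """Map each top-level module name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The interpreter path shows which environment is actually running.
    print("interpreter:", sys.executable)
    for name, found in modules_available(["snappy", "fastparquet", "pyarrow"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `snappy` is reported missing here even though conda lists python-snappy, the script is probably running under a different environment than the one that was installed into.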

Alex Stephens
Chau Pham
  • Installing pyarrow is irrelevant. ```conda install -c conda-forge python-snappy fastparquet snappy``` worked for me. Installing those from base channel did not work somehow. – Matthew Son Feb 12 '20 at 03:07
  • ^ **this** is the solution here; you need both python-snappy (the wrapper) and snappy (the C lib) from the same channel – mdurant Jun 01 '20 at 14:12

You need to install python-snappy, as Catbuilts's answer says. However, python-snappy is only a wrapper around the snappy C library, which must also be installed on your computer; this is covered in this answer about installing snappy-c.

Assuming you have a DEB-based system, such as Ubuntu, you can get both with:

sudo apt-get install libsnappy-dev
python3 -m pip install --user python-snappy

To test it, you can try the following script:

import pandas as pd
import snappy  # Not used directly; the import fails early if python-snappy is not reachable
from fastparquet import write, ParquetFile

# Small DataFrame to round-trip through a Snappy-compressed Parquet file
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
# print(df.head())  # Uncomment to inspect the initial values
write("/tmp/deleteme", df, compression="SNAPPY")
df_parquet = ParquetFile("/tmp/deleteme").to_pandas()
print(df_parquet.head())  # print so the result also shows when run as a script
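
If the script above still fails, checking python-snappy on its own can separate a broken snappy install from a fastparquet problem. This is only a sketch: the helper names `snappy_works` and `pick_compression` are hypothetical, and the GZIP fallback simply mirrors the workaround one commenter described (the error message in the question lists GZIP as an available option).

```python
def snappy_works():
    """Return True if python-snappy imports and round-trips a small payload."""
    try:
        import snappy  # provided by python-snappy; needs the snappy C library

        payload = b"parquet" * 10
        return snappy.decompress(snappy.compress(payload)) == payload
    except Exception:
        # ImportError if the wrapper or the C library is missing; any other
        # failure (e.g. an unrelated package also named "snappy") counts too.
        return False


def pick_compression():
    """Hypothetical helper: prefer SNAPPY, otherwise fall back to GZIP."""
    return "SNAPPY" if snappy_works() else "GZIP"


if __name__ == "__main__":
    print("snappy usable:", snappy_works())
    print("compression to pass to fastparquet.write:", pick_compression())
```

Note that the fallback only helps when writing new files; a file that was already written with Snappy compression still needs a working snappy install to be read.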
MarcosBernal

Installing the following packages may help:

pip install fastparquet

pip install python-snappy

pip install pyarrow
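
After installing, the versions the running interpreter actually sees can be listed as below. This is a sketch that assumes Python 3.8 or newer (where `importlib.metadata` is in the standard library; the question's Python 3.6.5 would need an upgrade or the `importlib_metadata` backport), and the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["fastparquet", "python-snappy", "pyarrow"]
    for name, version in installed_versions(wanted).items():
        print(name, version or "not installed")
```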
Paul Roub