
This question has a different answer from those given in the linked post.

I am getting an error that reads:

pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

when I try to read a Parquet file as follows, using Spark 2.1.0:

data = spark.read.parquet('/myhdfs/location/')

I have confirmed that the file/table is not empty by looking at the Impala table through the Hue web portal. Other files that I have stored in similar directories also read fine. For the record, the file names contain hyphens but no underscores or full stops/periods.

Hence, none of the answers in the following post apply: [Unable to infer schema when loading Parquet file](https://stackoverflow.com/questions/44954892/unable-to-infer-schema-when-loading-parquet-file)

Any ideas?

Taylrl
  • Have you checked the answers on this post first: https://stackoverflow.com/questions/44954892/unable-to-infer-schema-when-loading-parquet-file – ash_huddles Nov 02 '18 at 18:01
  • Possible duplicate of [Unable to infer schema when loading Parquet file](https://stackoverflow.com/questions/44954892/unable-to-infer-schema-when-loading-parquet-file) – 10465355 Nov 02 '18 at 18:47
  • Yep. I’ve read that and none of the answers apply. – Taylrl Nov 03 '18 at 01:00
  • Try reading an individual Parquet file by providing its full path and report the outcome. – Sim Nov 03 '18 at 23:52
  • Ah hah! It turns out there was another level in the directory structure! – Taylrl Nov 06 '18 at 11:19

2 Answers


It turns out I was getting this error because there was another level in the directory structure, so the path I passed contained no Parquet files directly. The following was what I needed:

data = spark.read.parquet('/myhdfs/location/anotherlevel/')
Taylrl
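
The extra directory level can be spotted before calling Spark by listing which directories actually hold Parquet files. The following is a minimal sketch, assuming the data is reachable through a local or mounted filesystem path and that the files end in `.parquet`; `find_parquet_dirs` is a hypothetical helper, not part of Spark. On HDFS itself, `hdfs dfs -ls -R /myhdfs/location/` gives an equivalent recursive listing.

```python
import os


def find_parquet_dirs(root):
    """Return, sorted, every directory under root that directly contains .parquet files.

    Assumption: root is a local or mounted path, and data files use the
    .parquet extension. This helper is illustrative only.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Record the directory as soon as one Parquet file is found in it.
        if any(name.endswith(".parquet") for name in filenames):
            hits.append(dirpath)
    return sorted(hits)
```

If the directory passed to `spark.read.parquet` is not in the returned list but one of its subdirectories is, that subdirectory is the likely path to read.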

I got the same problem but none of the answers I found online worked for me. It turns out that I was writing the code in this way:

data = spark.read.parquet("/myhdfs/location/anotherlevel/")

that is, using double quotes ("). When I switched to single quotes ('), my problem was solved:

data = spark.read.parquet('/myhdfs/location/anotherlevel/')

Sharing in case it helps anybody
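
For reference, Python treats single-quoted and double-quoted string literals as the same value, so the two calls above pass an identical path to Spark. The fix described here probably coincided with some other change; that is an assumption, since the original setup is not shown. A quick check, with illustrative variable names:

```python
# Both literals produce the same str object contents in Python.
double_quoted = "/myhdfs/location/anotherlevel/"
single_quoted = '/myhdfs/location/anotherlevel/'

# True whenever the characters between the quotes are identical.
same_path = double_quoted == single_quoted
print(same_path)
```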

  • This does not really answer the question. – tjheslin1 Mar 29 '22 at 05:56