
I have files in Databricks as shown below: [screenshot: file listing under DBFS]

I am trying to access them like this from Databricks notebooks: [screenshot: notebook cell]

But I am getting an error; even trying to use pandas gives an error: [screenshot: error message]

I don't understand where I am going wrong, although dbutils.fs.head('/FileStore/tables/flights_small.csv') gives me the result correctly.

1 Answer

You are using Databricks Community Edition; because of a quirk with DBR >= 7.0, you cannot read from that path directly.

I usually run a command like the one below to resolve this issue and programmatically copy the file to an accessible temp folder:

%fs cp /FileStore/tables/flights_small.csv file:/tmp/flights_small.csv

then simply read it in:

import pandas as pd
pd.read_csv('file:/tmp/flights_small.csv')
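The same copy can also be done from Python with `dbutils.fs.cp` instead of the `%fs` magic (both only work inside a Databricks notebook, where `dbutils` is predefined). A minimal sketch; the `local_copy_target` helper is hypothetical, added here just to show how the DBFS path maps to the driver-local target:

```python
import os

def local_copy_target(dbfs_path, tmp_dir="/tmp"):
    # Map a DBFS path like /FileStore/tables/x.csv to a
    # driver-local file:/tmp/x.csv target pandas can open.
    return "file:" + os.path.join(tmp_dir, os.path.basename(dbfs_path))

src = "/FileStore/tables/flights_small.csv"
dst = local_copy_target(src)  # "file:/tmp/flights_small.csv"

# Inside a Databricks notebook, dbutils is predefined:
# dbutils.fs.cp(src, dst)
# import pandas as pd
# pd.read_csv(dst)
```

The `file:` prefix matters: a bare path is interpreted as a DBFS location, while `file:` addresses the driver node's local filesystem.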

Given quirks like this in the Community Edition (plus long cluster startup times), I usually go with Google Colab for hobby work in a browser notebook.

You can also run PySpark on Google Colab with just:

!pip install pyspark

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .master("local")\
        .appName("Colab")\
        .config('spark.ui.port', '4050')\
        .getOrCreate()
– noahtf13
  • Oh I see. Actually, I am exploring Great Expectations with Spark and don't want to install a Spark cluster. One clarification: what is the difference between `dbfs:` and `file:`? – Probhakar Sarkar Aug 30 '21 at 13:41
  • https://stackoverflow.com/questions/63667523/databricks-difference-between-dbfs-vs-file is a great answer. Source: Google `dbfs vs file databricks` – noahtf13 Aug 30 '21 at 15:21