0

The documentation for PySpark includes the following in an example:

from pyspark.context import SparkContext
from pyspark.sql.functions 
import *from pyspark.sql.types 
import *from datetime import date, timedelta, datetime

I don't recognize or understand the syntax of the last two lines. (Specifically: import *from.) Would someone kindly explain it to me, and point out where it is documented?

I know about . and .. in import paths ("relative import paths"), but this syntax is new to me and I can't find where it is documented or what it is called. I also notice that the third line contains from but no import and I don't understand that, either.

The web site where I found this is: https://towardsdatascience.com/pyspark-and-sparksql-basics-6cb4bf967e53 at the end of "Step One." Page written January 10, 2020.

(Topic title edited to indicate that we've concluded the web page is wrong. A comment below provides the location of the correct code, on GitHub.)

Mike Robinson
  • I doubt that it's valid, so it probably isn't documented. The import statement syntax is documented [here](https://docs.python.org/3/reference/simple_stmts.html#the-import-statement). – mkrieger1 Jun 15 '22 at 15:24
  • The second and last lines are incorrect. Either you import everything `*` or you specify what needs to be imported like in `from pyspark.context import SparkContext` –  Jun 15 '22 at 15:25
  • The web site in question is: https://towardsdatascience.com/pyspark-and-sparksql-basics-6cb4bf967e53 At the end of "Step One." It's certainly possible that this is some kind of typo or archaic syntax but it doesn't quite *smell* like one ... No, it's a recently-written web page. – Mike Robinson Jun 15 '22 at 15:26
  • @MikeRobinson I've read quite a few questionable things on that web site, to the point where I take anything they say with a grain of salt. – chepner Jun 15 '22 at 15:28
  • It looks like they've just added a newline at the wrong spot (at the end of lines 2 and 3), which is to be expected from that website. – TheFungusAmongUs Jun 15 '22 at 15:29
  • Yes, there is no line-continuation ... – Mike Robinson Jun 15 '22 at 15:29
  • Their real code is on GitHub: https://github.com/pinarersoy/PySpark_SparkSQL_MLib/blob/master/PySpark%20and%20SparkSQL.ipynb – OneCricketeer Jun 15 '22 at 15:41
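
A quick way to settle a question like this, without installing PySpark, is to ask the parser. The sketch below uses `ast.parse`, which only checks syntax and never imports or runs anything. The names `AS_PRINTED` and `parses` are made up for this illustration and do not come from the article.

```python
import ast

# The snippet as it appears in the question, line breaks included.
AS_PRINTED = (
    "from pyspark.context import SparkContext\n"
    "from pyspark.sql.functions \n"
    "import *from pyspark.sql.types \n"
    "import *from datetime import date, timedelta, datetime\n"
)


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python.

    Nothing is imported or executed; the text is only parsed.
    """
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Check the whole snippet first, then each line on its own.
print("whole snippet parses:", parses(AS_PRINTED))
for line in AS_PRINTED.splitlines():
    print(f"{parses(line)!s:5}  {line!r}")
```

Only the first line parses on its own. The other three are rejected, which fits the view that `import *from` is not some obscure syntax but simply invalid.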

4 Answers

3

The line breaks in that code are in the wrong place. It should be

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import date, timedelta, datetime
import time
Barmar
1

The documentation for PySpark includes the following...

"Towards Data Science" / Medium is not the documentation for PySpark. This is: https://spark.apache.org/docs/latest/api/python/getting_started/

Notice that they never really use Spark functions outside of sc. (which is itself a confusing name, because people commonly use it for a SparkContext instance, not a SparkSession ... )

In any case, the only import they actually use is from pyspark.sql import SparkSession.

But the literal text including the line breaks would look like

from pyspark.context import SparkContext\n
from pyspark.sql.functions
import *\nfrom pyspark.sql.types 
import *\nfrom datetime import date, timedelta, datetime

Or, in proper Python formatting

from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import date, timedelta, datetime

So, the author just doesn't know how to format in Medium documents, it seems...
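
To see how Python reads the properly formatted version, here is a small sketch that parses it with the standard `ast` module and lists each module with the names it imports. Nothing is actually imported, so PySpark does not need to be installed. The names `CORRECTED` and `imports` are only for this illustration.

```python
import ast

# The corrected lines, one statement per line.
CORRECTED = """\
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import date, timedelta, datetime
"""

# Parse only; ast.parse does not import or execute anything.
tree = ast.parse(CORRECTED)

# Each statement becomes one ast.ImportFrom node: a module plus a list of names.
imports = [
    (node.module, [alias.name for alias in node.names])
    for node in tree.body
    if isinstance(node, ast.ImportFrom)
]

for module, names in imports:
    print(module, "->", names)
```

Every statement has the shape `from <module> import <names>`, where the names are either `*` or an explicit comma-separated list.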

OneCricketeer
0

Looks like there are some misplaced line breaks; it should be:

from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import date, timedelta, datetime
-2

From what I've understood, the lines:

import *from pyspark.sql.types 
import *from datetime import date, timedelta, datetime

are equivalent to the lines:

from pyspark.sql.types import *
from datetime import date, timedelta, datetime import *

There might be some convention for which ones to use in which cases that I'm not aware of, but they serve the same purpose

  • The [syntax](https://docs.python.org/3/reference/simple_stmts.html#the-import-statement) doesn't seem to allow multiple `import` in the same `from module` like your last line. – Barmar Jun 15 '22 at 15:29
  • I'm willing for the consensus of the group to be that the content of this web site page is simply *wrong.* ... Is that the consensus of the group? – Mike Robinson Jun 15 '22 at 15:31
  • It is objectively wrong. Whether it's due to ignorance or lazy copyediting is unclear but also irrelevant. (Probably lazy editing, as there is no way anyone could have tried to run that code and had it work.) – chepner Jun 15 '22 at 15:33
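
The point about the grammar can be checked directly. This is a minimal sketch using the built-in `compile`, which compiles a statement without running it, so nothing is imported. The helper `is_valid` and the `candidates` list are hypothetical names for this example.

```python
def is_valid(statement: str) -> bool:
    """Return True if `statement` compiles; the code is never executed."""
    try:
        compile(statement, "<example>", "exec")
    except SyntaxError:
        return False
    return True


candidates = [
    # One `import` clause per `from`: accepted.
    "from datetime import date, timedelta, datetime",
    "from datetime import *",
    # A second `import` clause in the same statement: rejected.
    "from datetime import date, timedelta, datetime import *",
    # `import` followed by `*from`: rejected.
    "import *from pyspark.sql.types",
]

results = {statement: is_valid(statement) for statement in candidates}
for statement, ok in results.items():
    print("valid  " if ok else "invalid", statement)
```

The first two statements compile and the last two do not, so the two forms in this answer are not equivalent: one of them is not Python at all.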