
I need to read a number of CSV files with PySpark in Databricks.

spark.read.option("header", True).format("csv").load('path/2021', 'path/2020')

Is it possible to make this dynamic using variables, like this?

years_to_query = ['2020', '2021']

path = 'path'

I'm trying to concatenate the variables into paths, but I'm not getting the desired result.

  • Concatenation should work easily here. I don't know what you are doing wrong. – mkrieger1 Feb 07 '22 at 23:46
  • Generally speaking, you shouldn't use string concatenation to join file-system path components, use `os.path.join()`, which isn't OS-specific. – martineau Feb 08 '22 at 00:31
  • You don't need `zip()` or `itertools.repeat()` here as @azelcer suggested — use `load(*(os.path.join(path, year) for year in years_to_query))` instead, which makes use of a [generator expression](https://docs.python.org/3/reference/expressions.html#generator-expressions) to create the arguments. – martineau Feb 08 '22 at 00:40
  • You could also use the [`pathlib`](https://docs.python.org/3/library/pathlib.html#module-pathlib) module and do it like this: `load(*((Path(path)/year) for year in years_to_query))`. – martineau Feb 08 '22 at 00:48
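A minimal sketch of the approach suggested in the comments, assuming the CSV directories live under `path/<year>`. The Spark call itself is shown commented out, since it needs a running Spark session; the path construction is plain Python:

```python
import os

years_to_query = ['2020', '2021']
path = 'path'

# Build one path per year; os.path.join handles separators portably.
paths = [os.path.join(path, year) for year in years_to_query]
print(paths)

# Unpack the list into load(), which accepts multiple path arguments:
# df = spark.read.option("header", True).format("csv").load(*paths)
```

The `pathlib` variant from the last comment works the same way, except that `load()` may need the `Path` objects converted to strings first (e.g. `str(Path(path) / year)`).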

0 Answers