What is the maximum column count of a Spark DataFrame? I tried to find it in the DataFrame documentation but was unable to.
- Short answer is there is a limit; read [this answer](https://stackoverflow.com/a/51710233/5858851) for a more thorough explanation. – pault Aug 07 '18 at 14:23
1 Answer
From an architectural perspective, DataFrames are designed to scale, so there should not be a fixed limit on the column count; however, a very wide schema can give rise to uneven load on the nodes and may affect the overall performance of your transformations.
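As a rough illustration (a minimal sketch; the app name and the 1,000-column width are arbitrary choices, not anything from the original answer), you can build a deliberately wide DataFrame and check its column count yourself:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Minimal sketch: build a deliberately wide DataFrame and inspect its column count.
val spark = SparkSession.builder()
  .appName("wide-dataframe-sketch")   // arbitrary app name
  .master("local[*]")
  .getOrCreate()

// A single select with many column expressions is much cheaper for the analyzer
// than chaining withColumn a thousand times.
val extraCols = (1 to 1000).map(i => lit(i).as(s"c$i"))
val wide = spark.range(1).toDF("id").select((col("id") +: extraCols): _*)

println(wide.columns.length)   // 1001
```

Pushing the width up by orders of magnitude mostly stresses the driver (schema handling, Catalyst analysis) rather than hitting any documented column cap.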

KiranM
- It is not correct. You can easily find a hard limit (`Int.MaxValue`), but what is more important, Spark scales well only with long and relatively thin data. Fundamentally, you cannot split a single record between executors / partitions, and there are a number of practical limitations (GC, disk I/O) which make very wide data impractical. Not to mention some known bugs. – zero323 Sep 07 '16 at 19:37
- For that matter, most programming models (as far as I know) scale "well" only for long and thin data, for one basic reason: beyond a threshold, a record has to be broken up and written onto the next relevant "logical unit" of storage. Most "big data" frameworks are designed to handle data with no inherent limits, provided you overcome the technical limitations, though with a performance hit. So I think we would get memory errors before we reach the said limit. Your thoughts? – KiranM Sep 08 '16 at 16:52
- This is an old entry, but I concur with @zero323 on this. Big-data frameworks have the limitations mentioned in the comment above; these kinds of frameworks don't work well with wide data. I experimented with this earlier, but unfortunately I can't share that benchmark due to an NDA. – eliasah Aug 06 '18 at 13:57
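To make the hard limit mentioned in the comments concrete, here is a small sketch (the 10,000-field width is an arbitrary choice): a DataFrame schema is a `StructType`, which wraps an array of `StructField`s, so the number of fields is bounded by `Int.MaxValue` like any JVM array, though memory and analysis costs become the real constraint far earlier.

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// A schema is a StructType backed by an Array[StructField], so the field count
// is capped at Int.MaxValue; in practice driver memory and Catalyst analysis
// cost become a problem long before that.
val fields = (1 to 10000).map(i => StructField(s"c$i", IntegerType, nullable = false))
val schema = StructType(fields)

println(schema.fields.length)   // 10000
println(Int.MaxValue)           // 2147483647, the theoretical array-size ceiling
```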