
What is the maximum column count of a Spark DataFrame? I tried to find it in the DataFrame documentation but was unable to.

Saran
  • Short answer: there is a limit; read [this answer](https://stackoverflow.com/a/51710233/5858851) for a more thorough explanation. – pault Aug 07 '18 at 14:23

1 Answer


From an architectural perspective, DataFrames are scalable, so there should not be any limit on the column count, but a very wide schema can give rise to uneven load on the nodes and may affect the overall performance of your transformations.
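As a rough illustration (not part of the original answer), here is a minimal PySpark sketch that builds a moderately wide DataFrame and simply counts its columns; the 1,000-column width and the `c{i}` naming are arbitrary choices for the example:

```python
# Minimal sketch, assuming a local PySpark installation is available.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("wide-df-demo").getOrCreate()

num_cols = 1000  # arbitrary width chosen for illustration
columns = [f"c{i}" for i in range(num_cols)]

# A single-row DataFrame with num_cols integer columns.
df = spark.createDataFrame([tuple(range(num_cols))], schema=columns)

# There is no documented per-DataFrame column cap to query; the closest check
# is simply inspecting the length of the schema.
print(len(df.columns))  # -> 1000
```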

KiranM
  • It is not correct. You can easily find a hard limit (`Int.MaxValue`), but more importantly, Spark scales well only for long and relatively thin data. Fundamentally, you cannot split a single record between executors / partitions, and there are a number of practical limitations (GC, disk IO) which make very wide data impractical, not to mention some known bugs. – zero323 Sep 07 '16 at 19:37
  • For that matter, most (as far as I know) programming models scale "well" only for long and thin data, for one basic reason: a record is broken up and written onto the next relevant "logical unit" of storage once it exceeds a threshold. Most "big data" frameworks are designed to handle data with no fixed limits, provided you overcome the technical limitations, though with a performance hit. So I think we would get memory errors before we reach the said limit. Your thoughts? – KiranM Sep 08 '16 at 16:52
  • This is an old entry, but I concur with @zero323 on this. Big-data frameworks have the limitation mentioned in the comment above; these kinds of frameworks don't work well with wide data. I experimented with that earlier, but unfortunately I can't share that benchmark due to an NDA. – eliasah Aug 06 '18 at 13:57
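To make the practical point in these comments concrete, here is a small, hedged PySpark sketch (the 500-column width and the use of repeated `withColumn` calls are illustrative assumptions, not a benchmark from the thread) showing how query-planning cost alone grows as a DataFrame gets wider, long before the `Int.MaxValue` cap on the number of schema fields is reached:

```python
# Minimal sketch: repeatedly widening a DataFrame with withColumn.
# Assumes a local Spark installation; the 500-column width is arbitrary.
import time

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("wide-df-cost").getOrCreate()

df = spark.range(1)  # starts with a single column, "id"

start = time.time()
for i in range(500):
    df = df.withColumn(f"c{i}", F.lit(i))  # each call re-analyzes a wider plan
print(f"{len(df.columns)} columns built in {time.time() - start:.1f}s")

# The hard ceiling is Int.MaxValue schema fields, since the schema is backed by
# a JVM array, but memory, GC pressure and planning time bite much earlier.
```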