Is there a way to append a DataFrame horizontally to another one, assuming both have an identical number of rows? This would be the equivalent of pandas `concat` with `axis=1`:

    result = pd.concat([df1, df4], axis=1)

or R's `cbind`.
There isn't one. Unlike a pandas `DataFrame`, a Spark `DataFrame` is closer to a relation and has no inherent row order.

There is a known pattern where you convert the data to an RDD, call `zipWithIndex` (see: PySpark DataFrames - way to enumerate without converting to Pandas?), and then `join` on the index field, but it is ultimately an antipattern*.

* Unless we explicitly guarantee a specific order (and who knows what happens under the hood with all the new bells and whistles like the cost-based optimizer and custom optimizer rules), this can easily become brittle and fail silently in some unexpected way.