I have a DataFrame created from a Parquet file. I want to run df.select("firstName") and store the result in a new DataFrame, but I also want to keep track of the original column index of "firstName". Any ideas?
-
There's no really easy way to do this. You can try zipWithIndex, but that requires converting the DataFrame into an RDD. See http://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex – Saif Charaniya Jul 21 '16 at 00:06
-
Yeah, I've been struggling to come up with an efficient solution. I'm trying to keep everything in DataFrames. – jojo Jul 21 '16 at 12:59
-
Column index or row index? The column index is easy: just look at the columns field of the original df. If you need the row index, that can be done as well, but I would need to post an answer for that. – Alessandro S. Feb 14 '17 at 07:59
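
For the column-index case, a minimal sketch of the "look at the columns field" idea, assuming PySpark, where df.columns is a plain list of column names in schema order. The helper name original_positions is hypothetical, and the list below is only a stand-in for df.columns so the snippet runs without a Spark session:

```python
def original_positions(all_columns, selected):
    """Map each selected column name to its position in the original column list.

    `all_columns` is expected to be what `df.columns` returns (a list of names).
    If a name occurs more than once, the first occurrence is reported.
    Raises ValueError if a selected name is not present.
    """
    return {name: all_columns.index(name) for name in selected}


# Stand-in for `df.columns` on the original DataFrame (assumed example schema).
columns = ["id", "firstName", "lastName"]

# Record the positions before selecting, e.g. before `new_df = df.select("firstName")`.
positions = original_positions(columns, ["firstName"])
print(positions)  # {'firstName': 1}
```

With a real DataFrame you would pass df.columns instead of the stand-in list and keep the resulting dict next to the new DataFrame, since select itself does not carry the original position along.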