I am reading data from a Hive table and creating a DataFrame in PySpark using:
hive_df = sqlContext.sql("select * from table")
The DataFrame hive_df has three columns: cust_id, name, and l_name.
In the Hive table, the cust_id field is null for every record, so I want to fill it with incrementing values.
Data in the Hive table:
cust_id,name,l_name
, abc, def
, ghi, jkl
, mno, pqr
Desired output:
cust_id,name,l_name
1000, abc, def
1001, ghi, jkl
1002, mno, pqr
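
To make the requirement concrete, here is a plain-Python sketch of the mapping I am after, applied to the three sample rows. It is not Spark code and only illustrates the intended result; the names assign_ids and START_ID are hypothetical, and the starting value 1000 is simply taken from the desired output above.

```python
# Sample rows as they come out of the Hive table: cust_id is null (None) everywhere.
rows = [
    (None, "abc", "def"),
    (None, "ghi", "jkl"),
    (None, "mno", "pqr"),
]

START_ID = 1000  # assumed starting value, taken from the desired output


def assign_ids(rows, start=START_ID):
    """Replace the null cust_id of each row with start, start + 1, ..."""
    return [
        (start + offset, name, l_name)
        for offset, (_, name, l_name) in enumerate(rows)
    ]


result = assign_ids(rows)
for row in result:
    print(row)
```

What I need is the equivalent of this on the DataFrame hive_df, where the rows are distributed and have no inherent order.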