I am trying to load a huge DataFrame in parts when the number of rows exceeds a threshold. Our threshold is 3 million rows: if the DataFrame has, say, 4 million rows, we want to load the first 3 million rows and then load the remaining 1 million in the next loop iteration. I tried the following approach. This is pseudocode; it does not work in Scala, and I am looking for a Scala substitute, or maybe a better way of doing this:
if (deltaExtractCount > 3000000)
{
    length = len(df)
    count = 0
    while (count < length)
    {
        new_df = df[count : count + 3000000]
        insert(new_df)
        count = count + 3000000
    }
}
This is what I was trying, but without success; I have not found equivalent Scala functions, and this pseudocode is more suited to Python. I am using Spark 3.1.2 and Scala 2.12. Let me know how I can achieve this split, or whether there is another way.
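For context, the closest I have come is the sketch below. It assumes df is the already-loaded Spark DataFrame and uses insert() as a hypothetical placeholder for my actual load step; the idea is to attach a 0-based row index via row_number over monotonically_increasing_id and filter the indexed DataFrame in 3-million-row slices. I have not verified it at scale (the unpartitioned window pulls all rows through a single partition), so treat it as a rough sketch, not a working solution:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

val threshold = 3000000L

// Hypothetical placeholder for the real load step.
def insert(chunk: DataFrame): Unit = ???

val totalCount = df.count()
if (totalCount > threshold) {
    // Attach a stable 0-based row index. The unpartitioned window funnels
    // every row through one partition, which may be slow on huge data.
    val indexed = df
      .withColumn("_mono_id", monotonically_increasing_id())
      .withColumn("_row_idx", row_number().over(Window.orderBy("_mono_id")) - 1)
      .drop("_mono_id")
      .cache() // avoid recomputing the index for every chunk

    var offset = 0L
    while (offset < totalCount) {
        // Take the next slice of up to `threshold` rows and load it.
        val chunk = indexed
          .where(col("_row_idx") >= offset && col("_row_idx") < offset + threshold)
          .drop("_row_idx")
        insert(chunk)
        offset += threshold
    }
    indexed.unpersist()
} else {
    insert(df)
}

If the chunks do not have to contain exactly 3 million rows each, I assume df.randomSplit(Array.fill(numParts)(1.0)) would give roughly equal pieces without the single-partition window, which might be the simpler option.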