When I'm building out dataframes inside a loop, I often find myself using this convention:

complete_df = None
for data_chunk in data_chunks:
    partial_df = get_partial_df(data_chunk)    
    partial_df = do_some_stuff_to_my_df(partial_df)
    if complete_df is None:
        complete_df = partial_df
    else:
        complete_df = complete_df.append(partial_df)

I'm looking for a better / shorter / more pythonic way to do this. A ternary expression doesn't seem like it would be an improvement.

Devesh Kumar Singh
red

3 Answers

You can do away with the if/else block if you initialize complete_df to an empty DataFrame, like this:

import pandas as pd

complete_df = pd.DataFrame()
for data_chunk in data_chunks:
    partial_df = get_partial_df(data_chunk)    
    partial_df = do_some_stuff_to_my_df(partial_df)
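    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat does the same job
    complete_df = pd.concat([complete_df, partial_df])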
WebDev

Try this:

complete_df = None
for data_chunk in data_chunks:
    partial_df = get_partial_df(data_chunk)    
    complete_df = partial_df if complete_df is None else complete_df.append(partial_df)
Jitesh Prajapati
data_chunks = range(1, 100, 4)

def get_partial_df(num):
    return num

#complete_df = None
complete_df = list()
print(type(complete_df))
for data_chunk in data_chunks:
    partial_df = get_partial_df(data_chunk)
    complete_df.append(partial_df)

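    # With complete_df = None, the first pass would bind complete_df to whatever
    # get_partial_df returns (an int in this toy example), and int has no append().
    # Also note that list.append() works in place and returns None, so its result
    # must not be assigned back to complete_df.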
Kaushik
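
The list accumulation in the last answer carries over to real DataFrames: collect the partial frames in a list and call pd.concat once at the end, which avoids both the None check and re-copying the growing frame on every iteration. A minimal sketch, assuming pandas is installed; the bodies of get_partial_df and do_some_stuff_to_my_df here are hypothetical stand-ins (the question never shows them), and the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the question's helpers, only so the sketch runs.
def get_partial_df(data_chunk):
    return pd.DataFrame({"value": [data_chunk, data_chunk + 1]})

def do_some_stuff_to_my_df(df):
    return df.assign(doubled=df["value"] * 2)

data_chunks = range(1, 100, 4)

# Build every partial frame first, then concatenate once at the end.
partial_dfs = [do_some_stuff_to_my_df(get_partial_df(chunk)) for chunk in data_chunks]

# ignore_index=True renumbers the rows 0..n-1; drop it to keep each
# partial frame's own index, which is what the old append() did by default.
complete_df = pd.concat(partial_dfs, ignore_index=True)
```

One caveat: pd.concat raises a ValueError when the list is empty, so if data_chunks might yield nothing, that case probably needs a guard before the call.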