How to return a pandas dataframe instead of a series

Question

I have written a function that handles preprocessing, such as filling null values, but it returns a Series instead of a DataFrame. How do I solve this?

def preprocessing(df):
    df_columns = ['column1', 'column2','column3','column4', 'column5', 'column6','column7', 'column8']
    
    features= [c for c in df.columns.values if c in df_columns[0:2]]
    df = df[features].notna()
    
    features= [c for c in df.columns.values if c in df_columns[2:4]]
    max = df[features].max()
    df = df[features].fillna(max)
    
    # Fill na with 0
    features= [c for c in df.columns.values if c not in df_columns]
    df = df[features].fillna(0)
    
    return df

df = preprocessing(df) 

df.isnull().sum()

Does this answer your question? [Convert pandas Series to DataFrame](https://stackoverflow.com/questions/26097916/convert-pandas-series-to-dataframe) — Julien, Dec 11 '20 at 07:24

jezrael · Answer 1 · 2020-12-11T07:29:56.447

1

I think you need to change:

df = df[features].notna()

to:

df[features] = df[features].notna()

so that only the columns from the list are processed and the result is assigned back to those same columns. Apply the same change everywhere in your code.

That means:

def preprocessing(df):
    df_columns = ['column1', 'column2','column3','column4', 
                  'column5', 'column6','column7', 'column8']
    
    features= [c for c in df.columns.values if c in df_columns[0:2]]
    df[features] = df[features].notna()
    
    features= [c for c in df.columns.values if c in df_columns[2:4]]
    max1 = df[features].max()
    df[features] = df[features].fillna(max1)
    
    # Fill na with 0
    features= [c for c in df.columns.values if c not in df_columns]
    df[features] = df[features].fillna(0)
    
    return df

df = preprocessing(df) 

df.isnull().sum()
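To illustrate why assigning back matters, here is a minimal sketch on made-up data (the `sample` frame and its `other` column are assumptions, not the asker's data). It shows that rebinding `df` to a slice drops the remaining columns, while assigning into the slice keeps them, and that `isnull().sum()` is a Series by design even when `df` itself is a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question's naming scheme.
sample = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0],
    "column3": [np.nan, 5.0, 6.0],
    "other": [np.nan, 8.0, 9.0],
})

# Rebinding to a slice discards every column outside the slice.
rebound = sample[["column3"]].fillna(sample[["column3"]].max())

# Assigning back into the slice updates those columns and keeps the rest.
updated = sample.copy()
updated[["column3"]] = updated[["column3"]].fillna(updated[["column3"]].max())

# Per-column null counts: this is a Series, one entry per column.
null_counts = updated.isnull().sum()

print(list(rebound.columns))       # only the sliced column survives
print(list(updated.columns))       # all original columns are still present
print(type(updated).__name__)      # the processed object
print(type(null_counts).__name__)  # the summary of null counts
```

So if the last line of a notebook cell is `df.isnull().sum()`, the displayed result is a Series even though `preprocessing` returned a DataFrame.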

edited Dec 11 '20 at 07:29

answered Dec 11 '20 at 07:25

jezrael


Using ``df[features] = df[features].notna()`` still returned a series – Shadow Walker Dec 11 '20 at 07:28
@ShadowWalker - Did you apply it throughout your code? I edited the answer to show what I mean. – jezrael Dec 11 '20 at 07:30
I tried your code above but still returns a series. – Shadow Walker Dec 11 '20 at 07:33
@ShadowWalker - Do you mean `print (df.isnull().sum())`, which is expected to be a `Series`, or `print (df)`, which is a `DataFrame`? – jezrael Dec 11 '20 at 07:34
My bad, I reran the code with your changes and it worked. Thanks – Shadow Walker Dec 11 '20 at 07:35
There is this line ``df[features] = df[features].notna()`` which returns *bool* instead of returning a df where there are no null values for that feature. Any idea of a workaround? – Shadow Walker Dec 11 '20 at 08:03
@ShadowWalker - Is it possible to add some sample data with 3-5 rows and the expected output? Hard to test without data – jezrael Dec 11 '20 at 08:04
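The thread never confirms what output was expected here, so the following is only a sketch of one plausible reading: keeping the rows that have no nulls in the selected columns. `notna()` produces a boolean mask, while `dropna(subset=...)` or indexing with that mask returns the filtered DataFrame. The `sample` data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0],
    "column2": ["a", "b", None],
    "column3": [10.0, 20.0, 30.0],
})
features = ["column1", "column2"]

# notna() yields a boolean mask with the same shape as the selection.
mask = sample[features].notna()

# Option 1: drop rows that have a null in any of the selected columns.
dropped = sample.dropna(subset=features)

# Option 2: the same filter, written with the mask explicitly.
filtered = sample[mask.all(axis=1)]

print(mask)
print(dropped)
```

Note that this changes the number of rows, unlike `fillna`, so it should be assigned to the whole frame (`df = df.dropna(subset=features)`) rather than to `df[features]`.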

score 0 · Answer 2 · answered Dec 10 '22 at 09:52

Wrapping the return value in a DataFrame might work for your problem:

return pd.DataFrame(df)

For your example:

import pandas as pd

def preprocessing(df):
    df_columns = ['column1', 'column2','column3','column4', 
                  'column5', 'column6','column7', 'column8']

    features= [c for c in df.columns.values if c in df_columns[0:2]]
    df[features] = df[features].notna()

    features= [c for c in df.columns.values if c in df_columns[2:4]]
    max1 = df[features].max()
    df[features] = df[features].fillna(max1)

    # Fill na with 0
    features= [c for c in df.columns.values if c not in df_columns]
    df[features] = df[features].fillna(0)

    return pd.DataFrame(df)
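As a rough check of what the `pd.DataFrame(...)` wrapper does, the sketch below (with made-up `frame` and `series` objects) shows that wrapping an existing DataFrame leaves its contents unchanged, while wrapping a Series gives a one-column DataFrame, equivalent to `Series.to_frame()`. It only helps when the returned object really is a Series.

```python
import pandas as pd

# Hypothetical objects for illustration only.
frame = pd.DataFrame({"column1": [1, 2], "column2": [3, 4]})
series = frame["column1"]

# Wrapping a DataFrame: still a DataFrame with the same data.
wrapped_frame = pd.DataFrame(frame)

# Wrapping a Series: a one-column DataFrame named after the Series.
wrapped_series = pd.DataFrame(series)

# The explicit equivalent for a Series.
via_to_frame = series.to_frame()

print(type(wrapped_frame).__name__, list(wrapped_frame.columns))
print(type(wrapped_series).__name__, list(wrapped_series.columns))
```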
