
Let's consider the following DataFrame:


import pandas as pd
import numpy as np
technologies = {
    'Student_details':["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
    'Courses':["Spark", "PySpark", "Pandas",  "Hadoop"],
    'Fee' :[25000, 20000, 22000, 25000]
              }
df = pd.DataFrame(technologies)
print(df)

We want to split the column using apply(). The source gives the following:

df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))

I don't understand how pd.Series() returns a DataFrame when it is supposed to return a Series. When I tried the following code:

df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x)).str.split(",",expand=True))

it raises this error:

Columns must be same length as key

This is because if we use

df["Student_details"].apply(lambda x: pd.Series(str(x)).str.split("_",expand=True))

the output is this garbage:

0                 0    1
0  Pramodh  Roy
1               0      1
0  Leena  Singh
2           0        1
0  James  William
3               0      1
0  Addem  Smith
Name: Student_details, dtype: object

So my question is:

Why do I need pd.Series() in df["Student_details"].apply(lambda x: pd.Series(str(x).split("_"))), and why does that return a DataFrame instead of a Series? I want to understand it, not just use it as is. Feel free to give an example. Thanks in advance.

I'm not asking how to split; I know how to split. I'm asking about what pd.Series() returns here.

    `apply` will apply the lambda function to all the cells in your column, creating a Series of Series. You probably need `df[['First Name', 'Last Name']] = df["Student_details"].str.split('_', expand=True)` – Tranbi Jan 19 '23 at 08:54
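The difference the comment points at can be checked directly. The following is only a minimal sketch, assuming a reasonably recent pandas; the names `names`, `via_series`, `via_frames` and `via_str` are illustrative and not from the question. It compares what `Series.apply` hands back when the lambda returns a Series per cell, when it returns a one-row DataFrame per cell (the second attempt in the question), and what the vectorised `str.split(..., expand=True)` gives.

```python
import pandas as pd

# Same data as the 'Student_details' column in the question.
names = pd.Series(
    ["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
    name="Student_details",
)

# Case 1: the lambda returns a Series for each cell.
# Series.apply special-cases this: the per-cell Series become the rows
# of a single DataFrame.
via_series = names.apply(lambda x: pd.Series(str(x).split("_")))

# Case 2: the lambda returns a one-row DataFrame for each cell
# (Series.str.split with expand=True produces a DataFrame).
# apply does not combine these; it stores each DataFrame as an object
# inside an ordinary Series.
via_frames = names.apply(lambda x: pd.Series(str(x)).str.split("_", expand=True))

# Case 3: the vectorised form suggested in the comment, no apply needed.
via_str = names.str.split("_", expand=True)

print(type(via_series).__name__, via_series.shape)
print(type(via_frames).__name__, type(via_frames.iloc[0]).__name__)
print(via_series.to_numpy().tolist() == via_str.to_numpy().tolist())
```

So `pd.Series()` itself still returns a Series; it is `apply` that assembles those per-cell Series into a DataFrame. In case 2 there is only one column-like object (a Series of DataFrames), which is presumably why assigning it to two column names fails with "Columns must be same length as key".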
