I am trying to apply a function to a column of a pandas DataFrame. The function returns a list of tuples. This is my function:
def myfunc(text):
    values = []
    sections = api_call(text)
    for (part1, part2, part3) in sections:
        value = (part1, part2, part3)
        values.append(value)
    return values
For example:
sections = myfunc("History: Had a fever\n Allergies: No")
print(sections)
Output:
[('past_medical_history', 'History:', 'History: Had a fever\n '), ('allergies', 'Allergies:', 'Allergies: No')]
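To make the example reproducible without the real service, here is a minimal sketch of a stand-in for api_call. The regex, the LABELS mapping, and the rule that a section runs from one "Header:" token to the next are all assumptions for illustration; the real API is not shown in this post and may behave differently.

```python
import re

# Hypothetical stand-in for the real api_call, which is not shown in this post.
# Assumption: a section starts at a "Header:" token and runs to the next header.
LABELS = {"History:": "past_medical_history", "Allergies:": "allergies"}


def api_call(text):
    matches = list(re.finditer(r"[A-Z][a-z]+:", text))
    sections = []
    for i, m in enumerate(matches):
        # A section ends where the next header starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        header = m.group()
        label = LABELS.get(header, header.rstrip(":").lower())
        sections.append((label, header, text[m.start():end]))
    return sections


def myfunc(text):
    # Same behavior as the function in the question: a list of 3-tuples.
    return [(part1, part2, part3) for (part1, part2, part3) in api_call(text)]


print(myfunc("History: Had a fever\n Allergies: No"))
```

With this stand-in, a text without any "Header:" token (such as "text2") yields an empty list.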
For each tuple, I would like to create a new row, with one column per tuple element. For example:
The original dataframe looks like this:
id  text
0   History: Had a fever\n Allergies: No
1   text2
and after applying the function, I want the dataframe to look like this (where xxx is various text content):
id  text             part1      part2       part3
0   History: Had...  past_...   History:    History: ...
0   Allergies: No    allergies  Allergies:  Allergies: No
1   text2            xxx        xxx         xxx
1   text2            xxx        xxx         xxx
1   text2            xxx        xxx         xxx
...
I could loop through the dataframe and build a new dataframe row by row, but that would be really slow. I tried the following code but received a ValueError. Any suggestions?
df.apply(lambda x: pd.Series(myfunc(x['text']), index=['part1', 'part2', 'part3']), axis=1)
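A likely cause of the ValueError, shown as a small sketch: pd.Series treats each tuple in the list as a single value, so a list of two tuples supplies two values for three index labels. The sample data is copied from the example output above; the exact error message depends on the pandas version.

```python
import pandas as pd

# Two tuples, as in the example output above.
sections = [("past_medical_history", "History:", "History: Had a fever\n "),
            ("allergies", "Allergies:", "Allergies: No")]

error = None
try:
    # Each list element (a whole tuple) becomes one Series value, so two
    # values are matched against three index labels.
    pd.Series(sections, index=["part1", "part2", "part3"])
except ValueError as exc:
    error = exc
    print(type(exc).__name__, exc)
```

Note that a text with exactly three sections would not raise, but it would put one whole tuple into each of part1, part2, and part3, which is not the intended layout either.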
After a bit more research, my question boils down to how to unnest a column that contains lists of tuples. The answer to "Split a list of tuples in a column of dataframe to columns of a dataframe" helped. Here is what I did:
# Step 1: sectionize the text
df["sections"] = df["text"].apply(myfunc)

# Step 2: unnest the sections
part1s = []
part2s = []
part3s = []
ids = []

def create_lists(row):
    tuples = row['sections']
    id = row['id']
    for t in tuples:
        part1s.append(t[0])
        part2s.append(t[1])
        part3s.append(t[2])
        ids.append(id)

df.apply(create_lists, axis=1)
new_df = pd.DataFrame({"part1": part1s, "part2": part2s, "part3": part3s,
                       "id": ids})[["part1", "part2", "part3", "id"]]
But the performance is not great. Is there a better way?
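One possible alternative, sketched under a few assumptions: DataFrame.explode (available in pandas 0.25 and later) produces one row per tuple, and the tuples can then be split into columns with a single DataFrame constructor call instead of appending to lists row by row. The CANNED dictionary and the myfunc below are hypothetical stand-ins for the real API call, and whether this is actually faster on real data has not been measured here.

```python
import pandas as pd

# Hypothetical stand-in for myfunc: canned results instead of the real api_call.
CANNED = {
    "History: Had a fever\n Allergies: No": [
        ("past_medical_history", "History:", "History: Had a fever\n "),
        ("allergies", "Allergies:", "Allergies: No"),
    ],
    "text2": [("xxx", "xxx", "xxx")] * 3,
}


def myfunc(text):
    return CANNED[text]


df = pd.DataFrame({"id": [0, 1], "text": list(CANNED)})

# One row per tuple; id and text are repeated for each tuple.
# Rows whose list was empty become NaN after explode and are dropped.
exploded = (
    df.assign(sections=df["text"].apply(myfunc))
      .explode("sections")
      .dropna(subset=["sections"])
      .reset_index(drop=True)
)

# Split every tuple into three columns in one call.
parts = pd.DataFrame(exploded["sections"].tolist(),
                     columns=["part1", "part2", "part3"])

new_df = pd.concat([exploded.drop(columns="sections"), parts], axis=1)
print(new_df)
```

In this sketch the text column repeats the full original text on every row. The call to myfunc still runs once per row, so if the API call dominates the runtime, unnesting differently will not help much.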