I am attempting to break out a DataFrame column into several varying ones based upon a function that parses the original column contents. They contain something that my function can turn into a dataframe with varying column names. All columns need to be added to the end of the existing dataframe. The columns should not duplicate in name. The below is a simplified version of what I'm trying to do. It errors out.
EDIT: One point of clarification, please disregard the fact that I have used a dict to form sub_transaction. The sub_transaction column in actuality has a lengthy XML in it that is turned into a DataFrame by parse_subtransaction. The simpler dict was just for example purposes. The important point is that a function must be used to parse it and that function returns a DataFrame.
original dataframe
transaction_id sub_transaction
abc1 {'id': 'abc1x', 'total': 10}
abc2 {'id': 'abc2x', 'total': 20}
abc3 {'id': 'abc3x', 'total': 30}
abc4 {}
abc5 {'id': 'abc5x'}
desired dataframe outcome
transaction_id sub_transaction_id total
abc1 abc1x 10
abc2 abc2x 20
abc3 abc3x 30
import pandas as pd
def parse_subtransaction(sub_transaction):
return pd.DataFrame({
'sub_transaction_id': [sub_transaction.get('id')],
'total': [sub_transaction.get('total')]})
def main():
df = pd.DataFrame({
'transaction_id': ['abc1', 'abc2', 'abc3','abc4','abc5'],
'sub_transaction': [
{'id': 'abc1x', 'total': 10},
{'id': 'abc2x', 'total': 20},
{'id': 'abc3x', 'total': 30},
{},
{'id':'abc5x'}]
})
applied_df = df.apply(
lambda row: parse_subtransaction(row['sub_transaction']),
axis='columns',
result_type='expand')
# ERROR: ValueError: If using all scalar values, you must pass an index
if (__name__ == "__main__"):
main()