I have a DataFrame that looks like this:

Dataframe:

          Coll1         Coll2         Coll3         Coll4         Coll5         Coll6         Coll7         Coll8
measure1  0.037966678   -0.135118575  -0.073656574  0.022888691   -0.571120494  -0.840920088  -0.042983197  -0.348949555
measure2  0.354188199   0.234036602   0.271199485   0.266918765   -0.18683292   0.031608422   0.206748811   0.12081408
measure3  0.037966427   -0.125931101  -0.073643686  0.022880467   -0.571035929  -0.840920088  -0.040196244  -0.31358209
measure4  6.62          0.07295635    0.000175013   0.00035944    0.00014809    0             0.069333663   0.112785347
measure5  0.354190545   0.251111058   0.271246949   0.267014706   -0.186860588  0.031608422   0.221083464   0.134440137
measure6  0.076594642   0.077704374   0.09952279    0.059278591   0.078890611   0.150241631   0.061460853   0.030369465
measure7  0.184133007   0.248415482   0.186416923   0.129443923   0.201084178   0.657964902   0.139587378   0.182577533
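
For anyone who wants to reproduce this, the table can be rebuilt with a constructor along these lines. This is a minimal sketch: the values are copied verbatim from the table, and the `data` and `columns` names are only illustrative.

```python
import pandas as pd

# Column labels Coll1 ... Coll8, as in the table.
columns = [f"Coll{i}" for i in range(1, 9)]

# One list per row; the dict keys become the index (measure1 ... measure7).
data = {
    "measure1": [0.037966678, -0.135118575, -0.073656574, 0.022888691,
                 -0.571120494, -0.840920088, -0.042983197, -0.348949555],
    "measure2": [0.354188199, 0.234036602, 0.271199485, 0.266918765,
                 -0.18683292, 0.031608422, 0.206748811, 0.12081408],
    "measure3": [0.037966427, -0.125931101, -0.073643686, 0.022880467,
                 -0.571035929, -0.840920088, -0.040196244, -0.31358209],
    "measure4": [6.62, 0.07295635, 0.000175013, 0.00035944,
                 0.00014809, 0, 0.069333663, 0.112785347],
    "measure5": [0.354190545, 0.251111058, 0.271246949, 0.267014706,
                 -0.186860588, 0.031608422, 0.221083464, 0.134440137],
    "measure6": [0.076594642, 0.077704374, 0.09952279, 0.059278591,
                 0.078890611, 0.150241631, 0.061460853, 0.030369465],
    "measure7": [0.184133007, 0.248415482, 0.186416923, 0.129443923,
                 0.201084178, 0.657964902, 0.139587378, 0.182577533],
}

# orient="index" turns each dict key into a row label.
df_by_fulfillment_flow = pd.DataFrame.from_dict(data, orient="index", columns=columns)
print(df_by_fulfillment_flow)
```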

Description of the DataFrame:

  1. The first column (measure1 to measure7) is the index of the DataFrame.

The expected output from this DataFrame is shown below:

[Image: expected output result]

The logic behind the calculation is as follows:

  • We have a dictionary that maps each group to its columns:
FLOW_GROUPS = {
    "Group1": [
        "Coll1",
        "Coll2",
        "Coll3",
    ],
    "Group2": [
        "Coll4",
        "Coll5",
    ],
    "Group3": [
        "Coll6",
        "Coll7",
        "Coll8",
    ],
}
  • Calculation 1: for measure1, measure2, measure4, measure5, measure6 and measure7, aggregate each group's columns with SUM.
  • Calculation 2: for measure3, aggregate each group's columns with MEAN.
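
To make the two rules concrete, here is a small worked example for Group2 (Coll4 and Coll5) using values from the table. It is only a sketch, and it assumes MEAN means the average across the group's columns within a row; the variable names are illustrative.

```python
# Values copied from the table: Coll4 and Coll5 make up Group2.
measure1_group2 = [0.022888691, -0.571120494]
measure3_group2 = [0.022880467, -0.571035929]

# Calculation 1 (SUM) applies to measure1.
group2_measure1 = sum(measure1_group2)

# Calculation 2 (MEAN) applies to measure3.
group2_measure3 = sum(measure3_group2) / len(measure3_group2)

print(group2_measure1)  # approximately -0.548231803
print(group2_measure3)  # approximately -0.274077731
```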

The solution I implemented is given below:

  • Step 1: Copy the index into a column:
df_by_fulfillment_flow['index1'] = df_by_fulfillment_flow.index

Then I apply the logic to get the desired output:

[pd.Series(df_by_fulfillment_flow[v].mean(axis=1), name=k) if df_by_fulfillment_flow['index1'] == 'measure3' else pd.Series(df_by_fulfillment_flow[v].sum(axis=1), name=k) for k, v in FLOW_GROUPS.items()]

Problem statement: the code above fails with the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The actual problem comes from the condition df_by_fulfillment_flow['index1'] == 'measure3'. Any input on how to address this issue would be appreciated.

Vibhor Gupta
  • `df_by_fulfillment_flow['index1'] == 'measure3'` returns a **Series** of booleans, not a single boolean, so you can't use it with `if` directly. You need to use vectorized selection instead. See the duplicate for a detailed explanation. – mozway Oct 18 '22 at 09:18
  • Thanks for your response. Can you please help me with an ideal solution? – Vibhor Gupta Oct 18 '22 at 09:29
  • Can you provide a fully reproducible minimal input (not images but DataFrame constructors)? – mozway Oct 18 '22 at 09:30
  • I have updated it. Can you please check whether you need any more info? – Vibhor Gupta Oct 18 '22 at 09:56
  • @mozway Is the information I have added to the post sufficient, or do you need anything more? – Vibhor Gupta Oct 18 '22 at 13:02
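
The point made in the first comment can be sketched as follows. This is not a confirmed answer, just one possible approach under the assumption that measure3 should be averaged across each group's columns and every other row summed. It uses only two rows from the table to stay short, and `MEAN_ROWS`, `mask`, `sums`, `means` and `result` are illustrative names that do not appear in the original post.

```python
import pandas as pd

FLOW_GROUPS = {
    "Group1": ["Coll1", "Coll2", "Coll3"],
    "Group2": ["Coll4", "Coll5"],
    "Group3": ["Coll6", "Coll7", "Coll8"],
}

# Two rows copied from the question's table, to keep the sketch short.
df = pd.DataFrame(
    [
        [0.037966678, -0.135118575, -0.073656574, 0.022888691,
         -0.571120494, -0.840920088, -0.042983197, -0.348949555],
        [0.037966427, -0.125931101, -0.073643686, 0.022880467,
         -0.571035929, -0.840920088, -0.040196244, -0.31358209],
    ],
    index=["measure1", "measure3"],
    columns=[f"Coll{i}" for i in range(1, 9)],
)

# Assumption: the rows listed here use MEAN, all other rows use SUM.
MEAN_ROWS = ["measure3"]

# The mask holds one boolean per row. That is why a bare `if` on such a
# comparison raises "The truth value of a Series is ambiguous".
mask = df.index.isin(MEAN_ROWS)

# Compute both aggregations for every row and group.
sums = pd.DataFrame({g: df[cols].sum(axis=1) for g, cols in FLOW_GROUPS.items()})
means = pd.DataFrame({g: df[cols].mean(axis=1) for g, cols in FLOW_GROUPS.items()})

# Start from the sums, then overwrite only the masked rows with the means.
result = sums.copy()
result.loc[mask] = means.loc[mask]
print(result)
```

Computing both aggregations and then selecting per row avoids any Python-level `if` on a Series. On the same toy data, `result.loc["measure3", "Group2"]` comes out at roughly -0.2741, the mean of Coll4 and Coll5 for that row.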
