0

I have a data frame in pandas and I have pivoted on a given element in this case name. Each pivotted dataframe is indexed by date, has a "Key" column and 2 other columns F_1, F_2. What I want is to find all elements with a given key and sum all elements in F_1 and F_2 together by date. I have a list_names which stores all of the unique names. Here is what my data frames look like

enter image description here

the above is stored in the names dataframe. What I want is if the Key is NY, to sum F_1/F_2 for Alfred, Claire and Dane to get the below output

enter image description here

My code is below, but my actual output ends up being:

enter image description here

    key='NY'
    sub_names=[]
    for s in list_names: #list_names is the full list of actual names
        for x in names[s]['Key']:
            if key ==x;
                sub_names.append(s)
                #Note I tried to combine the above for/if statement as "if key in names[s]['Key']"
                #but it kept having a run time error
    sub_frame=pd.DataFrame[columns = ['date','F_1','F_2','Key']]
    sub_frame.set_index(['date'])
    counter = 0
    for z in sub_names:
        if counter ==0:
            sub_frame['F_1']=names[z]['F_1']
            sub_frame['F_2']=names[z]['F_2']
            counter=counter+1
        else:
            sub_frame['F_1']=sub_frame['F_1']+names[z]['F_1']
            sub_frame['F_2']=sub_frame['F_2']+names[z]['F_2'] 

    sub_frame['Key']=key
    display(sub_frame)           

Which gives me the above with my final output has NAN as 2 entries. I could add extra if statements that excludes NAN values, but I have to assume there has to be a better way. Any help would be appreciated.

user3782816
  • 131
  • 1
  • 1
  • 8
  • does this answer your question? https://stackoverflow.com/questions/51635290/pandas-combine-two-columns/51635317#51635317 – SuperStew Jun 24 '20 at 20:50

1 Answers1

1

You can use the built-in pandas dataframe capabilities to achieve this.

Assuming your data is on list_namesas a list of rows in this content order: Date, F_1, F-2, Key

import pandas as pd

df = pd.DataFrame(list_names,columns=["Date","F_1","F_2","Key"])

# Filter by key = "NY"
df = df.loc[df["Key"] == "NY",:]

# Now group by date column, it will automatically sum F_1 and F_2 columns
df = df.groupby("Date").sum()

print(df)

This will get you the desired output

Code Ninja 2C4U
  • 114
  • 1
  • 11