I am having trouble adding several dataframes in a list of dataframes. My goal is to add dataframes from a list of dataframes based on the criteria from another list.

Example: Suppose we have a list of 10 Dataframes, DfList and another list called OrderList.

Suppose OrderList = [3, 2, 1, 4].

Then I would like to obtain a new list of 4 DataFrames of the form [DfList[0] + DfList[1] + DfList[2], DfList[3] + DfList[4], DfList[5], DfList[6] + DfList[7] + DfList[8] + DfList[9]].

I have tried a few ways to do this by writing functions around DataFrame.add. Initially, my hope was that I could use the form sum(DfList[0], DfList[1], DfList[2]) to do this, but I quickly learned that sum() doesn't seem to be supported with DataFrames.

I was hoping to use something like sum(DfList[0:2]) and make OrderList cumulative so that I could just use sum(DfList[OrderList[i]:OrderList[i+1]]), but I keep getting unsupported operand type errors.

Is there an easy way to do this that I am not considering or is there a different approach entirely that you would suggest?

EDIT: The output I am looking for is another list containing four DataFrames, each summed across all columns according to OrderList: three DataFrames added together for the first, two for the second, one for the third, and four for the fourth.

  • Hello there, and welcome to StackOverflow! I got a little confused with the question. First, you have a list of DataFrames, right? Do you want to create a new list of DataFrames or create a new DataFrame that is the sum of others? Second, what role does `OrderList` have in this? – araraonline May 20 '19 at 23:19
  • I apologize; I should have worded it better. I am looking for a new list of DataFrames containing the sums of smaller lists defined by the list OrderList (I should have used a better name). So for the first DataFrame, I am looking for a sum of all the columns of the first three, for the second DataFrame the sum of all the columns of the next two, for the third DataFrame the sum of all the columns of the next one, and for the fourth DataFrame the sum of all the columns of the last four. – Zohaib Syed May 21 '19 at 00:40
  • There are two things you are asking here... First, how to use `OrderList=[3, 2, ...]` to sum the first three elements, then the next two, etc. Next, you are asking how to add different numbers of DataFrames. The second question the guy below already answered :) For the first, you should start with something simpler, like summing `[1, 2, 3, 4, 5]` with order `[2, 3]` to give `[3, 12]`. This way you don't have to worry about data structures whatsoever, just the language. Sorry I don't have the time to come with an answer for you, but it shouldn't be hard, just do your best :) – araraonline May 22 '19 at 00:30
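
The simpler exercise suggested in the last comment can be sketched in plain Python as follows. The helper name `grouped_sums` is hypothetical and does not come from the thread. It walks the list once and takes as many items as each entry of the order list asks for.

```python
def grouped_sums(values, order):
    """Sum consecutive runs of `values`; the run lengths come from `order`."""
    if sum(order) != len(values):
        raise ValueError("run lengths must add up to len(values)")
    result = []
    start = 0
    for count in order:
        # Take the next `count` items and add them up.
        result.append(sum(values[start:start + count]))
        start += count
    return result


print(grouped_sums([1, 2, 3, 4, 5], [2, 3]))  # [3, 12]
```

Because only slicing and sum() are involved, the same loop should carry over to a list of numeric DataFrames.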

1 Answer

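If you have a list of DataFrames as you said, you can use sum() on a slice of it, for example sum(DfList[0:3]) for the first three (the slice DfList[0:2] holds only two). Two things need care. First, pandas matches DataFrames by their row and column labels when adding, not by position: a different column order is harmless, but a label that exists in only one DataFrame produces NaN in the result. If you want every DataFrame to share one column order anyway, it can be changed as shown in this other question. Second, sum() starts from the integer 0, so every column has to support addition with an integer; a column of strings raises a TypeError.

This example illustrates both points:

import pandas as pd

df1 = pd.DataFrame({1: [1, 23, 4], 2: ['x', 'y', 'z']})
df2 = pd.DataFrame({2: ['x', 'y', 'z'], 1: [1, 23, 4]})

# Works: columns are matched by label, so the differing order is fine
df1 + df2

# Fails: sum() begins with 0 + df1, and 0 + 'x' is not defined
try:
    sum([df1, df2])
except TypeError:
    print("Error")

# Works: with numeric columns only, sum() behaves like df1 + df2
sum([df1[[1]], df2[[1]]])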

Also, the slice in sum(DfList[OrderList[i]:OrderList[i+1]]) is not correct with OrderList as given. For it to work, OrderList must hold cumulative boundaries starting from zero, which means one extra element: instead of OrderList = [3, 2, 1, 4], you would have OrderList = [0, 3, 5, 6, 10].
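
Putting the cumulative boundaries together with sum() could look like the sketch below. The ten small frames are made-up stand-ins for DfList, and the names `bounds` and `SummedList` are illustrative only. It assumes every DataFrame is numeric and shares the same row and column labels.

```python
import numpy as np
import pandas as pd

# Ten small numeric frames standing in for DfList (made-up data).
DfList = [pd.DataFrame({"a": [i, i], "b": [10 * i, 10 * i]}) for i in range(10)]
OrderList = [3, 2, 1, 4]

# Cumulative boundaries with a leading zero: [0, 3, 5, 6, 10]
bounds = [0] + np.cumsum(OrderList).tolist()

# One summed frame per pair of neighbouring boundaries.
SummedList = [
    sum(DfList[start:stop])
    for start, stop in zip(bounds[:-1], bounds[1:])
]

print(len(SummedList))                # 4
print(SummedList[0]["a"].tolist())    # [3, 3], i.e. 0 + 1 + 2
```

Pairing `bounds[:-1]` with `bounds[1:]` avoids indexing with `i + 1`, so the loop cannot run past the end of the boundary list.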

  • I apologize, I should have explained the issue better. I had added the 0 to the list in my attempt as well as used `np.cumsum` but the issue I was having was more due to the addition of DataFrames themselves. Spelling out df1+df2+df3+df4 is the problem since the addition itself is to be done on different numbers of DataFrames. df1 + df2 + df3 is working for me but the number of dfs being added need to be changed based on `OrderList`. That was the reason I was erring towards the side of `sum(DfList[OrderList[i]:OrderList[i+1]])`. While df1 + df2 +df3 is working for me, sum(df1, df2, df3) isn't. – Zohaib Syed May 21 '19 at 01:15
  • In order for it to work with the `sum` function, you just need to wrap the DataFrames in a list, so instead of `sum(df1, df2, df3)`, you should use `sum([df1, df2, df3])`. – Pedro Igor A. Oliveira May 21 '19 at 01:26
  • I still seem to be getting the `unsupported operand type(s) for +: 'int' and 'list'` error. The snippet I am using that's giving me this is `for i in range(len(OrderList)): sum([DfList[OrderList[i]:OrderList[i+1]]])`. I also tried `for i in OrderList: sum([DfList[i:i+1]])`. Note that I have added the 0 in front of `np.cumsum(OrderList)`. – Zohaib Syed May 21 '19 at 02:40
  • Sorry for the late follow up but to provide some closure, I had ended up rewriting a large part of what I was working on and then Pedro's suggestion to wrap it in a list before using the sum ended up working smoothly. – Zohaib Syed Aug 31 '19 at 15:50
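
The `'int' and 'list'` error quoted in the comments is consistent with the extra pair of brackets in `sum([DfList[a:b]])`. A slice is already a list, so wrapping it again asks sum() to compute `0 + [df, df, ...]`. Wrapping is only needed for separate variables, as in `sum([df1, df2, df3])`. A minimal sketch with made-up frames:

```python
import pandas as pd

DfList = [pd.DataFrame({"a": [i]}) for i in range(4)]  # made-up data

# A slice is already a list of DataFrames, so sum() can consume it directly.
ok = sum(DfList[0:3])

# Wrapping the slice again makes sum() evaluate 0 + [df, df, df].
try:
    sum([DfList[0:3]])
    message = None
except TypeError as exc:
    message = str(exc)

print(ok["a"].tolist())  # [3]
print(message)           # unsupported operand type(s) for +: 'int' and 'list'
```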