-3

I got a dataframe(named table) with 6 columns labeled as [price1,price2,price3,time,type,volume]

for type, I got 'Q' and 'T', arranged like:

Q

T

Q

T

T

Q

Now I want to combine the rows with consecutive T and add up the value of volume. The value of prices and time are the same for consecutive Ts

i.e. I want

Price...: Time: Type: Volume:

10000 2012.05 Q 10

10000 2012.05 T 20

10000 2012.05 Q 10

10000 2012.06 T 20

10000 2012.06 T 30

10000 2012.07 Q 10

to be:

10000 2012.05 Q 10

10000 2012.05 T 20

10000 2012.05 Q 10

10000 2012.06 T 20+30=50

10000 2012.07 Q 10

here is my code but does not return the desired result, so can someone please help me to figure out my mistake?

    def combine(df):
    combined = [] # Init empty list
    length = len(df.iloc[:,0]) # Get the number of rows in DataFrame
    i = 0
    while i < length:
        num_elements = num_elements_equal(df, i, 0, 'T') # Get the number of consecutive 'T's
        if num_elements <= 1: # If there are 1 or less T's, append only that element to combined, with the same type
            combined.append([df.iloc[i,0],df.iloc[i,1],df.iloc[i,2],df.iloc[i,3],df.iloc[i,4],df.iloc[i,5]])
        else: # Otherwise, append the sum of all the elements to combined, with 'T' type
            combined.append(['T', sum_elements(df, i, i+num_elements, 5)])
        i += max(num_elements, 1) # Increment i by the number of elements combined, with a min increment of 1
    return pd.DataFrame(combined, columns=df.columns) # Return as DataFrame

def num_elements_equal(df, start, column, value): # Counts the number of consecutive elements
    i = start
    num = 0
    while i < len(df.iloc[:,column]):
        if df.iloc[i,column] == value:
            num += 1
            i += 1
        else:
            return num
    return num

def sum_elements(df, start, end, column): # Sums the elements from start to end
    return sum(df.iloc[start:end, column])

tableT = combine(table)
tableT

raw data (Table) looks like this

bing
  • 195
  • 2
  • 11
  • Can you share raw data through Gist? Need to test. ANd even will make the question more clearer. – Jaffer Wilson Sep 06 '17 at 12:09
  • its similar to this question, but with more than two columns https://stackoverflow.com/questions/46059157/python-combine-rows-in-dataframe-and-add-up-values – bing Sep 06 '17 at 12:14

1 Answers1

2

IIUC:

Input dataframe, df:

   Price     Time Type  Volume
0  10000  2012.05    Q      10
1  10000  2012.05    T      20
2  10000  2012.05    Q      10
3  10000  2012.06    T      20
4  10000  2012.06    T      30
5  10000  2012.07    Q      10

Combine T records and sum volume:

df.groupby(by=[df.Type.ne('T').cumsum(),'Price','Time','Type'], as_index=False)['Volume'].sum()

Output:

   Price     Time Type  Volume
0  10000  2012.05    Q      10
1  10000  2012.05    T      20
2  10000  2012.05    Q      10
3  10000  2012.06    T      50
4  10000  2012.07    Q      10
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Thanks. However, this code returns me an empty dataframe – bing Sep 06 '17 at 13:30
  • What if you use the input dataframe given? – Scott Boston Sep 06 '17 at 13:31
  • it works perfectly well if i use the input dataframe given, however, with my dataframe, the Type 'T' disappeared after combining the data, and only 'Q' are left – bing Sep 06 '17 at 13:54
  • `df.Type.ne('T')` is the part that you need to modify to fit your original dataframe. Try to understand what is happening in on this test dataframe, then modify to fit your data. – Scott Boston Sep 06 '17 at 13:58
  • I've changed 'T' to 'Trade' already, but all the type 'Trade' disappeared after combining then, and only type 'Quote' are left – bing Sep 06 '17 at 14:14
  • Place those Nan on the Trade records with a value of 9999 or 0. – Scott Boston Sep 06 '17 at 14:16
  • Thank you very much for your help. Most of the type 'trade' are combined now. However, when the volume of the trade is small (i.e. <10), it does not add up. Do you have any solution to this problem? many thanks – bing Sep 06 '17 at 15:06
  • @bing accept this answer and post a new question with sample data showing where volumes are not showing up. Thank you. – Scott Boston Sep 06 '17 at 15:08
  • https://stackoverflow.com/questions/46099924/how-to-combine-consecutive-data-in-a-dataframe-and-add-up-value – bing Sep 07 '17 at 15:44