0

My dataframe:

A   B   C    A_Q  B_Q     C_Q
27  40  41    2    1      etc
28  39  40    1    5    
30  28  29    3    6    
28  27  28    4    1    
15  10  11    5    4    
17  13  14    1    5    
16  60  17    8    10   
14  21  18    9    1    
20  34  23    10   2    
21  45  34    7    4    

I want to iterate through each row in every column with a _Q suffix, starting with A_Q and do the following:

  1. if row value = '1', grab the corresponding value in col 'A'
  2. assign that value to a variable, call it x
  3. keep looping down the col A_Q
  4. if row value is either 1,2,3,4,5,6,7,8 or 9, ignore
  5. if the value is 10, then get the corresponding value in col 'A' and assign that to variable y
  6. calculate % change, call it chg, between y and x: (y/x)-1)*100
  7. append chg to dataframe
  8. keep going down the column with steps 1-7 above until the end

Then do the same for the other columns B_Q, C_Q etc

So for example, in the above, the first "1" that appears corresponds to 28 in col A. So x = 28. Then keep iterating, ignoring values 1 through 9, until you get a 10, which corresponds to 20 in col A. Calculate % change = ((20/27)-1)*100 = -25.9% and append that to df in a newly created col A_S. Then resume from that point on with same steps until reach end of the file. And finally, do the same for the rest of the columns.

So then the df would look like:

A   B   C    A_Q  B_Q     C_Q    A_S        B_S         C_S etc
27  40  41    2    1      etc
28  39  40    1    5    
30  28  29    3    6    
28  27  28    4    1    
15  10  11    5    4    
17  13  14    1    5    
16  60  17    8    10                        50
14  21  18    9    1    
20  34  23    10   2             -25.9
21  45  34    7    4    

I thought to create a function and then do something like df ['_S'] = df.apply ( function, axis =1) but am stuck on the implementation of the above steps 1-8. Thanks!

BenG
  • 304
  • 1
  • 11
Eksana Stasis
  • 173
  • 1
  • 1
  • 7

1 Answers1

1

Do you need to append the results as a new column? You're going to end up with nearly empty columns with just one data value. Could you just append all of the results at the bottom of the '_Q' columns? Anyway here's my stab at the function to do all you asked:

def func(col1, col2):
    l = []
    x = None
    for index in range(0, len(col1)):
        if x is None and col1[index] == 1:
            x = col2[index]
            l.append(0)
        elif not(x is None) and col1[index] == 10:
            y = col2[index]
            l.append(((float(y)/x)-1)*100)
            x = None
        else:
            l.append(0)
    return l

You'd then pass this function A_Q as col1 and A as col2 and it should return what you want. For passing functions, assuming that every A, B, C column has an associated _Q column, you could do something like:

q = [col for col in df.columns if '_Q' in col]
for col in q:
    df[col[:len(col) - 2] + '_S] = func(df[col], df[col[:len(col) - 2]
JSolomonCulp
  • 1,504
  • 4
  • 13
  • 16
  • Thx JSol, I don't mind appending results. Re passing the column names to the func, the issue is that my df will have dozens of these columns, and the only way to identify them is with the _Q suffix. How to do? For ex, A_Q is the first col in the df with a _Q suffix. How then can we grab what comes to the left of the _Q in the A_Q, i.e., the 'A' for the col2 value? – Eksana Stasis Jul 05 '18 at 21:28
  • Check out the edit I made. Could something like that work? Depends on the format of the columns but if it's exactly like the example it would be something like for col in df.columns[:3] – JSolomonCulp Jul 05 '18 at 21:52
  • I meant I'd actually prefer to append the results as a new column so the end result looks like the dataframe at the end of my post. That way it's easier to do a visual check with all the data. I follow what you're doing with nudging up to the first _Q column but that's not the ideal solution, would want to be able to identify the first column that ends in _Q and then from there identify the associated prior column, without having to look into the dataframe, if that makes sense? You are really close! – Eksana Stasis Jul 05 '18 at 22:05
  • Maybe something like that? It doesn't fix the appending results as a new column but that shouldn't be too hard to fix. – JSolomonCulp Jul 05 '18 at 22:31
  • I'm getting an invalid syntax error on this line: l.append(func(df[col], df[col[:len(col) - 2]) – Eksana Stasis Jul 05 '18 at 22:49
  • I tried this: l.append(func(df[col], df[col[:len(col)-2]])) and that seems to work. i'm checking the output now to see if its returning the right numbers – Eksana Stasis Jul 05 '18 at 22:54
  • ok I checked the numbers: we are almost there! the problem is that the function runs for the first set of (1,10) values and then stops, instead of continuing down the column to calculate the rest of the % change for the next sets of (1,10) values. So the issue seems to be with the func, I think – Eksana Stasis Jul 05 '18 at 22:59
  • Oh I didn't realize that it needed to happen multiple times per column. Try that? Func now returns a list of all the answers and you should be able to easily append that as a new column – JSolomonCulp Jul 05 '18 at 23:10
  • it returned just 1,1,1 when I print array l. Should we be setting the col1[index]=None at some point to restart the process? – Eksana Stasis Jul 05 '18 at 23:25
  • It's working for me? I used these two lists A_Q = [1, 2, 3, 10, 4, 5, 1, 4, 10] and A = [4, 5, 6, 8, 4, 4, 2, 5, 12] and returned a list with [100.0, 500.0] which is correct. – JSolomonCulp Jul 05 '18 at 23:46
  • Got it now, thanks. JSol, that was phenomenal! To be honest, I don't quite understand how the x = none relates to the if clauses, but I'm going to try and study it some more to fully benefit from the lesson here. Thanks a million again – Eksana Stasis Jul 06 '18 at 00:23
  • Of course, glad I could help. None in Python is kind of the equivalent to Null in other languages. In this case, if x has been set to something that means that you've already seen a 1 in _Q and are just looking for a 10. However, if x is None, that means you haven't seen a 1 yet and are looking for that. It's the same effect as setting it to something you know isn't in the data, such as -99999, it's just to keep track of what you're currently looking for. – JSolomonCulp Jul 06 '18 at 16:26
  • JSol, thanks re none, I understand now. I tried to modify things with q = [col for col in df.columns if '_Q' in col] for col in q: df["{}_S".format(col)] = df.apply(func(df[col], df[col[:len(col)-2]])) but am stuck. How can we modify this to append so the final result is not in a list but appends to the dataframe? – Eksana Stasis Jul 06 '18 at 18:42
  • I get this error: "ValueError: Length of values does not match length of index". Came across this in my google search: "The error comes up when you are trying to assign a list of numpy array of different length to a data frame..." – Eksana Stasis Jul 07 '18 at 03:15
  • Hi JSol, I tried editing the func by replacing l = [ ] with a dataframe like df2 = pd.Series(), then do df2.append (((float(y)/x)-1)*100) and return df2, but I get the error "cannot concatenate object of type ""; only pd.Series, pd.DataFrame and pd.panel (deprecated) objs are valid" – Eksana Stasis Jul 09 '18 at 18:19
  • This is the issue with appending the results to the dataframe. You don't have the same amount of data in the new columns as you do in the original dataframe. Check out this SO thread: https://stackoverflow.com/questions/27126511/add-columns-different-length-pandas – JSolomonCulp Jul 10 '18 at 18:02
  • right. been googling non stop and studying this out.. it seems like this should work if we 1) append to list as you did originally 2) transfer to a pd.series 3) transfer the pd.series to a df 4) have the func return that df. So for ex, I did s = pd.Series(L) then df2 = df2.join(pd.DataFrame(s).T,fillna(method='ffill)) but still running into errors. Seems like this should be doable? – Eksana Stasis Jul 10 '18 at 18:49
  • actually just realized that doesn't change the original issue, which is the mismatch on the amount of data. What if we amend the func to fill in 'na' to the list every time there isn't a calculation made? so would add something like elif l.append ('0')? would that make the data match evenly then, solving the first hurdle? – Eksana Stasis Jul 10 '18 at 18:56
  • Yes that should work. Check out the edit I made to the function – JSolomonCulp Jul 10 '18 at 19:12
  • That worked! JSol you are awesome! Thanks so much again :-) – Eksana Stasis Jul 10 '18 at 19:18
  • Great! Glad I could help – JSolomonCulp Jul 10 '18 at 20:30