I have a large DataFrame `df` that I am trying to run some calculations on. It looks like this when I print it before the calculation:

            Date        High         Low  ...       Close      Volume   Adj Close
0     2000-01-03   47.995617   45.515598  ...   45.880310   6471200.0   34.997250
1     2000-01-04   45.479130   43.509705  ...   44.147945  10440800.0   33.675823
2     2000-01-05   44.676769   42.962643  ...   42.962643   8646200.0   32.820454
3     2000-01-06   44.457947   42.452049  ...   43.837940  10990900.0   33.489136
4     2000-01-07   44.786182   43.327351  ...   44.476181   6016400.0   33.976704
...          ...         ...         ...  ...         ...         ...         ...
5013  2019-12-05  118.430000  117.589996  ...  118.279999   3128300.0  118.279999
5014  2019-12-06  121.440002  119.910004  ...  120.610001   3287600.0  120.610001
5015  2019-12-09  121.529999  120.110001  ...  120.459999   2885200.0  120.459999
5016  2019-12-10  121.470001  120.029999  ...  120.900002   2518200.0  120.900002
5017  2019-12-11  121.379997  120.099998  ...  120.639999   1885981.0  120.639999

[5018 rows x 7 columns]

Now I want to run an RSI calculation on it and add the result to the DataFrame:

def rsi_calculator(df):
    last_value = float(df.loc[0, 'Adj Close'])
    print(last_value)
    for count, row in enumerate(df, start=1):
        print(float(df.loc[count, 'Adj Close']))

It's not finished yet, but how do I get my loop to run 5018 times (the number of rows) instead of 7 (the number of columns)? This is the output I get:

34.997249603271484
33.67582321166992
32.82045364379883
33.4891357421875
33.97670364379883
34.450363159179695
34.770755767822266
34.589675903320305
jhjorsal
  • `DataFrame.__iter__` is along the "info" axis, which is the columns axis for a DataFrame. (`DataFrame._info_axis`). You'd instead want `for idx, row in df.iterrows():` though there's typically a way to avoid looping. – ALollz Dec 12 '19 at 18:04
  • If you search in your browser for "PANDAS iterate rows", you'll find references that can explain this much better than we can manage here. Most likely, you will do this with a vectorized expression -- *your* code will not have a `for` loop, you'll simply feed it the row-vector operation and a generic "do this" statement. In general, you need to work a bit further through the PANDAS tutorials. – Prune Dec 12 '19 at 18:04
  • Hi Jhjorsal, can you elaborate a bit on what you want the new column of your dataframe to be? For example, do you want each day's adjusted close minus the previous day's adjusted close? I can't tell from your `rsi_calculator` what you actually want your new column to be. – Max Power Dec 12 '19 at 18:09
  • You want to avoid loops in pandas programming (which is a different style of coding than general-purpose python). Please show actual RSI calculations with formulas. – Parfait Dec 12 '19 at 18:31
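
To illustrate the point in the first comment, here is a minimal sketch of why the loop in the question runs once per column. The three-row frame is a made-up stand-in for the real price data, and `column_labels` and `row_values` are names chosen only for this example.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real price data.
df = pd.DataFrame({
    "Date": ["2000-01-03", "2000-01-04", "2000-01-05"],
    "Adj Close": [34.997250, 33.675823, 32.820454],
})

# Iterating the frame itself walks the column labels, so
# enumerate(df) produces one item per column, not per row.
column_labels = [label for label in df]

# Iterating df.index walks the row labels instead.
row_values = [float(df.loc[i, "Adj Close"]) for i in df.index]

print(column_labels)
print(row_values)
```

With the seven-column frame from the question, the same mechanism gives seven iterations, which matches the output shown there.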

1 Answer

Try

for idx, row in df.iterrows():
    print(row["Adj Close"])

`df.iterrows()` yields one pair (a tuple) per row: the index label and the row itself. Unpacking the pair as `idx, row` lets you read the value you are looking for with `row["Adj Close"]`.
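
As a rough illustration of the pairs that `iterrows()` produces, the sketch below collects them into a list so they can be inspected. The small frame and the names `pairs`, `first_index`, `first_row` and `closes` are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample; the real frame has 5018 rows.
df = pd.DataFrame({"Adj Close": [34.997250, 33.675823, 32.820454]})

# Each item from iterrows() is an (index label, row Series) tuple.
pairs = list(df.iterrows())
first_index, first_row = pairs[0]

# Unpack the tuple in the loop header and read the column from the row.
closes = [float(row["Adj Close"]) for _, row in df.iterrows()]

print(first_index, type(first_row).__name__)
print(closes)
```

The loop now runs once per row, so on the full frame it would run 5018 times.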

Good Luck!

Tyler H
  • While this answer is correct, OP and future readers should avoid looping in Pandas programming. If RSI means *Relative Strength Index*, there are vectorized (non-loop) ways of calculating in **one** call (not iterative calls). See: [Relative Strength Index in python pandas](https://stackoverflow.com/q/20526414/1422451). – Parfait Dec 12 '19 at 18:32
  • @Parfait is right in terms of efficiency. There is basically always a faster way than looping over a dataframe. I have looped in several cases before, though, when performance did not matter because I would never have many rows of data, and I simply find looping more intuitive. When processing stock data, as in this question, performance often matters a lot since there are so many observations and getting a fast result is critical. – Tyler H Dec 12 '19 at 18:39
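
For readers wondering what a loop-free version might look like, here is a hedged sketch. The question never states which RSI formula is wanted, so this assumes the common Relative Strength Index with a 14-period window and Wilder-style smoothing (approximated with `ewm(alpha=1/period, adjust=False)`). The function name `rsi`, the `period` parameter and the sample prices are all made up for this example.

```python
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Sketch of a vectorized RSI; the formula choice is an assumption."""
    delta = close.diff()                # day-over-day change
    gain = delta.clip(lower=0)          # keep rises, zero out falls
    loss = (-delta).clip(lower=0)       # keep falls as positive numbers

    # Wilder-style smoothing expressed as an exponential moving average.
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()

    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Hypothetical prices; on the real frame this would be df["Adj Close"].
df = pd.DataFrame({"Adj Close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.7, 10.9, 11.3]})
df["RSI"] = rsi(df["Adj Close"], period=3)
print(df)
```

The early rows are NaN because there is not yet enough history to fill the window. Whether this smoothing matches the RSI variant the OP needs is an open question, so treat it as a starting point and not as the definitive formula.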