
I need to loop through a specific range of rows in my CSV file, for example row 231 to row 252. For each row I calculate a value; then I want to add up those values and divide the total by the number of rows I looped through. How would I do that?

I'm new to pandas so I would really appreciate some help on this.

I have a CSV file from Yahoo Finance that looks something like this (it has many more rows):

Date,Open,High,Low,Close,Adj Close,Volume
2019-06-06,31.500000,31.990000,30.809999,31.760000,31.760000,1257700
2019-06-07,27.440001,30.000000,25.120001,29.820000,29.820000,5235700
2019-06-10,32.160000,35.099998,31.780001,32.020000,32.020000,1961500
2019-06-11,31.379999,32.820000,28.910000,29.309999,29.309999,907900
2019-06-12,29.270000,29.950001,28.900000,29.559999,29.559999,536800

I have done the basic steps of importing pandas and reading the file. Then I added two variables, one per column, so I can easily refer to just that column:

import pandas as pd
df = pd.read_csv(file_name)

high = df.High
low = df.Low

Then I tried something like this, storing a .loc slice in a variable, but that didn't seem to work. This is maybe super dumb, but I'm really new to pandas.

dates = df.loc[231:252, :]

for rows in dates:
    # calculations here
    # for example:
    print(high - low)
    # I would have a more complex calculation than this,
    # but for simplicity's sake let's stick with this.

This prints high - low for every row from 1 to 252, for example:

...
231    3.319997
232    3.910000
233    1.050001
234    1.850001
235    0.870001
...

But I only want this output on a certain number of rows.

Then I want to add up all of those values and divide the total by the number of rows I looped through. This part is simple, so you don't need to include it in your answer, but it's okay if you do.

2 Answers


.loc slices by label; for integer (positional) slicing, use .iloc:

dates = df.iloc[231:252]
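A small sketch of how the two slices differ when the DataFrame has the default RangeIndex (0, 1, 2, ...) that pd.read_csv produces. The 300-row DataFrame here is made-up illustration data. .loc includes both end labels, while .iloc excludes the end position, so df.iloc[231:252] stops at row 251 and df.iloc[231:253] would be needed to include row 252.

```python
import pandas as pd

# Made-up stand-in for the Yahoo Finance CSV: 300 rows with the default
# RangeIndex, which is what pd.read_csv returns without index_col.
df = pd.DataFrame({
    "High": [float(i) + 2.0 for i in range(300)],
    "Low": [float(i) for i in range(300)],
})

by_label = df.loc[231:252]      # label slice: end label 252 is included
by_position = df.iloc[231:252]  # position slice: end position 252 is excluded

print(len(by_label), by_label.index[0], by_label.index[-1])
print(len(by_position), by_position.index[0], by_position.index[-1])
```

If the index is later changed (for example with set_index("Date"), or after filtering out rows), integer labels passed to .loc no longer correspond to row positions, which is where .iloc is the safer choice.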
wilkben
Thanks a lot. But it's still looping over every row from 1-252 and not starting from 231. I'm using PyCharm, by the way, but that shouldn't affect anything as far as I'm aware. – Den Fula Ankungen Jul 10 '19 at 19:49
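A likely reason the loop in the question still prints every row: iterating over a DataFrame with `for rows in dates` yields its column labels, not its rows, and high and low still refer to the full df.High and df.Low columns, so each pass prints the whole Series. A minimal sketch on made-up data that does the calculation on the slice only, then adds the results up and divides by the number of rows; the explicit itertuples() loop is one option for a per-row calculation that is hard to vectorise.

```python
import pandas as pd

# Made-up data standing in for the CSV (300 rows, default RangeIndex).
df = pd.DataFrame({
    "High": [float(i) + 3.0 for i in range(300)],
    "Low": [float(i) for i in range(300)],
})

dates = df.iloc[231:253]  # positions 231..252 inclusive -> 22 rows

# Iterating a DataFrame directly yields its column labels, not its rows.
print(list(dates))  # ['High', 'Low']

# Vectorised: compute High - Low only for the sliced rows.
spread = dates["High"] - dates["Low"]

# Add up the per-row results and divide by the number of rows used.
average = spread.sum() / len(spread)
print(average, spread.mean())  # the two values agree

# Explicit row loop: itertuples() yields one namedtuple per row.
total = 0.0
count = 0
for row in dates.itertuples():
    total += row.High - row.Low  # replace with the real calculation
    count += 1
print(total / count)
```

itertuples() exposes columns as attributes only when their names are valid Python identifiers; a column such as Adj Close gets a positional field name instead, so dates.iterrows() or indexing with dates["Adj Close"] may be easier for those columns.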

Use skiprows and nrows in read_csv. To keep the header (as described in the question "Python Pandas read_csv skip rows but keep header"), pass skiprows a range that starts at 1, so line 0, the header, is still read:

In [9]: pd.read_csv("t.csv",skiprows=range(1,3),nrows=2)
Out[9]:
         Date       Open       High        Low      Close  Adj Close   Volume
0  2019-06-10  32.160000  35.099998  31.780001  32.020000  32.020000  1961500
1  2019-06-11  31.379999  32.820000  28.910000  29.309999  29.309999   907900
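A sketch applying the same idea to the question's rows 231 to 252, assuming those numbers are the 0-based data-row numbers that df.index shows. An in-memory io.StringIO CSV with made-up values stands in for the real file. Since file line 0 is the header, data row i sits on file line i + 1, so skipping file lines 1 to 231 and reading 22 rows yields data rows 231 to 252.

```python
import io

import pandas as pd

# Made-up CSV: a header line plus 300 data lines. In practice this would
# be the path to the Yahoo Finance file.
lines = ["Date,High,Low"]
for i in range(300):
    lines.append(f"day{i},{i + 2.5},{i}")
csv_text = "\n".join(lines) + "\n"

first, last = 231, 252  # 0-based data-row numbers, as in df.index

# Skip file lines 1..first (the data rows before `first`), keep the
# header on line 0, then read last - first + 1 data rows.
subset = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=range(1, first + 1),
    nrows=last - first + 1,
)

print(subset["Date"].iloc[0], subset["Date"].iloc[-1], len(subset))
print((subset["High"] - subset["Low"]).mean())
```

The resulting DataFrame is renumbered from 0, so the original row numbers are not preserved in its index; this approach also avoids loading the rest of the file at all.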
ivan_pozdeev