Looping structure and Pandas

Question

I'm learning Python, and with it pandas and some Data Science tools. While doing the exercises in a book, I wrote the code below in IPython, but I get an error message when the block is executed:

for i in range(len(df1)):
    if (df1['Temperature'][i]-df1['Temperature'][i-1]) > 0.1:
        print (df1['Temperature'][i])

Traceback (most recent call last):

File "<ipython-input-140-9f31dd23b324>", line 2, in <module>
    if (df1['Temperature'][i]-df1['Temperature'][i-1]) > 0.1:

  File "D:\Programas\Anaconda\lib\site-packages\pandas\core\series.py", line 766, in __getitem__
    result = self.index.get_value(self, key)

  File "D:\Programas\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 3103, in get_value
    tz=getattr(series.dtype, 'tz', None))

  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item

KeyError: -1

Here df1 is a DataFrame and Temperature is one of its columns. The code is intended to compare two consecutive values of that column, check the numeric difference between them, and print the temperature when the condition holds. What am I doing wrong?

Did you read your error message? The last line is pretty unequivocal. — PMende, Sep 16 '18 at 19:51
I wasn't relating it to my code but to an internal pandas message, my bad. – Mattheus Sant'Anna, Sep 16 '18 at 19:57
You should avoid chained indexing. `df1['Temperature'][i]` should be replaced with `df1.loc[i, 'Temperature']`. But it's irrelevant since DYZ's answer shows you how to properly use pandas to solve this problem without a loop. — ALollz, Sep 16 '18 at 19:59
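To see why the loop fails, and why switching to `.loc` on its own would not fix it, here is a minimal sketch. The `df1` values below are hypothetical sample data, not taken from the book; the sketch assumes the default integer index 0..n-1, as in the question.

```python
import pandas as pd

# Hypothetical sample data; the default index is a RangeIndex 0..4
df1 = pd.DataFrame({"Temperature": [20.0, 20.05, 20.3, 20.1, 20.5]})
temps = df1["Temperature"]

# On a Series with an integer index, an integer key is treated as a label,
# so temps[-1] means "the row labelled -1", which does not exist.
try:
    temps[-1]
except KeyError as exc:
    print("label lookup failed:", repr(exc))

# .loc is explicitly label-based, so it fails the same way for i - 1 == -1.
try:
    df1.loc[-1, "Temperature"]
except KeyError as exc:
    print(".loc lookup failed:", repr(exc))

# .iloc is position-based: -1 is the LAST row, which is not the
# "previous row" the loop wanted, so relying on it would hide a bug.
print("iloc[-1]:", temps.iloc[-1])
```

So `.loc` is the cleaner way to write the lookup, but the range of `i` still has to change.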

niraj · Accepted Answer · 2018-09-16T19:58:03.850


In the statement below:

if (df1['Temperature'][i]-df1['Temperature'][i-1]) > 0.1:

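When i is 0, df1['Temperature'][i-1] becomes df1['Temperature'][-1]. Because the Series has an integer index, -1 is treated as a row label rather than as "the last row", and there is no row labelled -1; that is what KeyError: -1 is telling you. One fix is to start the range at 1. Since each iteration also reads row i-1, row 0 is still used in the first comparison, so nothing is skipped. You can try: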

for i in range(1, len(df1)):
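Putting the fixed range together with the `.loc` indexing suggested in the comments gives a complete loop. This is a sketch with hypothetical sample data standing in for the book's `df1`, and it assumes the default index 0..n-1; the `rises` list is only there so the result can be checked.

```python
import pandas as pd

# Hypothetical sample data standing in for the book's df1
df1 = pd.DataFrame({"Temperature": [20.0, 20.05, 20.3, 20.1, 20.5]})

rises = []
# Start at 1 so that i - 1 is always a valid label (0 .. len(df1) - 2).
for i in range(1, len(df1)):
    # .loc[row, column] avoids chained indexing
    if df1.loc[i, "Temperature"] - df1.loc[i - 1, "Temperature"] > 0.1:
        print(df1.loc[i, "Temperature"])
        rises.append(df1.loc[i, "Temperature"])
```

With this data it prints 20.3 and 20.5, the two rows whose temperature rose by more than 0.1 over the previous row.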

Note: since you mentioned comparing consecutive rows, maybe you want the absolute value of the difference if you do not care whether the temperature is increasing or decreasing.
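The absolute-value variant only changes the condition. A minimal sketch, again with hypothetical sample data and the default integer index:

```python
import pandas as pd

# Hypothetical sample data
df1 = pd.DataFrame({"Temperature": [20.0, 20.05, 20.3, 20.1, 20.5]})

changes = []
for i in range(1, len(df1)):
    # abs() flags rises and drops larger than 0.1
    step = abs(df1.loc[i, "Temperature"] - df1.loc[i - 1, "Temperature"])
    if step > 0.1:
        print(df1.loc[i, "Temperature"])
        changes.append(df1.loc[i, "Temperature"])
```

Here the drop from 20.3 to 20.1 is reported as well, so the output is 20.3, 20.1 and 20.5.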

edited Sep 16 '18 at 19:58

answered Sep 16 '18 at 19:52

niraj


Just a note: if you are just learning, the loop may be fine, but the answer by @DYZ is the preferable way to use pandas and will give you an advantage when you have large data files. For more details, I would suggest looking into the discussion at https://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas – niraj Sep 16 '18 at 20:06

DYZ · Answer 2 · 2018-09-16T20:04:00.047


As a rule, you should not use loops like that in Pandas. Pandas works best when your code is vectorized:

# shift(1) lines each value up with the previous row, like T[i] - T[i-1] in the loop
big_difference = (df1["Temperature"] - df1["Temperature"].shift(1)) > 0.1
print(df1.loc[big_difference, "Temperature"])
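For comparison, here is a self-contained sketch of the vectorized approach on hypothetical sample data. `Series.diff()` is shorthand for subtracting `shift(1)`. Note that `shift(1)` (the default) lines each value up with the previous row, matching `T[i] - T[i-1]` in the question's loop, whereas `shift(-1)` lines it up with the next row, which tests a different condition.

```python
import pandas as pd

# Hypothetical sample data
df1 = pd.DataFrame({"Temperature": [20.0, 20.05, 20.3, 20.1, 20.5]})
temps = df1["Temperature"]

# Current row minus previous row; row 0 has no predecessor, gives NaN,
# and NaN > 0.1 evaluates to False, so row 0 is never flagged.
big_rise = temps.diff() > 0.1

# The same condition written out explicitly with shift(1)
big_rise_shift = (temps - temps.shift(1)) > 0.1

# shift(-1) instead compares each row with the NEXT one: T[i] - T[i+1] > 0.1
big_next_drop = (temps - temps.shift(-1)) > 0.1

print(df1.loc[big_rise, "Temperature"])       # rows 2 and 4
print(df1.loc[big_next_drop, "Temperature"])  # row 2 only
```

The boolean mask plus `.loc` selects all matching rows in one step, without a Python-level loop.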

edited Sep 16 '18 at 20:04

answered Sep 16 '18 at 19:57

DYZ

