I have a simple question. I have the following dataframe:

df =
    time                                        lat          lon
    0   2014-03-26 14:46:27.457233+00:00    48.7773     11.428897
    1   2014-03-26 14:46:28.457570+00:00    48.7773     11.428719
    2   2014-03-26 14:46:29.457665+00:00    48.7772     11.428542
    3   2014-03-26 14:46:30.457519+00:00    48.7771     11.428368
    4   2014-03-26 14:46:31.457855+00:00    48.7770     11.428193
    5   2014-03-26 14:46:32.457950+00:00    48.7770     11.428018
    6   2014-03-26 14:46:33.457794+00:00    48.7769     11.427842
    7   2014-03-26 14:46:34.458131+00:00    48.7768     11.427668
    8   2014-03-26 14:46:35.458246+00:00    48.7767     11.427501
    9   2014-03-26 14:46:36.458069+00:00    48.7766     11.427350
    10  2014-03-26 14:46:37.458416+00:00    48.7766     11.427224
    11  2014-03-26 14:46:38.458531+00:00    48.7765     11.427129
    12  2014-03-26 14:46:39.458355+00:00    48.7764     11.427062
    13  2014-03-26 14:46:40.458702+00:00    48.7764     11.427011
    14  2014-03-26 14:46:41.458807+00:00    48.7764     11.426963
    15  2014-03-26 14:46:42.458640+00:00    48.7763     11.426918
    16  2014-03-26 14:46:43.458977+00:00    48.7763     11.426872
    17  2014-03-26 14:46:44.459102+00:00    48.7762     11.426822
    18  2014-03-26 14:46:45.458926+00:00    48.7762     11.426766
    19  2014-03-26 14:46:46.459262+00:00    48.7761     11.426702
    20  2014-03-26 14:46:47.459378+00:00    48.7760     11.426628

I would like to generate a new dataframe df1 that contains the values every 10 time steps.

df1 =
        time                                        lat          lon
        0       2014-03-26 14:46:27.457233+00:00    48.7773     11.428897
        9      2014-03-26 14:46:46.459262+00:00     48.7761     11.426702
        19      2014-03-26 14:46:46.459262+00:00    48.7765     11.426787
        ...        ...         ...                 ...       ....
        len(df) 2014-03-26 14:46:46.459262+00:00    48.7765     11.426787

I tried to do something like

df1 = df.iloc[[0:10:len(df)]]
emax
  • I don't understand your desired output. You have indices 0, 9, 19, which goes up first by 9 and then by 10. Why isn't it 0, 10, 20 (up by 10) or 0, 9, 18 (up by 9)? – DSM Sep 22 '15 at 20:08
  • Your slicing approach is the right idea and is nearly correct: use `df.iloc[::10]` to get every tenth row. (I strongly advise that you *don't* loop over the index.) – Alex Riley Sep 22 '15 at 20:12

2 Answers

Just slice the df using `iloc` and pass a step parameter. This is ordinary Python slice syntax (`start:stop:step`), so the third value is the step size:

In [67]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100,2))
df.iloc[::10]

Out[67]:
           0         1
0   0.552160 -0.910893
10 -2.173707 -0.659227
20  0.811937  0.675416
30  0.533533  0.336104
40  1.093083 -0.943157
50 -0.559221  0.272763
60 -0.011628  1.002561
70 -0.114501  0.457626
80  1.355948  0.236342
90 -0.151979 -0.746238
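
Applied to data shaped like the question's, the same slice might look like the sketch below. The dataframe here is a made-up stand-in (21 rows, one per second, with invented `lat`/`lon` values), not the asker's actual data, and `df1_reset` is just an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: 21 rows, one per second.
start = pd.Timestamp("2014-03-26 14:46:27", tz="UTC")
df = pd.DataFrame({
    "time": [start + pd.Timedelta(seconds=i) for i in range(21)],
    "lat": [48.7773 - 0.00005 * i for i in range(21)],
    "lon": [11.4289 - 0.0001 * i for i in range(21)],
})

# Every tenth row by position, starting from the first row.
df1 = df.iloc[::10]
print(df1.index.tolist())  # [0, 10, 20]

# Optionally renumber the result 0, 1, 2, ... instead of keeping the old labels.
df1_reset = df1.reset_index(drop=True)
```

Note that the slice keeps the original index labels (0, 10, 20). If you want the rows at 9, 19, ... instead, `df.iloc[9::10]` starts the slice at position 9.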
EdChum

How about `df.loc[[i for j, i in enumerate(df.index) if j % 10 == 0]]`?
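
A quick check of how this compares with positional slicing, as a rough sketch on made-up data (the frames and the names `by_label`, `by_position` and `dup` are illustrative only). With unique index labels the two approaches select the same rows. With duplicate labels, `.loc` returns every row that matches each label, so the results can differ.

```python
import pandas as pd

# Hypothetical frame with unique, non-numeric index labels.
df = pd.DataFrame({"value": range(25)}, index=[f"row{i}" for i in range(25)])

# Label-based: collect the label at every tenth position, then look it up.
by_label = df.loc[[i for j, i in enumerate(df.index) if j % 10 == 0]]
# Position-based: take every tenth row directly.
by_position = df.iloc[::10]

print(by_label.equals(by_position))  # True

# With duplicate labels, .loc returns all rows sharing a selected label.
dup = pd.DataFrame({"value": range(4)}, index=["a", "a", "b", "b"])
dup_by_label = dup.loc[[i for j, i in enumerate(dup.index) if j % 2 == 0]]
dup_by_position = dup.iloc[::2]
print(len(dup_by_label), len(dup_by_position))  # 4 2
```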

hilberts_drinking_problem