-1

I have a Pandas dataframe with 3000+ rows that looks like this:

    t090:   c0S/m:    pr:      timeJ:  potemp090C:   sal00:  depSM:  \
407  19.3574  4.16649  1.836  189.617454      19.3571  30.3949   1.824
408  19.3519  4.47521  1.381  189.617512      19.3517  32.9250   1.372
409  19.3712  4.44736  0.710  189.617569      19.3711  32.6810   0.705
410  19.3602  4.26486  0.264  189.617627      19.3602  31.1949   0.262
411  19.3616  3.55025  0.084  189.617685      19.3616  25.4410   0.083
412  19.2559  0.13710  0.071  189.617743      19.2559   0.7783   0.071
413  19.2092  0.03000  0.068  189.617801      19.2092   0.1630   0.068
414  19.4396  0.00522  0.068  189.617859      19.4396   0.0321   0.068

What I want to do is: create individual dataframes from each portion of the dataframe in which the values in column 'c0S/m' exceed 0.1 (eg rows 407-412 in the example above).

So let's say that I have 7 sections in my 3000+ row dataframe in which a series of rows exceed 0.1 in the second column. My if/for/while statement will slice these sections and create 7 separate dataframes.

I tried researching the best I could but could not find a question that would address this problem. Any help is appreciated.

Thank you.

  • When you say "section" or "portion", are you referring to a set of consecutive rows, all of which have the inclusion criterion specified? Do you want to retain the indices from the original data frame? – andrew_reece Sep 03 '17 at 21:03
  • Ah. I think you mean separate into sections in which _consecutive_ rows satisfy the condition. Right? That could be tricky. – Bill Sep 03 '17 at 21:17
  • You could first take all rows that satisfy the condition, then use the row index to find the points where you want to make 'section-breaks'. I can't think of any way other than looping through the data frame and creating a set of index ranges. – Bill Sep 03 '17 at 21:20

2 Answers2

0

Here's another way.

sub_set = df[df['c0S/m'] > 0.1]

last = None

for i in sub_set.index:
    if last is None:
        start = i
    else:
         if i - last > 1:
            print start, last
            start = i
    last = i

I think it works. (Instead of print start, last you could insert code to create the slices you wanted of the original data frame).

Some neat tricks here that do an even better job.

Bill
  • 10,323
  • 10
  • 62
  • 85
0

You can try this:

First add a column of 0 or 1 based on whether the value is greater than 1 or less.

df['splitter'] = np.where(df['c0S/m:'] > 1, 1, 0)

Now groupby this column diff.cumsum()

df.groupby((df['splitter'].diff(1) != 0).astype('int').cumsum()).apply(lambda x: [x.index.min(),x.index.max()])

You get the required blocks of indices

splitter
1    [407, 411]
2    [412, 414]
3    [415, 415]

Now you can create dataframes using loc

df.loc[407:411]

Note: I added a line to your sample df using:

df.loc[415] = [19.01, 5.005, 0.09, 189.62, 19.01, 0.026, 0.09] 

to be able to test better and hence its splitting in 3 groups

eyllanesc
  • 235,170
  • 19
  • 170
  • 241
Vaishali
  • 37,545
  • 5
  • 58
  • 86