Find range that point falls into?

Question

Consider the dataframe, tests, of individual tests in some boreholes:

  borehole  depthTest
0      B-1        1.5
1      B-1       20.0
2     B-42        1.0
3     B-42        2.0
4     B-42       15.0
5     B-42       30.0
6     B-09        1.0
7     B-09       10.0
8     B-09       15.0

I have another dataframe, liths, with ranges of lithologies of each borehole:

  borehole  depthTop   lith
0      B-1         0   sand
1      B-1         5   clay
2      B-1        18  shale
3     B-42         0   sand
4     B-42         1   clay
5     B-42        26  shale
6     B-09         0   sand
7     B-09        12  shale

The lithologies are a continuous sequence for each borehole. For example: in B-1 there is sand from a depth of 0 to 5 m, clay from 5 to 18 m, and shale from 18 m onwards. The bottom of each lithology is the top of the next. In other words, the bottom of each lithology would be liths.groupby('borehole').depthTop.shift(-1)

Edit: I want to join the two dfs so I can get the lithology of each test: I want to match on the borehole and then find the lith that has the closest depthTop <= depthTest.

For example: In B-42 there is clay from a depth of 1 m to 26 m. The test in B-42 at 15.0 m should be classified as clay because 15 is between 1 and 26.

Here's the desired result:

  borehole  depthTest   lith
0      B-1        1.5   sand
1      B-1       20.0  shale
2     B-42        1.0   clay
3     B-42        2.0   clay
4     B-42       15.0   clay
5     B-42       30.0  shale
6     B-09        1.0   sand
7     B-09       10.0   sand
8     B-09       15.0  shale

This seems like a groupby and merge_asof problem, but I can't figure out how to get them together.

My solution so far, which works, is to dump this into sqlite3 and then do a between join (like I did here), but that really seems like defeat.

rafaelc · Answer 1 · 2018-05-09T23:26:17.657

1

Find the closest value on df2 and categorize:

def logic(k):
    vals = df2.loc[df2.borehole == k.borehole,:]
    return vals[vals.depthTop == max(vals[vals.depthTop <= k.depthTest].depthTop)].lith.item()

df.transform(logic, 1)


0     sand
1    shale
2     clay
3     clay
4     clay
5    shale
6     sand
7     sand
8    shale

edited May 09 '18 at 23:26

answered May 09 '18 at 20:57

rafaelc

57,686
15
58
82

Thanks for looking at that so closely. The expected output is correct. My description is poor. I want to match on the borehole and then find the `lith` that has `depthTop` <= `depthTest`. – cfort May 09 '18 at 22:23
In B-42 there is clay from a depth of 1 m to 26 m. The test in B-42 at 15.0 m should be classified as clay because 15 is between 1 and 26. – cfort May 09 '18 at 22:24
1

Then why is `7 B-09 10.0 shale` if there should be `sand` from 0 to 12? @ChuHo – rafaelc May 09 '18 at 22:54
You're right! I screwed that one up. Edit forthcoming. – cfort May 09 '18 at 23:19

Scott Boston · Accepted Answer · 2018-05-09T21:13:40.293

Let's try this:

tests['lith'] = tests.groupby('borehole')['depthTest'].transform(lambda x: pd.cut(x,
       bins = liths.loc[liths.borehole == x.name,'depthTop'].values.tolist() + [np.inf],
       labels=liths.loc[liths.borehole == x.name,'lith'], right=False))

Output:

  borehole  depthTest   lith
0      B-1        1.5   sand
1      B-1       20.0  shale
2     B-42        1.0   clay
3     B-42        2.0   clay
4     B-42       15.0   clay
5     B-42       30.0  shale
6     B-09        1.0   sand
7     B-09       10.0   sand
8     B-09       15.0  shale

Let's use pd.cut to label values in a range.
And by using groupby to get the appropriate bins and labels for pd.cut from liths dataframe by borehole.

Find range that point falls into?

2 Answers2