Consider the dataframe, tests
, of individual tests in some boreholes:
borehole depthTest
0 B-1 1.5
1 B-1 20.0
2 B-42 1.0
3 B-42 2.0
4 B-42 15.0
5 B-42 30.0
6 B-09 1.0
7 B-09 10.0
8 B-09 15.0
I have another dataframe, liths
, with ranges of lithologies of each borehole:
borehole depthTop lith
0 B-1 0 sand
1 B-1 5 clay
2 B-1 18 shale
3 B-42 0 sand
4 B-42 1 clay
5 B-42 26 shale
6 B-09 0 sand
7 B-09 12 shale
The lithologies are a continuous sequence for each borehole. For example: in B-1 there is sand from a depth of 0 to 5 m, clay from 5 to 18 m, and shale from 18 m onwards. The bottom of each lithology is the top of the next. In other words, the bottom of each lithology would be liths.groupby('borehole').depthTop.shift(-1)
Edit: I want to join the two dfs so I can get the lithology of each test: I want to match on the borehole
and then find the lith
that has the closest depthTop
<= depthTest
.
For example: In B-42 there is clay from a depth of 1 m to 26 m. The test in B-42 at 15.0 m should be classified as clay because 15 is between 1 and 26.
Here's the desired result:
borehole depthTest lith
0 B-1 1.5 sand
1 B-1 20.0 shale
2 B-42 1.0 clay
3 B-42 2.0 clay
4 B-42 15.0 clay
5 B-42 30.0 shale
6 B-09 1.0 sand
7 B-09 10.0 sand
8 B-09 15.0 shale
This seems like a groupby
and merge_asof
problem, but I can't figure out how to get them together.
My solution so far, which works, is to dump this into sqlite3
and then do a between
join (like I did here), but that really seems like defeat.