
I have a dataframe with three columns: Depth, Shale Volume and Density.

What I need to do is calculate porosity from the shale volume and density. Where the shale volume is > 0.7 I apply one set of parameters for the porosity calculation, and where the volume is < 0.2 I apply another.

For example, if the shale volume is < 0.2:

 porosity=density*2.3

and if the shale volume is > 0.7:

 porosity=density*1.7

This is an example of part of the dataframe I have:

 depth       density    VSH
 5517        2.126      0.8347083
 5517.5      2.123      0.8310949
 5518        2.124      0.8012414
 5518.5      2.121      0.7838615
 5519        2.116      0.7674243
 5519.5      2.127      0.8405414

This is the piece of code I am trying to write. I want it to be a for loop because it will serve future purposes:

 for index, row in data.iterrows():
     if data.loc[index, 'VSH']<0.2:
          data.loc[index,'porosity']=(data['density']*2.3)
     elif data.loc[index, 'VSH'] > 0.7:
          data.loc[index,'porosity']=(data['density']*1.7)

The error I am getting is the following; it would be great if you could help:

 TypeError: '<' not supported between instances of 'str' and 'float'
akkab

1 Answer


Here iterrows is a bad choice because it is slow and a vectorized solution exists; see "Does pandas iterrows have performance issues?"

So use numpy.select:

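import numpy as np

# Boolean masks for each shale-volume range
m1 = data['VSH'] < 0.2
m2 = data['VSH'] > 0.7
# Porosity values to use where the matching mask is True
s1 = data['density']*2.3
s2 = data['density']*1.7

data['porosity'] = np.select([m1, m2], [s1, s2])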

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159

It is also better to define what happens between 0.2 and 0.7, e.g. return the values of the data['density'] column via the default parameter:

data['porosity'] = np.select([m1, m2], [s1, s2], default=data['density'])

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159
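The sample rows above all have VSH > 0.7, so both outputs look identical. As a rough sketch of where `default` makes a difference, the snippet below uses made-up VSH values (an assumption, not data from the question) so that one row falls between the two thresholds. The column name `porosity_no_default` is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the middle row has a VSH value between the thresholds.
data = pd.DataFrame({
    "depth": [5517.0, 5517.5, 5518.0],
    "density": [2.126, 2.123, 2.124],
    "VSH": [0.15, 0.45, 0.80],
})

m1 = data["VSH"] < 0.2
m2 = data["VSH"] > 0.7
s1 = data["density"] * 2.3
s2 = data["density"] * 1.7

# Without default, rows matching neither condition get 0.
data["porosity_no_default"] = np.select([m1, m2], [s1, s2])

# With default, rows matching neither condition fall back to the density value.
data["porosity"] = np.select([m1, m2], [s1, s2], default=data["density"])

print(data)
```

Whether density is a sensible fallback for the 0.2 to 0.7 range is a modelling decision; `default=np.nan` would instead leave those rows explicitly missing.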
jezrael
  • Thanks for the reply, but is there any way to use iteration with the numpy approach to solve this issue? – akkab May 09 '19 at 13:50
  • I have implemented the code that you have provided but the error still persists...TypeError: '<' not supported between instances of 'str' and 'float' – akkab May 09 '19 at 14:10
  • @KamranAbbasov - The problem is non-numeric values, so try `data['VSH'] = data['VSH'].astype(float)`, and if that does not work because some values are strings, use `data['VSH'] = pd.to_numeric(data['VSH'], errors='coerce')` – jezrael May 09 '19 at 14:12
  • Use `data['VSH'] = pd.to_numeric(data['VSH'], errors='coerce')` and `data['density'] = pd.to_numeric(data['density'], errors='coerce')`, if necessary also `data['depth'] = pd.to_numeric(data['depth'], errors='coerce')` – jezrael May 09 '19 at 14:21
  • Yes, that's what I forgot to do, which is why I deleted the message. Thanks, it seems to be working very well! Any advice on using something instead of iterrows for the iterative approach? – akkab May 09 '19 at 14:22
  • If you need to loop with some custom function, `loop` is used. Check the link in my answer. – jezrael May 09 '19 at 14:23
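To tie the comments together, here is a minimal sketch of the dtype fix followed by a row-by-row version of the calculation. The string-typed sample frame (including the `"n/a"` entry) is invented for illustration, and using `itertuples` with `.at` is one possible way to write the loop, not necessarily what the linked question recommends.

```python
import pandas as pd

# Hypothetical data where numeric columns were read in as strings,
# including one non-numeric entry ("n/a") in VSH.
data = pd.DataFrame({
    "depth": ["5517", "5517.5", "5518"],
    "density": ["2.126", "2.123", "2.124"],
    "VSH": ["0.8347083", "n/a", "0.15"],
})

# Convert to numbers; values that cannot be parsed become NaN.
for col in ["depth", "density", "VSH"]:
    data[col] = pd.to_numeric(data[col], errors="coerce")

# Row-by-row version: write one scalar per row instead of a whole Series.
data["porosity"] = float("nan")
for row in data.itertuples():
    if row.VSH < 0.2:
        data.at[row.Index, "porosity"] = row.density * 2.3
    elif row.VSH > 0.7:
        data.at[row.Index, "porosity"] = row.density * 1.7

print(data)
```

Comparisons with NaN are always False, so the row whose VSH could not be parsed matches neither branch and keeps a NaN porosity. For large frames the `np.select` version in the answer is likely to be considerably faster than this loop.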