2

Please help me to make my code more effective. This is my df:

df = pd.DataFrame([['A', 80], ['A', 64], ['A', 55], ['B', 56], ['B', 89], ['B', 73], ['C', 78], ['C', 100], ['C', 150], ['C', 76], ['C', 87]], columns=['Well', 'GR'])
Well    GR
A       80
A       64
A       55
B       56
B       89
B       73
C       78
C       100
C       150
C       76
C       87


Please help me to find the Vshale. Vshale on each well = GR - GR(min) / GR(max) - GR(min). This is my desired result:


Well    GR      Vshale
A       80      1
A       64      0.36
A       55      0
B       56      0
B       89      1
B       73      0.515151515
C       78      0.027027027
C       100     0.324324324
C       150     1
C       76      0
C       87      0.148648649


This code is work for me, but, I should create a new column that consists of GRMax and GRMin and merge it into my previous df. I am looking for a more effective way without adding GRmin and GRmax on my original df. Thank you.

df1 = df.groupby(['Well']).agg({'GR': ['min', 'max']}).reset_index()
df1.columns = list(map(''.join, df1.columns.values))
df2 = pd.merge (df, df1, on = 'Well', how = 'left')
df2['Vshale'] = (df2['GR'] - df2['GRmin'])/(df2['GRmax'] - df2['GRmin'])

elli
  • 35
  • 5
  • Can you make the dataframe reproducible, rather than just posting the values? https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – william3031 Dec 30 '20 at 04:23
  • 1
    Yes, pd.DataFrame([['A', 80], ['A', 64], ['A', 55], ['B', 56], ['B', 89], ['B', 73], ['C', 78], ['C', 100], ['C', 150], ['C', 76], ['C', 87]], columns=['Well', 'GR']) – elli Dec 30 '20 at 04:39

2 Answers2

1

one string solution with method transform:

df['Vshale'] = df.groupby('Well').transform(lambda x: (x - np.min(x))/(np.max(x) - np.min(x)))
  • What if I have more than 2 columns? – elli Dec 30 '20 at 09:03
  • @elli, two columns for what - for groupby or aggregation? Can you provide an example? – Alexander S Dec 30 '20 at 11:21
  • Let's say I have columns "Well", "GR", and "Depth". I only want to do this calculation on the "GR" column. Because I still don't understand the meaning of lambda x – elli Dec 30 '20 at 16:31
0

try this:

df=pd.DataFrame([['A', 80], ['A', 64], ['A', 55], ['B', 56], ['B', 89], ['B', 73], ['C', 78], ['C', 100], ['C', 150], ['C', 76], ['C', 87]], columns=['Well', 'GR'])

a=[]
for i in np.unique(df['Well']):
    x=pd.DataFrame(df['GR']-df['GR'][df['Well']==i].min())/(df['GR'][df['Well']==i].max()-df['GR'][df['Well']==i].min() )
    x['Well']=i
    x=x[df['Well']==i]
    a.append(x)
gr=pd.concat(a)
df['Vshale']=gr['GR']

output:

df
Out[53]: 
   Well   GR    Vshale
0     A   80  1.000000
1     A   64  0.360000
2     A   55  0.000000
3     B   56  0.000000
4     B   89  1.000000
5     B   73  0.515152
6     C   78  0.027027
7     C  100  0.324324
8     C  150  1.000000
9     C   76  0.000000
10    C   87  0.148649

Explanation:

  • Create a empty list a
  • Use for loop to achieve the logic for all the unique values of Well
  • Apply the formula and store the results in x ( Note that everytime you iterate over this loop, the formula is applied to all the values of Wells,Hence the next steps)
  • By indexing,reduce x to have only the values for that particular Well value.
  • append this x to the initialized list a
  • Now just concat the dataframes in the list and assign these values to the original dataframe
Suhas Mucherla
  • 1,383
  • 1
  • 5
  • 17
  • Thank you very much. Could you explain to me this code? a=[] for i in np.unique(df['Well']): x=pd.DataFrame(df['GR']-df['GR'][df['Well']==i].min())/(df['GR'][df['Well']==i].max()-df['GR'][df['Well']==i].min() ) – elli Dec 30 '20 at 05:49
  • @elli I've edited the answer to have the explanation. cheers :) – Suhas Mucherla Dec 30 '20 at 06:29