
Here is the output of the dataframe

    Date        Upper_zone  Lower_zone  Stock_name  S/R
0   2018-02-12  163.40      155.75      ABFRL       Resistance becoming support
1   2017-03-16  200.00      189.10      CROMPTON    Resistance becoming support
2   2017-04-11  127.69      126.16      CUB         Resistance becoming support
3   2017-02-02  644.40      625.00      ENDURANC    Resistance becoming support
4   2019-08-27  15.70       15.20       GMRINFRA    Resistance becoming support
5   2020-01-30  1287.00     1233.90     IPCALAB     Resistance becoming support
6   2017-08-01  17236.00    16220.50    PAGEIND     Resistance becoming support
7   2018-09-11  3788.00     3570.00     PFIZER      Resistance becoming support
8   2019-06-20  1261.35     1235.05     PIDILITIND  Resistance becoming support
9   2018-09-26  17506.50    16803.40    SHREECEM    Resistance becoming support
10  2018-09-03  556.67      542.13      VBL         Resistance becoming support
11  2018-10-31  563.33      533.37      VBL         Resistance becoming support
12  2019-02-06  562.90      534.00      VBL         Resistance becoming support
13  2017-07-05  479.00      461.70      VOLTAS      Resistance becoming support

Now I want to keep only one row per stock, the one with the latest date. Here VBL appears 3 times, but I only want the VBL row with the latest date, i.e. 2019-02-06, and the remaining 2 rows deleted.

Here is the code I used, with groupby:

x  = final_df.groupby('Stock_name')
y = x['Date'].max()
print(y)

Output

Stock_name
ADANITRANS    2019-08-01
BEL           2019-02-14
BERGEPAINT    2020-01-06
ICICIGI       2019-10-07
INDIGO        2019-01-21
INFY          2019-10-24
MARICO        2017-02-15
RELIANCE      2019-08-07
TCS           2019-01-14
Name: Date, dtype: object

How can I add the remaining columns to the output that I have received?

Nilanka Manoj
Joseph arasta

1 Answer


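You can use drop_duplicates on the Stock_name column and keep the last value. Note that keep="last" keeps the last row in the frame's current order, so sort by Date first to make sure that row is the latest one:

df.sort_values('Date').drop_duplicates(subset='Stock_name', keep="last")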
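
As a minimal sketch of the drop_duplicates approach, the snippet below rebuilds a few rows from the question's table (only a subset, for illustration) and sorts by Date before dropping duplicates, since keep="last" depends on row order. Converting Date with pd.to_datetime is an assumption here; the question's output shows the column as object dtype.

```python
import pandas as pd

# A subset of the rows shown in the question, rebuilt by hand for illustration.
final_df = pd.DataFrame({
    "Date": ["2017-07-05", "2018-09-03", "2019-02-06", "2018-10-31"],
    "Upper_zone": [479.00, 556.67, 562.90, 563.33],
    "Lower_zone": [461.70, 542.13, 534.00, 533.37],
    "Stock_name": ["VOLTAS", "VBL", "VBL", "VBL"],
    "S/R": ["Resistance becoming support"] * 4,
})

# Assumption: parse the dates so sorting is chronological.
final_df["Date"] = pd.to_datetime(final_df["Date"])

# Sort by Date, then keep the last (latest) row for each stock.
latest = (
    final_df.sort_values("Date")
            .drop_duplicates(subset="Stock_name", keep="last")
            .sort_values("Stock_name")
            .reset_index(drop=True)
)
print(latest)
```

All columns of the surviving row are kept, so nothing has to be added back after the deduplication.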

Or keep the row with the max value in the Upper_zone column, like this:

df.groupby('Stock_name', group_keys=False).apply(lambda x: x.loc[x.Upper_zone.idxmax()])

Or you can sort the values by Upper_zone first and then use drop_duplicates.
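
Since the question asks for the latest Date rather than the largest Upper_zone, the idxmax pattern can also be pointed at the Date column. The following is a sketch under the assumption that Date is first converted to a datetime dtype; the sample rows are a hand-copied subset of the question's table.

```python
import pandas as pd

# A subset of the rows shown in the question, rebuilt by hand for illustration.
final_df = pd.DataFrame({
    "Date": ["2017-07-05", "2018-09-03", "2018-10-31", "2019-02-06"],
    "Upper_zone": [479.00, 556.67, 563.33, 562.90],
    "Lower_zone": [461.70, 542.13, 533.37, 534.00],
    "Stock_name": ["VOLTAS", "VBL", "VBL", "VBL"],
})

# Assumption: convert to datetime so idxmax compares real dates.
final_df["Date"] = pd.to_datetime(final_df["Date"])

# For each stock, find the index label of the row with the latest Date...
idx = final_df.groupby("Stock_name")["Date"].idxmax()

# ...and use those labels to pull the full rows, with every column intact.
latest_rows = final_df.loc[idx].reset_index(drop=True)
print(latest_rows)
```

Compared with taking x['Date'].max() per group, selecting by index label returns whole rows, which is what brings the remaining columns along.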

Binh