I have a DataFrame
from pandas:
import pandas as pd
inp = [{'Name': 'John', 'Year':2018, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2018, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2019, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2019, 'Address':'Orange county'}, {'Name': 'John', 'Year':2019, 'Address':'New York'}, {'Name': 'Steve', 'Year':2018, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2019, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2019, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2020, 'Address':'California'}, {'Name': 'Steve', 'Year':2020, 'Address':'Canada'}]
df = pd.DataFrame(inp)
print (df)
If a change in a row's string value occurs comparing to previous row, I want to identify it in a separate row "Cng-Address", and if row's numeric value changes identify it in "Cng-Year" column. If there is no change identify it as zero.
The index is “Name” meaning that the above calculations should be done for all rows associated to person name. If a “Name” changes (i.e. John to Steve) then calculations for "Cng-Address" and "Cng-Year" should reset. Column year sorted ascending.
As a final report I want to get:
- John changed years “1” time and changed locations “2” times
- Steve changed years “2” times and change locations “2” times
- Total changed addresses for Year 2019 is “2” times
Current Output:
+-------+------+---------------+
| Name | Year | Address |
+-------+------+---------------+
| John | 2018 | Beverly hills |
+-------+------+---------------+
| John | 2018 | Beverly hills |
+-------+------+---------------+
| John | 2019 | Beverly hills |
+-------+------+---------------+
| John | 2019 | Orange county |
+-------+------+---------------+
| John | 2019 | New York |
+-------+------+---------------+
| Steve | 2018 | Canada |
+-------+------+---------------+
| Steve | 2019 | Canada |
+-------+------+---------------+
| Steve | 2019 | Canada |
+-------+------+---------------+
| Steve | 2020 | California |
+-------+------+---------------+
| Steve | 2020 | Canada |
+-------+------+---------------+
Ideal Output:
+-------+------+---------------+----------+-------------+
| Name | Year | Address | Cng-Year | Cng-Address |
+-------+------+---------------+----------+-------------+
| John | 2018 | Beverly hills | 0 | 0 |
+-------+------+---------------+----------+-------------+
| John | 2018 | Beverly hills | 0 | 0 |
+-------+------+---------------+----------+-------------+
| John | 2019 | Beverly hills | 1 | 0 |
+-------+------+---------------+----------+-------------+
| John | 2019 | Orange county | 0 | 1 |
+-------+------+---------------+----------+-------------+
| John | 2019 | New York | 0 | 1 |
+-------+------+---------------+----------+-------------+
| Steve | 2018 | Canada | 0 | 0 |
+-------+------+---------------+----------+-------------+
| Steve | 2019 | Canada | 1 | 0 |
+-------+------+---------------+----------+-------------+
| Steve | 2019 | Canada | 0 | 0 |
+-------+------+---------------+----------+-------------+
| Steve | 2020 | California | 1 | 1 |
+-------+------+---------------+----------+-------------+
| Steve | 2020 | Canada | 0 | 1 |
+-------+------+---------------+----------+-------------+