0

Everyone.

I'm new to python and pandas, that I met a problem that I need to check whether a value of a certain columns changed over time(different rows). I totally have no idea that how to solve this problem.

I create a simple sample to illustrate it clearly:

    df = pd.DataFrame({"Year":[2011,2012,2013,2014,2011,2014,2014,2014,2011,2012,2012,2015],"Product":['A','A','A','A','B','B','B','B','C','C','C','C'],"Value1":[1,1,1,0,1,1,0,1,0,1,1,1]},index=['A','A','A','A','B','B','B','B','C','C','C','C'])
df

        Year    Product Value1
A       2011       A    1
A       2012       A    1
A       2013       A    1
A       2014       A    0
B       2011       B    1
B       2014       B    1
B       2014       B    0
B       2014       B    1
C       2011       C    0
C       2012       C    1
C       2012       C    1
C       2015       C    1

I want to check the value change through the years depend on different products, here is the logic to check:

  1. If the value1 is 0 for a product at first, just delete this row until value1 become 1. (e.g. Product C in 2011)

  2. If the value1 for a product is 1, then set the result as 0.(e.g. Product A in 2011 etc.)

  3. If the value1 change to 0 from 1 in a certain year then set the result as 1 and delete other rows of that product within same year (e.g. Product B in 2014)

So the final result should looks like this:

df2 = pd.DataFrame({"Year":[2011,2012,2013,2014,2011,2014,2012,2012,2015],"Product":['A','A','A','A','B','B','C','C','C'],"Value1":[1,1,1,0,1,0,1,1,1],'result':[0,0,0,1,0,1,0,0,0]},index=['A','A','A','A','B','B','C','C','C'])
df2

    Year    Product Value1  result
A   2011       A       1    0
A   2012       A       1    0
A   2013       A       1    0
A   2014       A       0    1
B   2011       B       1    0
B   2014       B       0    1
C   2012       C       1    0
C   2012       C       1    0
C   2015       C       1    0

I tried to use pd.apply to solve this problem, but I have no idea how to get another row value and compare to give out a new value.

Can someone help me?

10 Rep
  • 2,217
  • 7
  • 19
  • 33
Jianwei Wu
  • 21
  • 3

2 Answers2

1

Your logic is quite involved, so I have built it up in stages

  1. temporary columns with first value and count of product per year
  2. then your core logic with is using apply() and fact have first
  3. build a filter condition that is logic you provided
  4. finally filter to rows you want and cleanup delete temp column
df = pd.DataFrame({"Year":[2011,2012,2013,2014,2011,2014,2014,2014,2011,2012,2012,2015],"Product":['A','A','A','A','B','B','B','B','C','C','C','C'],"Value1":[1,1,1,0,1,1,0,1,0,1,1,1]},index=['A','A','A','A','B','B','B','B','C','C','C','C'])
df = df.assign(
        # need first value for logic
        first=df.groupby("Product")["Value1"].transform("first"),
        # need count for delete rows logic
        count=df.groupby(["Product","Year"])["Value1"].transform("count"),
        # core logic ...
        result=lambda dfa: dfa.apply(lambda r: 0 if ((r["first"]==0) or (r["first"]==r["Value1"])) else 1, axis=1),
        # delete / drop rows logic
        delete=lambda dfa: dfa.apply(lambda r: r["count"]>1 and r["result"]!=r["first"], axis=1)
).drop(["first","count"], axis=1) # drop temp columns used in logic

# filter and drop column used for filter
df = df[~df["delete"]].drop(["delete"], axis=1)

print(df.to_string())

output

   Year Product  Value1  result
A  2011       A       1       0
A  2012       A       1       0
A  2013       A       1       0
A  2014       A       0       1
B  2011       B       1       0
B  2014       B       0       1
C  2011       C       0       0
C  2012       C       1       0
C  2012       C       1       0
C  2015       C       1       0
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
0
df['result'] = abs(df['Value1'].diff()).bfill().astype(int)
adrian
  • 148
  • 8
  • The provided answer was flagged for review as a Low Quality Post. Here are some guidelines for [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). This provided answer could benefit from an explanation. Code only answers are not considered "good" answers. – Trenton McKinney Aug 08 '20 at 06:15