-2

How can I get below - Input - Input

and Output - Output

I tried to use shift but I am not getting the result

df1['Value_shift_down'] = df1['Value'].shift(-1)
df1['Value_shift_up'] = df1['Value'].shift(1)
  • 1
    Please post the input and output as code. – SomeDude Feb 20 '23 at 20:28
  • 1
    Also explain how the new columns df['Value_shift_down'] and df['Value_shift_up'] relate to your expected output. Refrain from showing your dataframe as an image. Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Feb 20 '23 at 20:32

2 Answers2

0

Determinging the lowest and higest value based on a string is not possible (because "10" is smaller then "2"). Therefore you need to replace your strings by integers.

So one solution might be to remove the unit names and convert the unit name numbers to integers. Then you can determine the min and max values by a pivot table.

df['Unit'] = df['Unit'].str.replace('Unit ', '', regex=False).astype(int)
df.pivot_table(index='Value', values='Unit', aggfunc=['min', 'max'])

If neccessary, you can convert afterwards the values back to string and add the unit name.

Jens
  • 231
  • 1
  • 6
  • I don't think that any kind of string conversion is necessary. Just look at the first and last values in the left column corresponding to each grouping of the right column. – L0tad Feb 20 '23 at 21:11
  • Also, this solution only works if the `'Value'`s are not repeated later on. – L0tad Feb 20 '23 at 21:26
  • Thank you for helping. I am sorry; my question was unclear, but the repeated value is getting grouped incorrectly. [link](https://stackoverflow.com/a/75514362/2739110) solved this issue. – Shubhendu Shubham Feb 20 '23 at 22:01
0

You were on the right track with pandas .shift(). Instead of converting back and forth between strings and integers, you can leverage shift to compare each value with the previous, then use a cumulative sum (.cumsum()) to detect when the value changes. The final element is the DataFrame .groupby() method, which allows us to collect groups of same-Value rows together and get the start and end Units for that group. The code to do this goes something like this:

result_groups = []
for _, group in df1.groupby((df1["Value"] != df1["Value"].shift()).cumsum()):
    start_unit = group.loc[group.index.min(), "Unit"]
    end_unit = group.loc[group.index.max(), "Unit"]
    value = group["Value"].unique()[0]
    result_groups.append({"Start_Unit": start_unit, "End_Unit": end_unit, "Value": value})
results = pd.DataFrame(result_groups)

The results DataFrame looks like this, as desired:

Start_Unit End_Unit Value
0 Unit 1 Unit 1 2
1 Unit 2 Unit 2 4
2 Unit 3 Unit 3 1
3 Unit 4 Unit 5 5
4 Unit 6 Unit 6 7
5 Unit 7 Unit 7 6
6 Unit 8 Unit 8 9
7 Unit 9 Unit 20 0

Note: This solution does not require uniqueness of the values in the 'Value' column on a group-by-group basis (i.e., if there was another sequence of 5s later, it would correctly recognize it as a new group). If you know ahead of time that this will never happen, you can take a simpler approach, using just df1.groupby("Value") instead and doing something similar inside the loop.

L0tad
  • 574
  • 3
  • 15