
I am working with huge data sets and I need to insert new rows where data is missing and interpolate it.

The Values in each Group are in ascending order, and every group must start at 0.5. As the example shows, data is missing wherever the difference between consecutive Values is larger than 0.5. The real problem is combining this with groupby so that the last Value of Group "A" does not interfere with the first Value of Group "B".
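A minimal sketch of the gap rule, assuming the column names from the example below. Calling `diff()` on a `groupby` restarts at every group boundary, so the last Value of A is never compared with the first Value of B. Treating 0 as the implicit value before each group's first row is an assumption (so that a missing leading 0.5 is flagged too), and the `gap_before` column is a hypothetical name.

```python
import pandas as pd

STEP = 0.5  # expected spacing between consecutive Values, per the question

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [0.5, 1, 1.5, 2.5, 3, 1, 1.5, 2, 2.5],
})

# diff() inside groupby restarts at every group boundary; the first row of
# each group gets NaN instead of being compared with the previous group.
step = df.groupby("Group")["Value"].diff()

# Assumption: treat 0 as the value "before" each group's first row, so a group
# that does not start at 0.5 (like B, which starts at 1.0) is also flagged.
step = step.fillna(df["Value"])

# Hypothetical helper column: True where at least one row is missing just before.
df["gap_before"] = step > STEP
print(df)
```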

import pandas as pd

df = pd.DataFrame({
"Group": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
"Value": [0.5, 1, 1.5, 2.5, 3, 1, 1.5, 2, 2.5]
})

And this is my desired result:

df = pd.DataFrame({
"Group": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
"Value": [0.5, 1, 1.5, 2, 2.5, 3, 0.5, 1, 1.5, 2, 2.5]
})
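A minimal sketch that produces the desired result by reindexing each group separately against its own grid (0.5, 1.0, ... up to that group's maximum), so no group's Values leak into another. `STEP` and `fill_missing_steps` are hypothetical names, the loop over `groupby` is one option among several, and any other numeric columns (none in the example) are linearly interpolated within their own group only.

```python
import numpy as np
import pandas as pd

STEP = 0.5  # spacing between consecutive Values; each group starts at STEP


def fill_missing_steps(df: pd.DataFrame) -> pd.DataFrame:
    """Insert the missing Value rows of every Group (hypothetical helper)."""
    pieces = []
    # sort=False keeps the groups in their original order (A, then B).
    for group, sub in df.groupby("Group", sort=False):
        # Grid for this group only: STEP, 2*STEP, ... up to the group's own max.
        n_steps = int(round(sub["Value"].max() / STEP))
        grid = np.arange(1, n_steps + 1) * STEP
        filled = sub.set_index("Value").reindex(grid)
        filled["Group"] = group  # inserted rows came back with NaN here
        # Linear interpolation of any other numeric columns, within this group only.
        num_cols = filled.select_dtypes("number").columns
        if len(num_cols) > 0:
            filled[num_cols] = filled[num_cols].interpolate()
        pieces.append(filled.rename_axis("Value").reset_index())
    return pd.concat(pieces, ignore_index=True)[list(df.columns)]


df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [0.5, 1, 1.5, 2.5, 3, 1, 1.5, 2, 2.5],
})
print(fill_missing_steps(df))
```

A row inserted before a group's first observation (B at 0.5 here) has no earlier point to interpolate from, so its numeric columns stay NaN.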
MendelG
  • How would you like to turn the input data into the output? What is the goal of the exercise? – ombk Nov 26 '20 at 22:24
  • Sorry, that doesn't matter (already edited that). It should be the same df but with extra rows. – Miks Papirtis Nov 26 '20 at 22:32
  • Which part of the transformation process are you having trouble with? You should include any attempt you have made and point out which part of it works. Does [Missing data, insert rows in Pandas and fill with NAN](https://stackoverflow.com/questions/25909984/missing-data-insert-rows-in-pandas-and-fill-with-nan) answer your question? – wwii Nov 27 '20 at 15:42

1 Answer


It might not be the most elegant solution, but it gets close to the desired result by reindexing against every combination of Group and Value:

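import pandas as pd
# Pair every Group with every Value seen anywhere in df, sorted ascending
iterate = [df['Group'].unique(), sorted(df['Value'].unique())]
df = df.set_index(['Group', 'Value'])
# Inserted rows get NaN in any other columns, which interpolate() can fill later
df = df.reindex(index=pd.MultiIndex.from_product(iterate, names=['Group', 'Value'])).reset_index()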

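which returns the table below. Note that it pads every group to the overall maximum, so group B also gets a 3.0 row that is not in the desired result, and a step such as 2.0 is only inserted because at least one group (here B) already contains it.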

   Group  Value
0      A    0.5
1      A    1.0
2      A    1.5
3      A    2.0
4      A    2.5
5      A    3.0
6      B    0.5
7      B    1.0
8      B    1.5
9      B    2.0
10     B    2.5
11     B    3.0
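The MultiIndex reindex above can be restricted to each group's own range. A sketch of that variant, assuming (as the question states) that every group starts at 0.5 and Values step by 0.5; `STEP`, `max_per_group` and `full_index` are illustrative names, not part of the original answer.

```python
import pandas as pd

STEP = 0.5  # spacing between Values; each group is assumed to start at STEP

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [0.5, 1, 1.5, 2.5, 3, 1, 1.5, 2, 2.5],
})

# Build one grid per group, STEP .. that group's own max, instead of taking the
# product of all groups with all Values seen anywhere in the data.
max_per_group = df.groupby("Group", sort=False)["Value"].max()
full_index = pd.MultiIndex.from_tuples(
    [
        (group, k * STEP)
        for group, vmax in max_per_group.items()
        for k in range(1, int(round(vmax / STEP)) + 1)
    ],
    names=["Group", "Value"],
)

# Existing rows keep their data; inserted rows get NaN in any other columns,
# which could then be interpolated per group.
result = df.set_index(["Group", "Value"]).reindex(full_index).reset_index()
print(result)
```

Because every step comes from `range` rather than from Values already present, a step missing from all groups at once is still inserted.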