1

I have a dataframe with some uneven distributions of values - some missing and some not missing.

The data looks like this:

Utility    Location     ID       Name        Unit1      Mover1      Unit2    Mover2

500        Municipal    75       Glover      1A         GT                      
500        Municipal    75       Glover      2A         GT               
500                     75       Glover      3A         GT               
500        Municipal    75       Glover                             1A       GT           
500                     75       Glover                             2A       GT
500                     75       Glover                             3A       GT
.
.
51        Provincial    85       Toshi       1          CT          1B       CT           
51        Provincial    85       Toshi       2          CT          2B       CT
51        Provincial    85       Toshi       5          ST          5B1      ST

I'd like to fill and merge the dataframe when Unit1 = Unit2 and when Mover1 = Mover2 but keeping the already matched parts alone - Toshi values. The second need is to have the Location values fill with the similar values to the Name, ID, and Utility - if any of these match it should fill with the Location used previously.

The output would look like:

Utility    Location     ID       Name        Unit1      Mover1      Unit2    Mover2

500        Municipal    75       Glover      1A         GT          1A       GT                  
500        Municipal    75       Glover      2A         GT          2A       GT         
500        Municipal    75       Glover      3A         GT          3A       GT                       
.
.
51        Provincial    85       Toshi       1          CT          1B       CT           
51        Provincial    85       Toshi       2          CT          2B       CT              
51        Provincial    85       Toshi       5          ST          5B1      ST

Is there an easy way to condense this dataframe like above?

HelloToEarth
  • 2,027
  • 3
  • 22
  • 48

1 Answers1

0

I can only come up with this slow solution(Two applys must be slow ) , For high performance please check Python: Justifying NumPy array

df.groupby(['Utility','ID','Name']).apply(lambda x : x.apply(lambda y : sorted(y,key=pd.isna))).dropna()
Out[51]: 
   Utility    Location  ID    Name Unit1 Mover1 Unit2 Mover2
0      500   Municipal  75  Glover    1A     GT    1A     GT
1      500   Municipal  75  Glover    2A     GT    2A     GT
2      500   Municipal  75  Glover    3A     GT    3A     GT
6       51  Provincial  85   Toshi     1     CT    1B     CT
7       51  Provincial  85   Toshi     2     CT    2B     CT
8       51  Provincial  85   Toshi     5     ST   5B1      S
BENY
  • 317,841
  • 20
  • 164
  • 234