
I have a Pandas DataFrame which looks like this:

CustId CustName  Price_1  Priceqty1  Price_2  Priceqty2  Price_3  Priceqty3  Price_4   Price_5
5015      Axn     315.12      1        374       1        126.32       3     167.8765     
5015      Axn     75.36       3      190.19      7        33.16        1     190.19      88
5015      Axn     123         5                  4        18.07        2      0.073      12
7315      Bxy     12.0        4      22.345      3        77.89        1     345.0       3344
7315      Bxy     987.90      7      34.06       4        90.09        3     876.34      908.76
3283      Cxz     123.34      8      55.78       7        12           9     878.09      98.456
3283      Cxz     178.90      7       88         8        0.09         0                 987.56

I have 5 price columns whose values have varying numbers of decimal places. To make the data easier for the end user to read, I need to convert all the price columns to a common number of decimal places.

The conversion must follow two rules:

1. The conversion should be done for each customer ID separately.
2. The common number of decimal places is determined by the price value with the most decimal places.

Let's take CustId 5015 as an example:

CustId CustName  Price_1  Priceqty1  Price_2  Priceqty2  Price_3  Priceqty3  Price_4   Price_5
5015      Axn     315.12      1        374       1        126.32       3     167.8765     
5015      Axn     75.36       3      190.19      7        33.16        1     190.19      88
5015      Axn     123         5                  4        18.07        2      0.073      12

In the first row, the Price_4 column has the value 167.8765, which has 4 decimal places. If you check all the price values for CustId 5015, 4 is the largest number of decimal places. Since 4 is the largest, I need to convert every price value for CustId 5015 to 4 decimal places.
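The "largest decimal place per customer" rule can be sketched on a toy two-column slice of the data (column names taken from the question; `max_decimals` is a helper name invented here):

```python
import pandas as pd

# Toy slice of the CustId 5015 rows with two of the price columns
df = pd.DataFrame({
    "CustId": [5015, 5015, 5015],
    "Price_1": [315.12, 75.36, 123.0],
    "Price_4": [167.8765, 190.19, 0.073],
})

def max_decimals(s: pd.Series) -> int:
    """Count the digits after '.' in each value's string form; return the max."""
    return int(s.astype(str).str.split(".").str[-1].str.len().max())

# Per customer, take the max over both price columns
widths = df.groupby("CustId")[["Price_1", "Price_4"]].agg(max_decimals).max(axis=1)
print(widths)  # CustId 5015 -> 4 (driven by 167.8765)
```

The resulting width (4 here) is then the precision every price value for that customer gets formatted to.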

After conversion, this is how it should look:

CustId CustName  Price_1  Priceqty1  Price_2   Priceqty2  Price_3  Priceqty3  Price_4   Price_5
5015      Axn     315.1200      1    374.0000      1      126.3200     3     167.8765     
5015      Axn     75.3600       3    190.1900      7      33.1600      1     190.1900    88
5015      Axn     123.0000      5                  4      18.0700      2     0.0730      12

Similarly, the largest number of decimal places for CustId 7315 is 3:

CustId CustName  Price_1  Priceqty1  Price_2  Priceqty2  Price_3  Priceqty3  Price_4    Price_5
7315      Bxy    12.000        4      22.345      3        77.890        1     345.000       3344.000
7315      Bxy    987.900      7      34.060       4        90.090        3     876.340      908.760

Blank values should remain blank.

I need to do this for every CustId, and there are more than 800 distinct customer IDs. What's the most efficient way to do this?

rohi

3 Answers


Use a custom function per group:

import numpy as np
import pandas as pd

def f(x):
    # get the longest string of digits after '.' across the group's values
    a = max([str(y).split('.')[1] for y in np.ravel(x) if pd.notna(y)], key=len)
    # format every float in the group to that many decimal places
    return x.applymap(lambda v: f'{v:.{len(a)}f}').replace('nan', '')

df1 = df.filter(like='Price_')
df[df1.columns] = df1.groupby(df['CustId']).apply(f)
print (df)
   CustId CustName   Price_1  Priceqty1   Price_2  Priceqty2   Price_3  \
0    5015      Axn  315.1200          1  374.0000       1.00  126.3200   
1    5015      Axn   75.3600          3  190.1900       7.00   33.1600   
2    5015      Axn  123.0000          5    4.0000      18.07    2.0000   
3    7315      Bxy    12.000          4    22.345       3.00    77.890   
4    7315      Bxy   987.900          7    34.060       4.00    90.090   
5    3283      Cxz   123.340          8    55.780       7.00    12.000   
6    3283      Cxz   178.900          7    88.000       8.00     0.090   

   Priceqty3   Price_4   Price_5  
0      3.000  167.8765            
1      1.000  190.1900   88.0000  
2      0.073   12.0000            
3      1.000   345.000  3344.000  
4      3.000   876.340   908.760  
5      9.000   878.090    98.456  
6      0.000   987.560            
jezrael
  • I am getting IndexError: list index out of range. – rohi Nov 15 '21 at 05:19
  • @rohi - What part of code raise error ? – jezrael Nov 15 '21 at 05:47
  • df[df1.columns] = df1.groupby(df['CustId']).apply(f) this one – rohi Nov 15 '21 at 05:52
  • @rohi - So `print(df.filter(like='Price_').columns)` return `Price` columns? Or no columns? – jezrael Nov 15 '21 at 05:53
  • no columns this is what i got for the above print statement Index([], dtype='object') – rohi Nov 15 '21 at 06:03
  • @rohi - So how is necessary change `df.filter(like='Price_')` ? This select all columns with `Price_` substrings. If failed, there is no columns names with substrings `Price_` – jezrael Nov 15 '21 at 06:04
  • sorry there was a typo in my dataframe name it should be df_dec. **print(df_dec.filter(like='Price_').columns)** this one gave me price columns. this is the result i got **Index(['Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5'], dtype='object')**. IndexError: list index out of range. is still there. – rohi Nov 15 '21 at 06:11
  • @rohi - Is possible in some column are not floats? What is `print (df_dec.filter(like='Price_').dtypes)` ? – jezrael Nov 15 '21 at 06:15
  • yep expect column price_5 everyother price column is of object type. I have changed ```df1 = df.filter(like='Price_') ``` to ```df1 = df_dec.filter(like='Price_').columns``` . this one gave me ```IndexError: index 35781 is out of bounds for axis 0 with size 5```. – rohi Nov 15 '21 at 06:19
  • @rohi - One idea - can you change `a = max([str(y).split('.')[1] for y in np.ravel(x) if pd.notna(y)], key=len)` to `a = max([str(y).split('.')[1] for y in np.ravel(x) if pd.notna(y) and '.' in str(y)], key=len)` – jezrael Nov 15 '21 at 06:21
  • I got this error ```IndexError: index 35781 is out of bounds for axis 0 with size 5```. this is value I have on Index: 35781 cust id: 132902 price_1: 12.05 price_2: price_3: 12.05 price_4: price_5: – rohi Nov 15 '21 at 06:30
  • @rohi - What return `print ( np.ravel(x))` for this problematic group? Get `print( np.ravel(x))` before `a = max([str(y).split('.')[1] for y in np.ravel(x) if pd.notna(y)], key=len)` – jezrael Nov 15 '21 at 06:34
  • This is what i tried ```def f(x): #get string with maximal values after '.' print(np.ravel(x)) a = max([str(y).split('.')[1] for y in np.ravel(x) if pd.notna(y)], key=len) #set format of floats return x.applymap(lambda x: f'{x:.{len(a)}f}').replace('nan','') df1 = df_dec.filter(like='Price_').columns df_dec[df1.columns] = df1.groupby(df_dec['Customer ID']).apply(f)``` it didnt print anything. – rohi Nov 15 '21 at 06:41
  • @rohi - it means no empty list? Or empty array ? – jezrael Nov 15 '21 at 06:51
  • no empty array. – rohi Nov 15 '21 at 06:53
  • @rohi - If change `print(np.ravel(x))` to `print(x)` what return for failed group? – jezrael Nov 15 '21 at 06:54
  • May i know how can i check this for particular group. x should be equal to index or custid. i am really new to numpy. – rohi Nov 15 '21 at 06:58
  • @rohi - there is something wrong in this group, no idea what, so asking for testing. – jezrael Nov 15 '21 at 07:03
  • sure thankyou jezrael for replying. – rohi Nov 15 '21 at 07:04
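As a follow-up to the thread above: the guard jezrael suggests (skipping values whose string form contains no `'.'`) can be sketched like this. The toy frame deliberately mixes an object-dtype column holding integer-like strings with a float column, to reproduce the failure mode from the comments; this is a sketch, not the accepted fix:

```python
import numpy as np
import pandas as pd

def f(x):
    # collect decimal-part strings, skipping NaN and values whose string
    # form has no '.' (this guard avoids the IndexError on integer-like
    # entries in object-dtype columns)
    decs = [str(y).split('.')[1]
            for y in np.ravel(x)
            if pd.notna(y) and '.' in str(y)]
    n = len(max(decs, key=len)) if decs else 0
    # Series.map instead of DataFrame.applymap, for pandas-version robustness
    return x.apply(lambda col: col.map(
        lambda v: f'{float(v):.{n}f}' if pd.notna(v) else ''))

df = pd.DataFrame({'CustId': [5015, 5015],
                   'Price_1': ['315.12', '123'],   # object dtype, as in the comments
                   'Price_4': [167.8765, 0.073]})
res = f(df[['Price_1', 'Price_4']])
print(res)
# '123' has no '.', so it no longer raises; it is rendered as 123.0000
```

Plugged into the answer's `df1.groupby(df['CustId']).apply(f)` call, this should behave the same on clean float columns and degrade gracefully on object-dtype ones.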

You could start by computing the lengths of the integer and decimal parts of the numbers and taking the max:

lengths = (df.filter(like='Price_')
             .stack().astype(str).dropna()
             .str.split('.', expand=True).astype(str)
             .apply(lambda c: c.str.len())
             .max()
            )

output:

0    4  # this is the integer part
1    4  # this is the decimal part

Then, if you do not want to change the data but only print the dataframe:

integer, decimal = lengths.values
total = integer + decimal + 1  # +1 for the decimal point itself
custom_format = '{:%s.%sf}' % (total, decimal)
pd.options.display.float_format = custom_format.format
print(df.fillna(''))

output:

   CustId CustName   Price_1  Priceqty1   Price_2  Priceqty2   Price_3  Priceqty3   Price_4   Price_5
0    5015      Axn  315.1200          1  374.0000     1.0000  126.3200     3.0000  167.8765          
1    5015      Axn   75.3600          3  190.1900     7.0000   33.1600     1.0000  190.1900   88.0000
2    5015      Axn  123.0000          5    4.0000    18.0700    2.0000     0.0730   12.0000          
3    7315      Bxy   12.0000          4   22.3450     3.0000   77.8900     1.0000  345.0000 3344.0000
4    7315      Bxy  987.9000          7   34.0600     4.0000   90.0900     3.0000  876.3400  908.7600
5    3283      Cxz  123.3400          8   55.7800     7.0000   12.0000     9.0000  878.0900   98.4560
6    3283      Cxz  178.9000          7   88.0000     8.0000    0.0900     0.0000  987.5600          
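One caveat worth noting: `pd.options.display.float_format` is a process-wide display setting, so it affects every DataFrame printed afterwards, not just this one. It can be restored to the default with `pd.reset_option` (standard pandas API); a minimal sketch using the `'{:9.4f}'` format that the code above produces for this data:

```python
import pandas as pd

pd.options.display.float_format = '{:9.4f}'.format  # the format built above
s = pd.Series([1.5])
print(s)  # values now render as 1.5000

# restore the default float rendering once done
pd.reset_option('display.float_format')
```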
mozway
df = df.fillna('')
m = (df.filter(regex='Price_')
       .astype(str)
       .applymap(lambda x: len(x.split('.')[1]) if x else 0)
       .max().max())
pd.options.display.float_format = ('{:,.' + str(m) + 'f}').format  # based on https://stackoverflow.com/a/20937592/15035314
print(df)

Prints:

   CustId CustName  Price_1  Priceqty1  ...  Price_3  Priceqty3  Price_4    Price_5
0    5015      Axn 315.1200          1  ... 126.3200          3 167.8764           
1    5015      Axn  75.3600          3  ...  33.1600          1 190.1900    88.0000
2    5015      Axn 123.0000          5  ...  18.0700          2   0.0730    12.0000
3    7315      Bxy  12.0000          4  ...  77.8900          1 345.0000 3,344.0000
4    7315      Bxy 987.9000          7  ...  90.0900          3 876.3400   908.7600
5    3283      Cxz 123.3400          8  ...  12.0000          9 878.0900    98.4560
6    3283      Cxz 178.9000          7  ...   0.0900          0            987.5600

[7 rows x 10 columns]
Алексей Р