
I have a dataframe with 3 columns, namely cuid, type, and errorreason. The errorreason column is currently empty and I have to fill it with the following logic:

[dataframe before processing]

1.) If cuid is unique and type is 'COL', then errorreason is 'NO ERROR' (all unique values are 'NO ERROR')

2.) If cuid is not unique, and type is both 'COL' and 'ROT', then errorreason is 'AD'

3.) If cuid is not unique, and type is both 'COL' and 'TOT', then errorreason is 'RE'

4.) In any other case except the ones mentioned above, errorreason is 'Unidentified'

[required dataframe after processing]

I have already separated the unique and non-unique values, so the first point is done. I'm stuck on the next points. I was trying to group by the non-unique values and then apply a function, but I'm stuck there.
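A minimal sketch of where I am so far (the sample values and variable names are just my guesses at the setup):

```python
import pandas as pd

df = pd.DataFrame({
    'cuid': [100814, 100814, 100816],
    'type': ['COL', 'ROT', 'COL'],
})

# split into unique / non-unique cuids (this part works)
dup_mask = df['cuid'].duplicated(keep=False)
unique_df = df[~dup_mask].copy()
duplicated_df = df[dup_mask].copy()

# point 1: unique cuids with type 'COL' get 'NO ERROR'
unique_df.loc[unique_df['type'] == 'COL', 'errorreason'] = 'NO ERROR'
```

It's the grouping over duplicated_df and deciding 'AD'/'RE'/'Unidentified' per group that I can't get right.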

Sanchit
  • I think this link will provide you the solution to your problem: https://stackoverflow.com/questions/54944344/groupby-and-append-lists-and-strings – JL Picard Jul 15 '20 at 08:53
  • Hi, I saw the question. I think I know up to the part df=duplicated_df.groupby('cuid').agg(d), but how should I frame the function d here? – Sanchit Jul 15 '20 at 09:26
  • @Sanchit just checking out if my answer reaches your expected result.. – Ric S Jul 16 '20 at 07:34
  • Hi @RicS, you should try and run the code, there seems to be a key error. I am trying to understand your piece and see if I can relate :) – Sanchit Jul 16 '20 at 08:12
  • @Sanchit I've re-run my code and for the sample data it works as intended. Let me know if you discover something! – Ric S Jul 16 '20 at 08:15

2 Answers


This is quite a long solution, but I have inserted explanations for each step so that they are clear to you. At the end you obtain your desired output.

import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame({
    'cuid': [100814, 100814, 100815, 100815, 100816],
    'type': ['col', 'rot', 'col', 'tot', 'col']
})

# define function for concatenating 'type' within the same 'cuid'
def str_cat(x):
    return x.str.cat(sep=', ')

# create a lookup dataset that we will merge later on;
# reset_index keeps 'cuid' as a regular column so the conditions and the merge below work
df_lookup = df.groupby('cuid').agg({
    'cuid': 'count',
    'type': str_cat
}).rename(columns={'cuid': 'unique'}).reset_index()

# create the variable 'error_reason' on this lookup dataset thanks to a case-when-like statement using np.select
df_lookup['error_reason'] = np.select(
    [
        (df_lookup['unique'] == 1) & (df_lookup['type'] == 'col'),
        (df_lookup['unique'] > 1) & (df_lookup['type'].str.contains('col')) & (df_lookup['type'].str.contains('rot')),
        (df_lookup['unique'] > 1) & (df_lookup['type'].str.contains('col')) & (df_lookup['type'].str.contains('tot'))
    ],
    [
        'NO ERROR',
        'AD',
        'RE'
    ],
    default='Unidentified'
)

# merge the two datasets
df = df.merge(df_lookup.drop(columns=['type', 'unique']), on='cuid')

Output

     cuid type error_reason
0  100814  col           AD
1  100814  rot           AD
2  100815  col           RE
3  100815  tot           RE
4  100816  col     NO ERROR
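For readers who prefer to stay in a single frame, here is an equivalent variant of the same idea (same sample data, same conditions) that uses groupby().transform to broadcast the per-cuid count and concatenated type string back onto the original rows, so no lookup table or merge is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cuid': [100814, 100814, 100815, 100815, 100816],
    'type': ['col', 'rot', 'col', 'tot', 'col']
})

# broadcast the concatenated 'type' string and the row count of each cuid back to every row
types = df.groupby('cuid')['type'].transform(lambda s: s.str.cat(sep=', '))
counts = df.groupby('cuid')['cuid'].transform('count')

df['error_reason'] = np.select(
    [
        (counts == 1) & (types == 'col'),
        (counts > 1) & types.str.contains('col') & types.str.contains('rot'),
        (counts > 1) & types.str.contains('col') & types.str.contains('tot')
    ],
    ['NO ERROR', 'AD', 'RE'],
    default='Unidentified'
)
```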
Ric S

Try to use this:

df.groupby('cuid', as_index=False)['type'].aggregate(list)

I have not tested this solution so let me know if it does not work.
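On a small frame like the one in the other answer, this pattern should collect one list of types per cuid (sketch, not the full error-reason logic):

```python
import pandas as pd

df = pd.DataFrame({
    'cuid': [100814, 100814, 100815],
    'type': ['col', 'rot', 'col']
})

# one row per cuid, with its types gathered into a list
out = df.groupby('cuid', as_index=False)['type'].aggregate(list)
```

You would still need to map each list (e.g. ['col', 'rot']) to 'AD', 'RE', etc. afterwards.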

Bando
JL Picard