0

After this discussion, I have the following dataframe:

data = {'Item':['1', '2', '3', '4', '5'], 
'Len':[142, 11, 50, 60, 12], 
'Hei':[55, 65, 130, 14, 69],
'C':[68, -18, 65, 16, 17],
'Thick':[60, 0, -150, 170, 130],
'Vol':[230, 200, -500, 10, 160]
'Fail':[['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""}

df = pd.DataFrame(data)

representing different items and the corresponding values related to some of their parameters (Le, Hei, C, ...). In the column Fail are reported the parameters that are failed, e. g. item 1 fails for parameters Len and Thick, item 3 fails for parameters B, Thick and Vol, while item 4 shows no failure. For each item I need a new column where it is reported the failed parameter together with its value, in the following format: failed parameter = value. So, for the first item I should get Len=142 and Thick=60. So far, I have exploded the Fail column into multiple columns:

failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns = lambda x : 'Failed_param_' + str(x +1 ))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df[:], failed_param[:]], axis=1)

Then, if I do the following:

for name in df2_list:
    df2.loc[df2[f"{name}"] == "D", "new"] = "D"+ "=" + df2["D"].map(str)

I can get what I need but for only one parameter (D in this case). How can I obtain the same for all the parameters all at once?

heartbit
  • 97
  • 1
  • 10

2 Answers2

2

As mentioned in the question, you need to insert a new column (e.g., FailParams) that contains a list of strings. Each string represents the items' failures (e.g., Len=142,Thick=60). A quick solution can be:

import pandas as pd

data = {
  'Item' : ['1', '2', '3', '4', '5'],
  'Len'  : [142, 11, 50, 60, 12],
  'Hei'  : [55, 65, 130, 14, 69],
  'C'    : [68, -18, 65, 16, 17],
  'Thick': [60, 0, -150, 170, 130],
  'Vol'  : [230, 200, -500, 10, 160],
  'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}

# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)

# The first solution: using list comprehension.
column = [
  ",".join(  # Add commas between the list items.
    # Find the target items and their values.
    [el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
  )
  if (len(L[1]) > 0) else ""  # If the Fail inner is empty, return an empty string.
  for L in zip(df['Item'].values, df['Fail'].values)  # Loop on the Fail items.
]

# Insert the new column.
df['FailParams'] = column

# Print the DF after insertion.
print(df)

The previous solution is added using list comprehension. Another solution using loops can be:

# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
  if (len(L[1]) <= 0):
    record = ""
  else:
    record = ",".join([el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
  records.append(record)
print(records)

# Insert the new column.
df['FailParams'] = records

# Print the DF after insertion.
print(df)

A sample output should be:

  Item  Len  Hei   C  Thick  Vol               Fail                   FailParams
0    1  142   55  68     60  230       [Len, Thick]             Len=142,Thick=60
1    2   11   65 -18      0  200            [Thick]                      Thick=0
2    3   50  130  65   -150 -500  [Hei, Thick, Vol]  Hei=130,Thick=-150,Vol=-500
3    4   60   14  16    170   10              [Vol]                       Vol=10
4    5   12   69  17    130  160                 []
  • Your solution indeed seems to give the right answer! As highlighted also by @fsimonjetz, your solution needs that the empty cell in the Fail column is an empty list; this is not my case, but I replaced the NaN with an empty list (by the following line: df["Fail"] = df["Fail"].apply(lambda d: d if isinstance(d, list) else [])) and so it works! – heartbit May 16 '22 at 15:20
1

It might be a good idea to build an intermediate representation first, something like this (I am assuming the empty cell in the Fail column is an empty list [] so as to match the datatype of the other values):

# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail), 
             axis=1, 
             result_type='broadcast')

>>> df[m]
  Item    Len    Hei   C  Thick    Vol Fail
0  NaN  142.0    NaN NaN   60.0    NaN  NaN
1  NaN    NaN    NaN NaN    0.0    NaN  NaN
2  NaN    NaN  130.0 NaN -150.0 -500.0  NaN
3  NaN    NaN    NaN NaN    NaN   10.0  NaN
4  NaN    NaN    NaN NaN    NaN    NaN  NaN

This allows you to actually do something with the failed values, too.

With that in place, generating the value list could be done by something similar to Hossam Magdy Balaha's answer, perhaps with a little function:

def join_params(row):
    row = row.dropna().to_dict()
    return ', '.join(f'{k}={v}' for k,v in row.items())

>>> df[m].apply(join_params, axis=1)
0                  Len=142.0, Thick=60.0
1                              Thick=0.0
2    Hei=130.0, Thick=-150.0, Vol=-500.0
3                               Vol=10.0
4                                       
dtype: object
fsimonjetz
  • 5,644
  • 3
  • 5
  • 21
  • 1
    Although the solution suggested by Hossam Magdy Balaha works well with the dataframe provided as an example, when dealing with a more complex dataframe gave me an error with index, instead this solution shows no problem, thus I accept this as the answer. Really thank you! – heartbit May 17 '22 at 09:31