
My Python code below fails with an error. I'm hoping for suggestions on why it happens and how to resolve it.

Here's the example dataframe:

cust_id  max_nibt  nibt_0  nibt_1  nibt_10  line_0  line_1  line_10
11       200       -5      200     500      100     200     300
22       300       -10     100     300      100     200     300
33       400       -20     0       400      100     200     300

for i in range(0, 11):
    if (df4['nibt_%s' % i] == df4['max_nibt']):
        df4['model_line'] = df4['line_%s' % i]

The code gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

However, when I use .any(), only the last iteration takes effect, assigning model_line = line_10. When I use .all(), the answer is the same for all the cust_ids. Any thoughts? Thanks in advance.

mechanical_meat
Timmy

3 Answers

4

I have to guess at what you want, but you clearly are not using pd.Series correctly... see here for a better explanation.

IIUC:
You want to fill in values from line_x when nibt_x equals max_nibt

import numpy as np

# filter to get `nibt` columns and find the first column that equals max
nibt_maxes = df.filter(regex=r'nibt_\d+').eq(df.max_nibt, axis=0).idxmax(axis=1)

# swap out the string `nibt` with `line`
lines = nibt_maxes.replace('nibt', 'line', regex=True)

# `DataFrame.lookup` was removed in pandas 2.0, so index the underlying
# array by (row position, column position) and assign values
df['model'] = df.to_numpy()[np.arange(len(df)), df.columns.get_indexer(lines)]

   cust_id  max_nibt  nibt_0  nibt_1  nibt_10  line_0  line_1  line_10  model
0       11       200      -5     200      500     100     200      300    200
1       22       300     -10     100      300     100     200      300    300
2       33       400     -20       0      400     100     200      300    300
Community
piRSquared
2

As written, your for loop compares all values of both columns (i.e., pandas Series) element-wise, producing one boolean per row rather than a single True or False that `if` can evaluate. Consider using .loc with a boolean row mask instead:

for i in [0,1,10]:
  df4.loc[df4['nibt_%s' % i] == df4['max_nibt'], 'model_line'] = df4['line_%s' % i]

Alternatively, since this for loop can overwrite the same new column, model_line, when more than one nibt_ column equals max_nibt in a row, consider writing to suffixed versions of model_line:

for i in [0,1,10]:
  df4.loc[df4['nibt_%s' % i] == df4['max_nibt'], 'model_line_%s' % i] = df4['line_%s' % i]
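To check the first loop against the data in the question, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the sample table, and the variable `mask` is only an illustrative name; it is not part of the original answer.

```python
import pandas as pd

# Sample frame copied by hand from the table in the question.
df4 = pd.DataFrame({
    "cust_id": [11, 22, 33],
    "max_nibt": [200, 300, 400],
    "nibt_0": [-5, -10, -20],
    "nibt_1": [200, 100, 0],
    "nibt_10": [500, 300, 400],
    "line_0": [100, 100, 100],
    "line_1": [200, 200, 200],
    "line_10": [300, 300, 300],
})

# Only the suffixes present in the sample are used here.
for i in [0, 1, 10]:
    # Element-wise comparison: one boolean per row.
    mask = df4["nibt_%s" % i] == df4["max_nibt"]
    # Assign line_i only on the rows where the mask is True;
    # rows that never match stay NaN.
    df4.loc[mask, "model_line"] = df4["line_%s" % i]

model_line = df4["model_line"].tolist()
print(model_line)
```

On this sample the three rows end up with the values 200, 300 and 300, matching the `model` column shown in the other answer. If a row matched more than one nibt_ column, the last matching suffix in the loop would win.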
Parfait
  • Great! Glad to help. When I started off in pandas, this error was probably the most common I encountered. Please do accept the most helpful answer (tick mark to side) to confirm resolution (even helps future readers). – Parfait Apr 14 '17 at 23:40
1

You can't use a comparison of two Series in an if statement like that: the comparison returns one boolean per row, so how will pandas know which of them should decide the branch?
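The ambiguity can be reproduced in a few lines. This is only a sketch: the two Series and their values are made up for illustration and do not come from the question's frame.

```python
import pandas as pd

# Hypothetical values, chosen so that exactly one position matches.
s1 = pd.Series([-5, 200, 500])
s2 = pd.Series([200, 200, 200])

cmp = s1 == s2          # element-wise: a boolean Series, not a single bool
print(cmp.tolist())     # [False, True, False]

try:
    if cmp:             # Python calls bool(cmp) here, which pandas refuses
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# .any() and .all() each collapse the whole Series to ONE bool, so an
# `if` built on them makes the same decision for every row at once.
any_match = bool(cmp.any())
all_match = bool(cmp.all())
print(any_match, all_match)   # True False
```

This is why .any() and .all() silence the error in the question but still assign the same line to every cust_id: the decision is made once for the whole column, not once per row.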

If I understand correctly you can do:

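for i in range(0, 11):
  for j, k in df.iterrows():
    if k['nibt_%s' % i] == k['max_nibt']:
      # assign with a single .loc call; chained indexing such as
      # df.iloc[j]['model_line'] = ... writes to a temporary copy
      df.loc[j, 'model_line'] = k['line_%s' % i]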
mechanical_meat