1

I am working on a dataframe,

0 1 2 3 4 5 6 7
new_width new_height new_depth audited_Width audited_Height audited_Depth inf val
---------- ----------- ---------- -------------- --------------- -------------- --- ---
35.00 2.00 21.00 21.00 2.50 35.00 T
12.00 4.40 10.60 11.60 4.40 12.00 T
20.50 17.00 5.50 21.50 17.05 20.50 F
24.33 22.00 18.11 24.00 22.05 24.33 T
23.00 23.00 19.00 19.00 23.00 23.00 F

Here i want to find difference between rows (0, 3) and (1,4) and (2,5) and verify if the difference value(any one or all the three) falls in the range(0,1), and if yes then it should check the corresponding cell in row 6 and if it is 'T', then print 'YES' in corresponding cell in row 7!

I have the following code:

a=df['new_width'] - df['audited_Width']
for i in a:
    if (i in range (0,1))==True:
        df['Value'] = 'Yes'        
print(df['Value'])

I know that 4th line is incorrect. What alternatives can I use to get the desired output?

BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
Surya_Jas
  • 25
  • 4

2 Answers2

1

You shouldn't iterate over the rows of a DataFrame. Instead, you can create a mask to select the rows that meet your condition, and then use that to fill the "val" column:

mask = ((df["new_width"] - df["audited_Width"]).between(0, 1) \
    | (df["new_height"] - df["audited_Height"]).between(0, 1) \
    | (df["new_depth"] - df["audited_Depth"]).between(0, 1)) \
    & (df["inf"] == "T")
df["val"] = df["val"].where(~mask, "YES")

This outputs:

   new_width  new_height  new_depth  audited_Width  audited_Height  audited_Depth inf  val
0      35.00         2.0      21.00           21.0            2.50          35.00   T  NaN
1      12.00         4.4      10.60           11.6            4.40          12.00   T  YES
2      20.50        17.0       5.50           21.5           17.05          20.50   F  NaN
3      24.33        22.0      18.11           24.0           22.05          24.33   T  YES
4      23.00        23.0      19.00           19.0           23.00          23.00   F  NaN
BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
  • 1
    I would also recommend using boolean values (`True`, and `False`) instead of the strings `'T'` and `'F'` and the string `'Yes'` – Zach Flanders Jun 25 '22 at 17:51
  • thanks! this throws a warning 'SettingWithCopyWarning' but still works perfectly! also can you help me how to check equal condition. df = ((df["new_width"] == df["audited_width"]) \ | (df["new_height"] == df["audited_height"]) \ & (df["inference"] == "Consistent")``` – Surya_Jas Jun 25 '22 at 18:51
  • 1
    The edited version of my post should not throw a warning. Regarding your second point, the contents of your DF are floating point values, so you shouldn't use `==`. Instead, use something like `np.isclose`. See [this](https://stackoverflow.com/a/33627436/17769815) for more details. – BrokenBenchmark Jun 25 '22 at 18:57
1

Custom for loops are basically never the best option when it comes to pandas.

This is a method that reshapes your dataframe to an arguably better shape, performs a simple check on the new shape, and then extracts indices that should be modified in the original dataframe.

df.columns = df.columns.str.lower()
df2 = pd.wide_to_long(df.reset_index(), ['new', 'audited'], ['index'], 'values', '_', '\w+')
mask = df2[df2.new.sub(df2.audited).between(0,1) & df2.inf.eq('T')]
idx = mask.reset_index('index')['index'].unique()
df.loc[idx, 'val'] = 'YES'
print(df)

Output:

   new_width  new_height  new_depth  audited_width  audited_height  audited_depth inf  val
0      35.00         2.0      21.00           21.0            2.50          35.00   T  NaN
1      12.00         4.4      10.60           11.6            4.40          12.00   T  YES
2      20.50        17.0       5.50           21.5           17.05          20.50   F  NaN
3      24.33        22.0      18.11           24.0           22.05          24.33   T  YES
4      23.00        23.0      19.00           19.0           23.00          23.00   F  NaN
BeRT2me
  • 12,699
  • 2
  • 13
  • 31
  • thanks! looks complicated but worked fine. But it throws an error 'TypeError: unsupported operand type(s) for -: 'float' and 'str' if a string value is present in the data – Surya_Jas Jun 25 '22 at 18:57
  • 1
    That is to be expected... strings can't be between 0 and 1. Also, you should check out what `df2` looks like after making it~ – BeRT2me Jun 25 '22 at 19:10