find difference between multiple columns in a dataframe

Question

I am working on a dataframe,

0	1	2	3	4	5	6	7
new_width	new_height	new_depth	audited_Width	audited_Height	audited_Depth	inf	val
----------	-----------	----------	--------------	---------------	--------------	---	---
35.00	2.00	21.00	21.00	2.50	35.00	T
12.00	4.40	10.60	11.60	4.40	12.00	T
20.50	17.00	5.50	21.50	17.05	20.50	F
24.33	22.00	18.11	24.00	22.05	24.33	T
23.00	23.00	19.00	19.00	23.00	23.00	F

Here i want to find difference between rows (0, 3) and (1,4) and (2,5) and verify if the difference value(any one or all the three) falls in the range(0,1), and if yes then it should check the corresponding cell in row 6 and if it is 'T', then print 'YES' in corresponding cell in row 7!

I have the following code:

a=df['new_width'] - df['audited_Width']
for i in a:
    if (i in range (0,1))==True:
        df['Value'] = 'Yes'        
print(df['Value'])

I know that 4th line is incorrect. What alternatives can I use to get the desired output?

BrokenBenchmark · Accepted Answer · 2022-06-25T18:12:09.920

1

You shouldn't iterate over the rows of a DataFrame. Instead, you can create a mask to select the rows that meet your condition, and then use that to fill the "val" column:

mask = ((df["new_width"] - df["audited_Width"]).between(0, 1) \
    | (df["new_height"] - df["audited_Height"]).between(0, 1) \
    | (df["new_depth"] - df["audited_Depth"]).between(0, 1)) \
    & (df["inf"] == "T")
df["val"] = df["val"].where(~mask, "YES")

This outputs:

   new_width  new_height  new_depth  audited_Width  audited_Height  audited_Depth inf  val
0      35.00         2.0      21.00           21.0            2.50          35.00   T  NaN
1      12.00         4.4      10.60           11.6            4.40          12.00   T  YES
2      20.50        17.0       5.50           21.5           17.05          20.50   F  NaN
3      24.33        22.0      18.11           24.0           22.05          24.33   T  YES
4      23.00        23.0      19.00           19.0           23.00          23.00   F  NaN

edited Jun 25 '22 at 18:12

answered Jun 25 '22 at 17:32

BrokenBenchmark

18,126
7
21
33

1

I would also recommend using boolean values (`True`, and `False`) instead of the strings `'T'` and `'F'` and the string `'Yes'` – Zach Flanders Jun 25 '22 at 17:51
thanks! this throws a warning 'SettingWithCopyWarning' but still works perfectly! also can you help me how to check equal condition. df = ((df["new_width"] == df["audited_width"]) \ | (df["new_height"] == df["audited_height"]) \ & (df["inference"] == "Consistent")``` – Surya_Jas Jun 25 '22 at 18:51
1

The edited version of my post should not throw a warning. Regarding your second point, the contents of your DF are floating point values, so you shouldn't use `==`. Instead, use something like `np.isclose`. See [this](https://stackoverflow.com/a/33627436/17769815) for more details. – BrokenBenchmark Jun 25 '22 at 18:57

score 1 · Answer 2 · answered Jun 25 '22 at 18:06

Custom for loops are basically never the best option when it comes to pandas.

This is a method that reshapes your dataframe to an arguably better shape, performs a simple check on the new shape, and then extracts indices that should be modified in the original dataframe.

df.columns = df.columns.str.lower()
df2 = pd.wide_to_long(df.reset_index(), ['new', 'audited'], ['index'], 'values', '_', '\w+')
mask = df2[df2.new.sub(df2.audited).between(0,1) & df2.inf.eq('T')]
idx = mask.reset_index('index')['index'].unique()
df.loc[idx, 'val'] = 'YES'
print(df)

Output:

   new_width  new_height  new_depth  audited_width  audited_height  audited_depth inf  val
0      35.00         2.0      21.00           21.0            2.50          35.00   T  NaN
1      12.00         4.4      10.60           11.6            4.40          12.00   T  YES
2      20.50        17.0       5.50           21.5           17.05          20.50   F  NaN
3      24.33        22.0      18.11           24.0           22.05          24.33   T  YES
4      23.00        23.0      19.00           19.0           23.00          23.00   F  NaN

thanks! looks complicated but worked fine. But it throws an error 'TypeError: unsupported operand type(s) for -: 'float' and 'str' if a string value is present in the data — Surya_Jas, Jun 25 '22 at 18:57
That is to be expected... strings can't be between 0 and 1. Also, you should check out what `df2` looks like after making it~ — BeRT2me, Jun 25 '22 at 19:10

find difference between multiple columns in a dataframe

2 Answers2