How to divide a dataframe by another column of the same dataframe

Question

I'm stuck on a problem with dataframes due to my lack of understanding of looping/iterating/matrices/etc.

So I have a dataframe or an array (whichever works):

initial = [[1,2,3,3], [4,5,6,6],[7,8,9,9]]

I need to divide all the values in the array/df, excluding the last column, by the value in the last column, row by row, so that I obtain the result:

result = [[0.33, 0.67, 1], [0.67, 0.83, 1], [0.78, 0.89, 1]]

For example, for the first list I would compute 1/3, 2/3, 3/3, then for the next list 4/6, 5/6, 6/6, and so on.
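Since the sample data is a plain list of lists, the row-by-row arithmetic described above can be sketched without any library, using a nested list comprehension. The function name `divide_by_last` is hypothetical, introduced only for illustration:

```python
def divide_by_last(rows):
    """Divide every value except the last in each row by that row's last value."""
    # For each row, take all elements but the last and divide each by row[-1]
    return [[value / row[-1] for value in row[:-1]] for row in rows]


initial = [[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9]]
result = divide_by_last(initial)

# Round only for display; result itself keeps full precision
print([[round(v, 2) for v in row] for row in result])
# [[0.33, 0.67, 1.0], [0.67, 0.83, 1.0], [0.78, 0.89, 1.0]]
```

This is fine for small data; for larger tables the vectorized pandas/NumPy approaches in the answers avoid the Python-level loop.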

I would like to either store the result in a separate df/array or overwrite the original df/array, whichever works best. Note (if it matters): the last column contains no zeros or NaNs, and in each row its value is equal to or greater than the values in the preceding columns.

I'd also like to know whether I can determine which rows to process based on a column that stands before column 0. Originally I dropped this column to keep only the numbers and planned to add it back later, but it would be a big plus if the rows could be calculated in relation to this column, which contains unique strings (it is set as the index of my original dataframe).


Could you explain the last paragraph of your question? Perhaps show an example? — Sayandip Dutta, Feb 04 '21 at 16:20

Trenton McKinney · Answer 1 · 2021-02-04T20:23:11.947

- The sample data in the OP is a list of lists, not an array.
- The following example uses a pandas dataframe, since that is what the question is tagged with.
- Use .div() with axis=0, so the divisor Series is matched to the rows (by index label) rather than to the columns.
- The columns can be selected with .iloc, .loc or [].
  - Selection by label
    - "if the rows would be calculated related to this column which contains unique strings" implies selection by label: .loc
  - Selection by position
  - Selection by callable
  - SO: How to select rows from a DataFrame based on column values
- Append .round(2) to the expression that produces result to set the number of decimal places.
- "I'd also like to know if I can determine the rows to go by based on a column that stands before column 0": that is df.index.
  - If the index is numeric, it can be used directly as the divisor (a string index cannot): df.iloc[:, :3].div(df.index, axis=0)
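A minimal sketch of the label-based case, assuming the unique strings are set as the index (the labels "a", "b", "c" are made up for illustration). Because .div(..., axis=0) aligns the divisor on the index, each row is divided by its own last-column value, and rows can then be looked up by label with .loc:

```python
import pandas as pd

# Hypothetical labels standing in for the unique strings used as the index
df = pd.DataFrame(
    {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9], 3: [3, 6, 9]},
    index=["a", "b", "c"],
)

# Divide the first three columns by column 3; rows are matched by index label
result = df.iloc[:, :3].div(df[3], axis=0)

# Look up a single row of the result by its label
print(result.loc["b"].round(2).tolist())  # [0.67, 0.83, 1.0]

# Or select only some rows by label before dividing
subset = df.loc[["a", "c"]]
print(subset.iloc[:, :3].div(subset[3], axis=0).round(2))
```

The string index is carried through to the result, so there is no need to drop the column and re-attach it afterwards.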

import pandas as pd

# sample dataframe
df = pd.DataFrame({0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9], 3: [3, 6, 9]})

# display(df)
#    0  1  2  3
# 0  1  2  3  3
# 1  4  5  6  6
# 2  7  8  9  9

# divide
result = df.iloc[:, :3].div(df[3], axis=0)

# display(result)
#           0         1    2
# 0  0.333333  0.666667  1.0
# 1  0.666667  0.833333  1.0
# 2  0.777778  0.888889  1.0
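If you want the rounded result as a list of lists, as in the question, or would rather overwrite the original columns, a sketch along these lines should work. Casting the frame to float before overwriting is an assumption about the desired dtype; it avoids assigning float ratios into integer columns:

```python
import pandas as pd

df = pd.DataFrame({0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9], 3: [3, 6, 9]})

# Option 1: separate result, rounded to two decimals, back as a list of lists
result = df.iloc[:, :3].div(df[3], axis=0).round(2)
print(result.to_numpy().tolist())
# [[0.33, 0.67, 1.0], [0.67, 0.83, 1.0], [0.78, 0.89, 1.0]]

# Option 2: overwrite the first three columns in place
# (cast to float first so the integer columns can hold the ratios;
# the right-hand side is computed before column 3 is touched)
df = df.astype(float)
df.iloc[:, :3] = df.iloc[:, :3].div(df[3], axis=0)
print(df)
```

Column 3 is left unchanged in option 2, so the original divisors remain available.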

Explanation of the last part of my question: I guess I was still thinking in terms of Excel's VLOOKUP, where you look up a value but want to perform an operation on values at an offset relative to it :) — anama, Feb 06 '21 at 12:34

Philipp Schlehuber · Answer 2 · 2021-02-04T20:13:09.500

If initial is a numpy array, this is easy to achieve using slicing:

import numpy
initial = numpy.array([[1,2,3,3],[4,5,6,6],[7,8,9,9]])
result = initial[:,:-1]/initial[:,[-1]]

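In slicing, ":" means take all, and ":n" means take all elements up to index n (excluded) in the respective dimension. As with Python lists, you can also use negative indices to count from the end, so ":-1" selects every column except the last. Indexing with the list [-1] (rather than the plain integer -1) keeps the last column two-dimensional, so numpy broadcasts it correctly and divides each row by its own last value.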
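To see why the list index matters, compare the shapes. A plain integer index drops the column dimension, and the resulting 1D array is broadcast along the columns instead of the rows; for this 3x3 slice that runs without error but silently gives a different result. `initial[:, -1:]` and `initial[:, -1, np.newaxis]` are two other standard ways to keep the column 2D:

```python
import numpy as np

initial = np.array([[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9]])

print(initial[:, [-1]].shape)  # (3, 1): a column vector, one divisor per row
print(initial[:, -1].shape)    # (3,): a 1D array, broadcast along the columns

# Correct: each row divided by its own last value
good = initial[:, :-1] / initial[:, [-1]]

# Equivalent ways to keep the last column two-dimensional
assert np.allclose(good, initial[:, :-1] / initial[:, -1:])
assert np.allclose(good, initial[:, :-1] / initial[:, -1, np.newaxis])

# Wrong: element j of the last column divides column j, not row j
bad = initial[:, :-1] / initial[:, -1]
print(bad[0])  # every entry is 1/3: computed as 1/3, 2/6, 3/9
```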

Check out the slicing syntax in detail; it is even more flexible than this example shows.

However, I did not quite understand the second part of your question.
