
I'd like to do something like this:

import pandas as pd

df = pd.DataFrame()
for row_ind1 in range(3):
    for row_ind2 in range(3, 6):
        for col in range(6, 9):
            entry = row_ind1 * row_ind2 * col
            df.loc[[row_ind1, row_ind2], col] = entry

and get the following result:

     6 7 8
0 3  x x x
  4  x x x
  5  x x x
1 3  x x x
  4  x x x
  5  x x x
2 3  x x x
  4  x x x
  5  x x x

(As a bonus, the winner gets to fill in the answers.)

young_souvlaki

1 Answer


Pre-initialising the DataFrame with an empty two-level MultiIndex makes setting with loc work as expected, provided each row label is passed as a tuple:

import pandas as pd

# Pre-initialise an empty two-level MultiIndex
df = pd.DataFrame(index=pd.MultiIndex.from_arrays([[], []]))
for row_ind1 in range(3):
    for row_ind2 in range(3, 6):
        for col in range(6, 9):
            entry = row_ind1 * row_ind2 * col
            # The tuple is one (row_ind1, row_ind2) row label; missing rows/columns are added
            df.loc[(row_ind1, row_ind2), col] = entry

df:

        6     7     8
0 3   0.0   0.0   0.0
  4   0.0   0.0   0.0
  5   0.0   0.0   0.0
1 3  18.0  21.0  24.0
  4  24.0  28.0  32.0
  5  30.0  35.0  40.0
2 3  36.0  42.0  48.0
  4  48.0  56.0  64.0
  5  60.0  70.0  80.0

That said, it is probably easier to create the index and columns independently, building the index with MultiIndex.from_product, and then fill the whole DataFrame at once by using NumPy broadcasting to multiply the index levels against the columns:

import numpy as np
import pandas as pd

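# Build the MultiIndex, then turn its two levels into DataFrame columns 0 and 1
idx = pd.MultiIndex.from_product([[0, 1, 2], [3, 4, 5]]).to_frame()
cols = np.array([6, 7, 8])

# (9, 1) column of level products broadcast against the (3,) columns -> (9, 3)
df = pd.DataFrame((idx[0] * idx[1]).to_numpy()[:, None] * cols,
                  index=idx.index,
                  columns=cols)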

df:

      6   7   8
0 3   0   0   0
  4   0   0   0
  5   0   0   0
1 3  18  21  24
  4  24  28  32
  5  30  35  40
2 3  36  42  48
  4  48  56  64
  5  60  70  80
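To make the broadcasting step concrete, here is a small variant that reads the levels with `get_level_values` instead of `to_frame` and prints the intermediate shapes. It is a sketch of the same idea, not the answer's exact code:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[0, 1, 2], [3, 4, 5]])
cols = np.array([6, 7, 8])

# One product per row of the MultiIndex: shape (9,)
row_products = idx.get_level_values(0).to_numpy() * idx.get_level_values(1).to_numpy()

# [:, None] turns it into a (9, 1) column; broadcasting with cols (3,) gives (9, 3)
values = row_products[:, None] * cols
print(row_products.shape, row_products[:, None].shape, values.shape)
# (9,) (9, 1) (9, 3)

# For two 1-D inputs, np.outer computes the same grid
assert (values == np.outer(row_products, cols)).all()

df = pd.DataFrame(values, index=idx, columns=cols)
```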
Henry Ecker
  • Thank you very much for the clear and simple solution. I simplified my question for purposes of example; the first solution is what I was looking for, but both remain helpful. – young_souvlaki Oct 24 '21 at 01:43
  • Upon further review, I noticed your use of a tuple (`()`) in `loc` instead of an array (`[]`). Is this personal preference or documented as best practice? – young_souvlaki Oct 25 '21 at 16:56
  • A tuple is necessary for MultiIndex access. A list is for selecting multiple index labels. In a MultiIndex, each index label is a tuple. – Henry Ecker Oct 25 '21 at 17:43
  • Got it! Would this be a place to use `at` instead of `loc` since we are dealing with scalar values? – young_souvlaki Oct 25 '21 at 18:49
  • There is no difference in effect between `at` and `loc` here. Although you are likely correct that `at` would be more "clear" as we're placing a single value "at" a specific cell. – Henry Ecker Oct 25 '21 at 18:53
  • Ah, okay, so the difference would be in retrieval, not setting. Thank you once again. – young_souvlaki Oct 25 '21 at 19:49
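The tuple-versus-list distinction from the comments can be seen on a small frame. The column name `v` and the values below are illustrative only:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[0, 1, 2], [3, 4, 5]])
df = pd.DataFrame({"v": range(9)}, index=idx)

# A tuple is a single MultiIndex label: it addresses exactly the row (1, 4)
one_row = df.loc[(1, 4), "v"]
print(one_row)  # 4

# A list is a collection of labels; here they are first-level labels,
# so every row whose outer level is 0 or 1 is selected
many_rows = df.loc[[0, 1], "v"]
print(len(many_rows))  # 6

# .at is the scalar-only accessor and takes the same tuple label
df.at[(1, 4), "v"] = 100
print(df.loc[(1, 4), "v"])  # 100
```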