
I am trying to simply divide two columns element-wise, but for some reason this returns two columns instead of one as I would expect.

I think it has something to do with the fact that I need to create the dataframe iteratively, so I opted for appending rows one at a time. Here's some test code:

import pandas as pd


df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

# Create a DataFrame
data = {
    'dataset': ['177.png', '276.png', '208.png', '282.png'],
    'partition': ['green', 'green', 'green', 'green'],
    'zeros': [1896715, 1914720, 1913894, 1910815],
    'ones': [23285, 5280, 6106, 9185],
    'total': [1920000, 1920000, 1920000, 1920000]
}

for i in range(len(data['ones'])):
    row = []
    for k in data.keys():
        row.append(data[k][i])
    df = df.append(pd.Series(row, index=df.columns), ignore_index=True)

df_check = pd.DataFrame(data)
df_check["result"] = df_check["zeros"] / df_check["total"]

df["result"] = df["zeros"] / df["total"]
df

If you try to run this, you'll see that everything works as expected with df_check, but the code fails when it gets to df["result"] = df["zeros"] / df["total"]:

ValueError: Cannot set a DataFrame with multiple columns to the single column result

In fact, if I inspect the result of the division, I notice it has two columns, both filled with missing values:

>>> df["zeros"] / df["total"]

    total   zeros
0   NaN NaN
1   NaN NaN
2   NaN NaN
3   NaN NaN

Any suggestions as to why this happens and how to fix it?

Luca Clissa

3 Answers


Your logic for setting up the DataFrame is incorrect. Don't use a loop; go directly through the DataFrame constructor, optionally with an extra step to rename the columns:

df = pd.DataFrame(data).rename(columns={'dataset': 'image_name'})
df["result"] = df["zeros"] / df["total"]

Output:

  image_name partition    zeros   ones    total    result
0    177.png     green  1896715  23285  1920000  0.987872
1    276.png     green  1914720   5280  1920000  0.997250
2    208.png     green  1913894   6106  1920000  0.996820
3    282.png     green  1910815   9185  1920000  0.995216

With your current approach you end up with a MultiIndex with a single level, which causes the downstream problem: selecting df['zeros'] and df['total'] gives you two single-column DataFrames rather than Series, and since DataFrame division aligns on column labels, which differ here, every value in the result is NaN.

print(df.columns)

MultiIndex([('image_name',),
            ( 'partition',),
            (     'zeros',),
            (      'ones',),
            (     'total',)],
           )
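
To illustrate the alignment point, here is a minimal sketch that reproduces the all-NaN result with ordinary flat columns. The names `left`, `right`, `misaligned` and `ratio` are made up for this example; only the numbers come from the question.

```python
import pandas as pd

# Two single-column DataFrames whose column labels differ,
# mimicking what df["zeros"] and df["total"] return in the buggy setup.
left = pd.DataFrame({"zeros": [1896715, 1914720]})
right = pd.DataFrame({"total": [1920000, 1920000]})

# DataFrame / DataFrame aligns on row index AND column labels.
# No label is shared, so the result has both columns and only NaN.
misaligned = left / right
print(misaligned)

# Series / Series aligns on the row index only, so this divides element-wise.
ratio = left["zeros"] / right["total"]
print(ratio)
```

This is why the assignment to a single "result" column fails: the right-hand side is a two-column DataFrame, not a Series.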

In any case, DataFrame.append is deprecated (and was removed entirely in pandas 2.0).
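
If the data can only be obtained inside a loop, one way to avoid append is to collect plain Python rows and call the constructor once at the end. The sketch below assumes each iteration yields an image name, a partition and two pixel counts; the `measurements` list is a hypothetical stand-in for whatever the real loop computes.

```python
import pandas as pd

# Hypothetical per-image results; in real code these values
# would be computed inside the loop body.
measurements = [
    ("177.png", "green", 1896715, 23285),
    ("276.png", "green", 1914720, 5280),
]

rows = []
for image_name, partition, zeros, ones in measurements:
    # Accumulate one dict per row instead of growing a DataFrame.
    rows.append(
        {
            "image_name": image_name,
            "partition": partition,
            "zeros": zeros,
            "ones": ones,
            "total": zeros + ones,
        }
    )

# Build the DataFrame once, after the loop has finished.
df = pd.DataFrame(rows)
df["result"] = df["zeros"] / df["total"]
print(df)
```

Appending to a list is cheap, whereas growing a DataFrame row by row copies the existing data on every iteration.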

mozway
  • Thanks a lot! Your suggestion helped me solve the problem. The split method already creates a list, so I was passing a list containing one list as columns. – Luca Clissa Jun 13 '23 at 08:14
  • @Luca exactly, but honestly don't do this, a loop+append is really not the right approach ;) If you want to set up the column names manually just do it after: `df = pd.DataFrame(data) ; df.columns = 'image_name partition zeros ones total'.split()` ;) – mozway Jun 13 '23 at 08:22
  • The fact is I need to loop to get the data. So I can "append" data at each iteration to some lists and then create the dataframe all at once at the end, is that what you mean? because otherwise I wouldn't know how to get the data... – Luca Clissa Jun 13 '23 at 09:06
  • Then better loop to create the dictionary, **then** pass it once to the DataFrame constructor. See [my answer](https://stackoverflow.com/a/75956237/16343464) on the deprecation of `append` and how to replace it. – mozway Jun 13 '23 at 09:13
  • I see, thanks a lot for sharing! – Luca Clissa Jun 13 '23 at 11:15

The problem is the following line:

df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

The split() method already returns a list, so drop the surrounding brackets and use the following:

df = pd.DataFrame(columns='image_name partition zeros ones total'.split())
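
A quick way to see the difference is to compare the column index produced by each variant. This is only a sketch; the variable names `names`, `nested` and `flat` are chosen here for illustration.

```python
import pandas as pd

# str.split() already returns a list of column names.
names = "image_name partition zeros ones total".split()

# Wrapping that list in another list makes pandas treat it as
# one level of a MultiIndex.
nested = pd.DataFrame(columns=[names])

# Passing the list directly gives an ordinary flat column index.
flat = pd.DataFrame(columns=names)

print(type(nested.columns).__name__, nested.columns.nlevels)
print(type(flat.columns).__name__, flat.columns.nlevels)
```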
Dejene T.

I actually solved the issue thanks to the suggestion in @mozway's answer.

Indeed, the problem is that the buggy version has a MultiIndex. However, this is due to how I specified the list of columns, not to the append method per se. It was solved by changing from

df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

to

df = pd.DataFrame(columns=["image_name", "partition", "zeros", "ones", "total"])

or even just columns='image_name partition zeros ones total'.split().
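
If a DataFrame has already been built with the nested column list, it may also be possible to repair it in place by flattening the single-level MultiIndex. This is a sketch rather than something taken from the answers here; `broken` and `fixed` are illustrative names, and the numbers are the first two rows from the question.

```python
import pandas as pd

# Reproduce the problem: a nested list of names gives a one-level MultiIndex.
broken = pd.DataFrame(
    [[1896715, 1920000], [1914720, 1920000]],
    columns=[["zeros", "total"]],
)

# Replace the MultiIndex with its only level, which is a flat Index.
fixed = broken.copy()
fixed.columns = broken.columns.get_level_values(0)

# Column selection now returns Series, so the division is element-wise.
fixed["result"] = fixed["zeros"] / fixed["total"]
print(fixed)
```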

Luca Clissa
  • The problem is that `df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])` is equivalent to `df = pd.DataFrame(columns=[["image_name", "partition", "zeros", "ones", "total"]])`; you need `df = pd.DataFrame(columns='image_name partition zeros ones total'.split())` instead. – jezrael Jun 13 '23 at 08:14