I have a pandas dataframe df that I want to reorganize by column names and element values to create a new dataframe df0. For example

import numpy as np
import pandas as pd

np.random.seed(10)

df = pd.DataFrame()
df['date'] = pd.date_range(start='2021-01-01', end='2021-04-01', freq='D')
df = df.set_index('date')
df['speciesA'] = np.random.randint(2, size=len(df))
df['speciesB'] = np.random.randint(3, size=len(df))

# new df0, set table elements as columns
df0 = pd.DataFrame()
df0['date'] = df.index
df0 = df0.set_index('date')

for k in df.columns.tolist(): # iterate over columns

    for jj in range(len(df)): # iterate over each row
        cell = df.iloc[jj] # cell value
    
        # iterate over table elements
        newcolumnname = str(k)+'_'+str(cell[k])
        df0[newcolumnname] = 0

        df0.iloc[jj][newcolumnname] = 1
        #df0.iloc[jj][newcolumnname] = df.iloc[jj][str(k)]

print(df0.head())

where the original dataframe df has the form

            speciesA  speciesB
date                          
2021-01-01         1         2
2021-01-02         1         0
2021-01-03         0         2
2021-01-04         1         2
2021-01-05         0         0

I want to create a new dataframe df0 of the form

            speciesA_1  speciesA_0  speciesB_2  speciesB_0  speciesB_1
date                                                                  
2021-01-01           1           0           1           0           0
2021-01-02           1           0           0           1           0
2021-01-03           0           1           1           0           0
2021-01-04           1           0           1           0           0
2021-01-05           0           1           0           1           0

Note that each df0 column name (e.g. speciesA_1) consists of a df column name (speciesA) and an element value (1). The corresponding df0 element is 1 (True) if that row of df holds that value and 0 (False) otherwise.

My code above produces the message `A value is trying to be set on a copy of a slice from a DataFrame` (pandas' SettingWithCopyWarning). I don't understand why this is happening or how to fix it.
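
For reference, the message most likely comes from the chained indexing in `df0.iloc[jj][newcolumnname] = 1`: `df0.iloc[jj]` first returns a row as a separate object, and the assignment may then land on that temporary object rather than on df0. A minimal sketch of the same loop with a single `.loc` assignment is below. It also assumes the line `df0[newcolumnname] = 0` was only meant to create a missing column, since as written it resets the column on every iteration. The `if` guard is an addition that is not in the original code.

```python
import numpy as np
import pandas as pd

np.random.seed(10)

# Same data as in the question
idx = pd.date_range(start='2021-01-01', end='2021-04-01', freq='D', name='date')
df = pd.DataFrame(index=idx)
df['speciesA'] = np.random.randint(2, size=len(df))
df['speciesB'] = np.random.randint(3, size=len(df))

# New frame sharing the same date index
df0 = pd.DataFrame(index=df.index)

for k in df.columns:              # iterate over columns
    for jj in range(len(df)):     # iterate over rows
        newcolumnname = str(k) + '_' + str(df[k].iloc[jj])

        # Assumption: create the indicator column only once,
        # instead of zeroing it on every iteration
        if newcolumnname not in df0.columns:
            df0[newcolumnname] = 0

        # One indexing call (row label, column label), no chained [][]
        df0.loc[df0.index[jj], newcolumnname] = 1

print(df0.head())
```

This keeps the columns in order of first appearance, as in the desired output above, but it is slow for large frames because it assigns one cell at a time.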

Medulla Oblongata
  • 1
    It seems like not using the loop at all and just doing `df0 = pd.get_dummies(df, columns=['speciesA', 'speciesB'])` like [this answer](https://stackoverflow.com/a/40963480/15497888) would be the way to go if you're just trying to one-hot encode this DataFrame. Is that the primary goal? – Henry Ecker Jan 31 '22 at 21:35
  • 1
    yes that's exactly what I wanted, thanks – Medulla Oblongata Jan 31 '22 at 21:39
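
A minimal sketch of the `pd.get_dummies` approach from the comment above, applied to the question's data. The variable name `encoded` is made up for illustration, and `dtype=int` is an added assumption so the result holds 0/1 values (recent pandas versions return booleans by default).

```python
import numpy as np
import pandas as pd

np.random.seed(10)

# Same data as in the question
idx = pd.date_range(start='2021-01-01', end='2021-04-01', freq='D', name='date')
df = pd.DataFrame(index=idx)
df['speciesA'] = np.random.randint(2, size=len(df))
df['speciesB'] = np.random.randint(3, size=len(df))

# One indicator column per (column name, value) pair, named '<column>_<value>'
encoded = pd.get_dummies(df, columns=['speciesA', 'speciesB'], dtype=int)

print(encoded.head())
```

Note that the columns come out sorted by value (speciesA_0, speciesA_1, ...) rather than in order of first appearance, so the column order differs from the table in the question even though the contents match.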
