I am initialising a dataframe with lists, having followed the advice here. I then need to transpose the dataframe.

In the first example I take the column names from the lists used to initialise the dataframe.

In the second example I add the column names last.

-> Is there any difference between these examples?

-> Is there a standard or better way of naming columns of dataframes initialised like this?

import pandas as pd

p_id = ['a_1','a_2']
p = ['a','b']
p_id.insert(0,'p_id')
p.insert(0,'p')

df = pd.DataFrame([p_id, p])
df = df.transpose()
df.columns = df.iloc[0]
df = df[1:]
df

>>>
    p_id    p
0   a_1     a
1   a_2     b

p_id = ['a_1','a_2']
p = ['a','b']

df = pd.DataFrame([p_id, p])
df = df.transpose()
df.columns = ['p_id', 'p']
df

>>>
    p_id    p
0   a_1     a
1   a_2     b
doine

1 Answer

Yes, there is a difference in the indices. Here df is the result of the first example and df1 the result of the second:

print(df.equals(df1))
False

print (df.index)
RangeIndex(start=1, stop=3, step=1)

print (df1.index)
RangeIndex(start=0, stop=2, step=1)

print (df.index == df1.index)
[False False]

The solution is to create a default index in df using DataFrame.reset_index with the drop=True parameter:

df = df.reset_index(drop=True)

print(df.equals(df1))
True

print (df.index)
RangeIndex(start=0, stop=2, step=1)

print (df1.index)
RangeIndex(start=0, stop=2, step=1)

print (df.index == df1.index)
[ True  True]
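
For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch. It assumes, as above, that df comes from the first example and df1 from the second; the name df_fixed is introduced here only for illustration.

```python
import pandas as pd

# First example: the header is stored as the first element of each list,
# promoted to the column names, and the header row is then sliced off.
p_id = ['p_id', 'a_1', 'a_2']
p = ['p', 'a', 'b']
df = pd.DataFrame([p_id, p]).transpose()
df.columns = df.iloc[0]
df = df[1:]

# Second example: the column names are assigned after the transpose.
df1 = pd.DataFrame([['a_1', 'a_2'], ['a', 'b']]).transpose()
df1.columns = ['p_id', 'p']

print(list(df.index))   # [1, 2] because slicing kept the original row labels
print(list(df1.index))  # [0, 1]

# Resetting the index (and dropping the old one) makes the labels start at 0.
df_fixed = df.reset_index(drop=True)
print(list(df_fixed.index))  # [0, 1]
```

The slice df[1:] removes the header row but keeps the remaining row labels, which is why the first example starts at 1 until the index is reset.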
jezrael
    Thanks. So when using a list of lists to initialise a df, either way of naming columns is fine, so long as I correct the index? Obviously my second example is neater and preferable, but I wanted to illustrate the point in the first example. – doine Apr 18 '23 at 13:48
    @doine The best approach is `df = pd.DataFrame([p_id, p], index=['p_id', 'p']).transpose()`; the index labels are then converted to column names by the transpose. – jezrael Apr 18 '23 at 15:04
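
The one-liner from the comment above can be tried out with the short sketch below. The dict-based constructor shown second is not from this thread; it is an assumed alternative that avoids the transpose altogether, and the name df_alt is purely illustrative.

```python
import pandas as pd

p_id = ['a_1', 'a_2']
p = ['a', 'b']

# Label the rows when constructing; transpose() then turns those
# row labels into column names, and the index starts at 0.
df = pd.DataFrame([p_id, p], index=['p_id', 'p']).transpose()

print(df.columns.tolist())  # ['p_id', 'p']
print(df.index.tolist())    # [0, 1]

# Alternative (an assumption, not suggested in the thread):
# dict keys become the column names directly, so no transpose is needed.
df_alt = pd.DataFrame({'p_id': p_id, 'p': p})
print(df_alt.columns.tolist())  # ['p_id', 'p']
```

Either way there is no header row to slice off, so the index needs no correction afterwards.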