Why does pandas dataframe return 2 columns when only 1 is selected

Question

While creating some plots with matplotlib I found a strange behavior in pandas: when I select only 1 column, it returns 2.

import pandas as pd
import io

data = io.StringIO("""time_0,1,time_1,2,time_2,0,time_3,3
-0.002,-0.1225,-0.002,-0.0904,-0.002,0.0331,-0.002,0.,
0.0,-0.1225,0.,-0.0904,0.,0.0331,0.,0.,
0.002,-0.1224,0.002,-0.0904,0.002,0.0331,0.002,0.,
0.004,-0.1225,0.004,-0.0904,0.004,0.0331,0.004,0.,""")

df = pd.read_csv(data)
print(df["time_0"])

Output:

-0.002 -0.1225
0.000 -0.1225
0.002 -0.1224
0.004 -0.1225
Name: time_0, dtype: float64

It shows values from both columns "time_0" and "1", but only "time_0" was selected. Is this a bug or a feature?

Use `df = pd.read_csv(data, index_col=False)` as @Ch3steR pointed out, because here the first column is converted to a float index – jezrael, Nov 23 '20 at 10:41

adir abargil · Accepted Answer · 2020-11-23T10:34:26.097

1

Your selection returns only one column, but the output also prints the index, whose values are the same as those in column "1".

df
Out[3]: 
        time_0      1  time_1      2  time_2      0  time_3   3
-0.002 -0.1225 -0.002 -0.0904 -0.002  0.0331 -0.002     0.0 NaN
 0.000 -0.1225  0.000 -0.0904  0.000  0.0331  0.000     0.0 NaN
 0.002 -0.1224  0.002 -0.0904  0.002  0.0331  0.002     0.0 NaN
 0.004 -0.1225  0.004 -0.0904  0.004  0.0331  0.004     0.0 NaN

pandas unintentionally takes the first column as the index and fills the last column with NaN. Both come from the extra trailing `,` on each data line: every data row then has nine fields, while the header names only eight.

Try removing the trailing comma:
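To confirm the mismatch, here is a minimal sketch that counts the fields with the standard `csv` module, using only the header and the first data row from the question. The variable names are illustrative.

```python
import csv
import io

# Header and first data row copied from the question (note the trailing comma).
raw = """time_0,1,time_1,2,time_2,0,time_3,3
-0.002,-0.1225,-0.002,-0.0904,-0.002,0.0331,-0.002,0.,
"""

rows = list(csv.reader(io.StringIO(raw)))
header_fields = len(rows[0])  # names in the header
data_fields = len(rows[1])    # fields in the data row, including the empty last one

print(header_fields, data_fields)  # prints: 8 9
```

When the data rows have one more field than the header, `read_csv` uses the first field as the index instead of raising an error.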

 import pandas as pd
 import io
 
 data = io.StringIO("""time_0,1,time_1,2,time_2,0,time_3,3
 -0.002,-0.1225,-0.002,-0.0904,-0.002,0.0331,-0.002,0.
 0.0,-0.1225,0.,-0.0904,0.,0.0331,0.,0.
 0.002,-0.1224,0.002,-0.0904,0.002,0.0331,0.002,0.
 0.004,-0.1225,0.004,-0.0904,0.004,0.0331,0.004,0.""")
 
 df = pd.read_csv(data)
 print(df["time_0"])

This code will print:

0   -0.002
1    0.000
2    0.002
3    0.004
Name: time_0, dtype: float64
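If the file cannot be edited, the `index_col=False` option suggested in the comments is an alternative. This is a minimal sketch that leaves the trailing commas in place and uses only the first two data rows to keep it short. Depending on the pandas version, the call may emit a warning about the header length not matching the data.

```python
import io

import pandas as pd

# Same malformed input as in the question: each data row ends with a comma.
data = io.StringIO("""time_0,1,time_1,2,time_2,0,time_3,3
-0.002,-0.1225,-0.002,-0.0904,-0.002,0.0331,-0.002,0.,
0.0,-0.1225,0.,-0.0904,0.,0.0331,0.,0.,
""")

# index_col=False tells pandas not to use the first column as the index.
df = pd.read_csv(data, index_col=False)

print(df["time_0"])
```

With this option `time_0` holds the time values again and the frame gets a default integer index. Removing the trailing commas is still the cleaner fix, because it repairs the input itself.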

edited Nov 23 '20 at 10:34

answered Nov 23 '20 at 10:31

adir abargil


1

I would also suggest reading your data like this: `pd.read_csv(data, index_col=False)`. It will not use the first column as the index and will assign a default index automatically – T.sagiv Nov 23 '20 at 10:40
But the input is incorrect, and that is the main issue: the number of column names doesn't match the data. This is why pandas assumed the first column is the index – adir abargil Nov 23 '20 at 10:41
pandas read_csv defaults to reading the first column as the index, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html The header has 8 columns and so does the data itself, so I think it's correct; he just didn't expect it to be read like that – T.sagiv Nov 23 '20 at 10:50
