DataFrame not recognizing column A because cell A1 is blank

Question

I'm trying to import all my raw data (CSV files) into one DataFrame. Since the raw files contain some useless lines, I'd like to delete them with "drop". However, I'm unable to delete the row whose first column is a blank cell, and the DataFrame doesn't recognize that column.

Here is my code:

import pandas as pd
import numpy as np   
import glob
import os

#Determine file path for index weighting files
pathwgt=r'//10.27.36.181/etf/Bill/Quant/AxJ_Weight'
filenames = glob.glob(pathwgt + "/*.csv")

#declare data frame
dfwgt=pd.DataFrame()

#consolidate all files into one data frame
for filename in filenames:
    dfwgt=dfwgt.replace('',np.NaN)
    dfwgt=dfwgt.append(pd.read_csv(filename))

dfwgt=dfwgt.drop(['Symbol','Company'])

My cell A1 in Excel is blank and B1 contains a string; I'd like to delete the whole of row 1. The DataFrame shape is [124544 rows x 6 columns], where it is supposed to be [124544 rows x 7 columns].

Welcome to SO. Remember to add the appropriate language tag. — Maciej Jureczko, Sep 29 '17 at 08:25
It's unclear: do you need to delete the column or the row? You don't need to drop the rows. Just select the rows, ignoring what you don't want. — Bharath M Shetty, Sep 29 '17 at 08:32
Hi Jezrael, yes I can add the heading, however since dataframe only consider it as 6 columns, I'm unable to add the first column header... — Bill Sun, Sep 29 '17 at 08:40
@BillSun - Do you need `pd.read_csv(filename).reset_index()`? It is really hard to answer, because the problem depends on the data. — jezrael, Sep 29 '17 at 09:01
It's also quite confusing that you use the term `excel` but call `pd.read_csv()`. That function would name the blank column `'Unnamed: 0'`. Lastly, `dfwgt` is a poor variable name. — foxyblue, Sep 29 '17 at 12:32
@GiantsLoveDeathMetal, Yes, that's exactly my problem, I got 'Unnamed :0' in my output. You mentioned that I used pd.read_csv(), that's because my source files are in csv format. — Bill Sun, Oct 03 '17 at 03:47

score 0 · Answer 1 · answered Oct 03 '17 at 12:28

You have a few solutions to your problem:

Remove row and header:

pd.read_csv('data.csv', skiprows=1, header=None)

This creates a DataFrame whose columns are labelled with integers (e.g. 0 to 3).
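
As a rough illustration of this option, the sketch below reads a made-up CSV from an in-memory string instead of a file. The sample data and the names `csv_text` and `df_no_header` are assumptions for the example, not the asker's real files.

```python
import io
import pandas as pd

# Hypothetical sample: the first line has a blank first cell, like the asker's file.
csv_text = ",Symbol,Company\n1,AAA,Alpha\n2,BBB,Beta\n"

# Skip the first line and tell pandas that no line is a header.
df_no_header = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None)

# The columns are now labelled with integers rather than names.
print(df_no_header.columns.tolist())
print(df_no_header.shape)
```

With this approach no column is lost, but you have to assign meaningful column names yourself afterwards.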

Drop column:

df = pd.read_csv('data.csv')

The unnamed column is assigned the name 'Unnamed: 0'. You can drop this column as follows:

df = df.drop('Unnamed: 0', axis=1)
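
A minimal sketch of the drop approach, again using a made-up in-memory CSV (the data and the names `csv_text`, `df_named` and `df_dropped` are assumptions for illustration only):

```python
import io
import pandas as pd

# Hypothetical sample with a blank first header cell.
csv_text = ",Symbol,Company\n1,AAA,Alpha\n2,BBB,Beta\n"

# pandas fills in the missing header name as 'Unnamed: 0'.
df_named = pd.read_csv(io.StringIO(csv_text))
print(df_named.columns.tolist())

# Drop that column by name; axis=1 means "columns".
df_dropped = df_named.drop('Unnamed: 0', axis=1)
print(df_dropped.columns.tolist())
```

Note that without axis=1 (or columns=...), drop looks for the label among the row index instead of the columns.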

Change name:

df = pd.read_csv('data.csv')

Alternatively, rename the unnamed column instead of dropping it:

df = df.rename(columns={'Unnamed: 0': 'new_name'})
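
A short sketch of the rename approach on a made-up in-memory CSV (the sample data and the names `csv_text`, `df_raw` and `df_renamed` are assumptions; 'new_name' is just a placeholder label):

```python
import io
import pandas as pd

# Hypothetical sample with a blank first header cell.
csv_text = ",Symbol,Company\n1,AAA,Alpha\n2,BBB,Beta\n"

df_raw = pd.read_csv(io.StringIO(csv_text))

# Give the auto-generated 'Unnamed: 0' column a real name and keep its data.
df_renamed = df_raw.rename(columns={'Unnamed: 0': 'new_name'})
print(df_renamed.columns.tolist())
```

This keeps all columns, which may be what you want if the first column holds real data.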

If the above doesn't solve your problem, then I am struggling to understand your question.
