I'm trying to import all my raw data (CSV files) into one DataFrame. The raw files contain some useless lines, which I'd like to delete with "drop". However, I'm unable to delete the row whose first column is a blank cell, and the DataFrame doesn't recognize that column.

Here is my code:

import pandas as pd
import numpy as np   
import glob
import os

#Determine file path for index weighting files
pathwgt=r'//10.27.36.181/etf/Bill/Quant/AxJ_Weight'
filenames = glob.glob(pathwgt + "/*.csv")

#declare data frame
dfwgt=pd.DataFrame()

#consolidate all files into one data frame
for filename in filenames:
    dfwgt=dfwgt.replace('',np.NaN)
    dfwgt=dfwgt.append(pd.read_csv(filename))

dfwgt=dfwgt.drop(['Symbol','Company'])

When I open a file in Excel, cell A1 is blank and B1 contains a string; I'd like to delete the entire row 1. The DataFrame shape is [124544 rows x 6 columns], whereas it is supposed to be [124544 rows x 7 columns].

Bharath M Shetty
Bill Sun
  • Can you add data samples, 3-4 rows and desired output? – jezrael Sep 29 '17 at 08:22
  • Welcome to SO. Remember to add the appropriate language tag. – Maciej Jureczko Sep 29 '17 at 08:25
  • It's unclear: do you need to delete the column or the row? You don't need to drop the rows. Just select the rows, ignoring what you don't want. – Bharath M Shetty Sep 29 '17 at 08:32
  • Hi Jezrael, yes I can add the heading; however, since the DataFrame only sees 6 columns, I'm unable to add the first column header... – Bill Sun Sep 29 '17 at 08:40
  • @BillSun - Do you need `pd.read_csv(filename).reset_index()`? It is really hard to answer, because the problem depends on the data. – jezrael Sep 29 '17 at 09:01
  • Also quite confusing that you use the term `excel` but you use `pd.read_csv()` this function would rename the blank column to `'Unnamed: 0'` - lastly `dfwgt` is a shit variable name. – foxyblue Sep 29 '17 at 12:32
  • @GiantsLoveDeathMetal, Yes, that's exactly my problem, I got 'Unnamed :0' in my output. You mentioned that I used pd.read_csv(), that's because my source files are in csv format. – Bill Sun Oct 03 '17 at 03:47

1 Answer

You have a few solutions to your problem:

Remove row and header:

pd.read_csv('data.csv', skiprows=1, header=None)

This creates a DataFrame whose columns are labelled by integer position (e.g. 0 to 3).
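As a minimal sketch of this first option, the snippet below reads a small in-memory CSV instead of a real file. The sample contents (`csv_text`, with the columns Symbol, Company and Weight) are made up for illustration and only assume that the first header cell is blank, as described in the question.

```python
import io

import pandas as pd

# Hypothetical sample data: the first header cell is blank.
csv_text = ",Symbol,Company,Weight\n1,AAA,Alpha,0.6\n2,BBB,Beta,0.4\n"

# skiprows=1 skips the header line; header=None tells pandas not to
# treat the next line as column names.
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None)

print(df.columns.tolist())  # integer labels instead of names
print(df.shape)
```

Note that the original header text is discarded this way, so the columns would have to be named again by hand if the names are needed later.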


Drop column:

df = pd.read_csv('data.csv')

This assigns the name 'Unnamed: 0' to the column with the blank header. You can then drop that column:

df = df.drop('Unnamed: 0', axis=1)
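To see the 'Unnamed: 0' behaviour without touching real files, here is a small sketch on in-memory data. The sample contents are hypothetical; the only assumption taken from the question is the blank first header cell.

```python
import io

import pandas as pd

# Hypothetical sample data: the first header cell is blank.
csv_text = ",Symbol,Company,Weight\n1,AAA,Alpha,0.6\n2,BBB,Beta,0.4\n"

df = pd.read_csv(io.StringIO(csv_text))
columns_before = df.columns.tolist()
print(columns_before)  # the blank header shows up as 'Unnamed: 0'

# Drop the placeholder column by name.
df = df.drop(columns="Unnamed: 0")
print(df.columns.tolist())
```

`drop(columns=...)` is equivalent to `drop(..., axis=1)`; it just makes the intent explicit.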

Rename column:

df = pd.read_csv('data.csv')

Alternatively, keep the column and give it a meaningful name:

df = df.rename(columns={'Unnamed: 0': 'new_name'})
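Applied to the loop in the question, the drop option could look roughly like the sketch below. The helper name `load_weights` and the demo file contents are hypothetical. It collects the frames in a list and uses `pd.concat` rather than `DataFrame.append`, since `append` was removed in pandas 2.0.

```python
import glob
import os
import tempfile

import pandas as pd


def load_weights(folder):
    """Read every CSV in `folder` and stack them into one DataFrame."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frame = pd.read_csv(filename)
        # Discard the column pandas created for the blank header cell, if any.
        frame = frame.drop(columns="Unnamed: 0", errors="ignore")
        frames.append(frame)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo on two throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(",Symbol,Company,Weight\n1,AAA,Alpha,0.6\n2,BBB,Beta,0.4\n")
    combined = load_weights(tmp)

print(combined.shape)
print(combined.columns.tolist())
```

With the real data, the call would be `load_weights(pathwgt)` using the path from the question. Swapping the `drop` line for a `rename` keeps the column under a proper name instead.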

If the above doesn't solve your problem, then I am struggling to understand your question.

foxyblue