0

I have the following data inputs that I am reading via pandas.

enter image description here

enter image description here

I want to take the cell 'Month Ending .....' and drop into a newly formed 'Date' Column and append the two input files together into one dataframe.

This is what I have tried so far...

import pandas as pd    
import glob 
import os 


### List Source Files That I need to Import###
path  = os.getcwd()
files = os.listdir(path)

### Loading Files by Variable ###

data = pd.DataFrame()


for files in glob.glob('../Sales_Master_Data/Sales_Data/* customer *.xls'): #searches for customer .xls files in the folder

    data = pd.read_excel(files,'sheet1',skiprows=0).fillna(method='ffill')#reads all files in df
    date = data.columns[4] # This is where the date value is located
    data['Date'] = date # Assigns date value to new ['Date'] column
    df  = df.append(data) # all files are appended together
    df.to_csv('Output.csv')

Unfortunately it produces the output below. All cols beginning with 'Month' need to be merged into 1 column and called ['Sales Qty'] and I'm also having trouble tiding up the column headers so that they are uniformenter image description here

Ideal output would look like this..... enter image description here

jwlon81
  • 339
  • 3
  • 15

1 Answers1

1

It is never a good idea to feed merged cells into pandas. First thing I would suggest is to flatten your inputs. If there is no easy way to do so and to answer your original question, you need to create a multiindex dataframe to handle your data best. This has already been covered in StackOverflow here: https://stackoverflow.com/a/27424102/9754169

Yuca
  • 6,010
  • 3
  • 22
  • 42