1

I have an Excel file with 100 sheets. I need to extract data from each sheets column P beginning from row 7 & create a new file with all extracted data in same column. In my Output file, the data is located in different column, ie(Sheet 2's data in column R, Sheet 3's in column B)

How can I make the data in the same column in the new Output excel? Thank you.

ps. Combining all sheets' column P data into a single column in single sheet is enough for me

import pandas as pd
import os 
Flat_Price = "Flat Pricing.xlsx"
dfs = pd.read_excel(Flat_Price, sheet_name=None, usecols = "P", skiprows=6, indexcol=1, sort=False)
df = pd.concat(dfs)
print(df)
writer = pd.ExcelWriter("Output.xlsx")
df.to_excel(writer, "Sheet1")
writer.save()
print (os.path.abspath("Output.xlsx"))
Anony Yam
  • 15
  • 3
  • Are data confidental? – jezrael Oct 10 '18 at 06:07
  • You'd have to decide on an organization scheme. It's unlikely that we can be of much help given the information provided. – Mad Physicist Oct 10 '18 at 06:09
  • I think you can use `pd.ExcelFile()` function. Read each sheet individually into a separate DF and append them later into one single DF in one single column. You can take help from this link: https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook – Mayank Porwal Oct 10 '18 at 06:21
  • Thanks for the comment, I made some changes in my question. The data is not private but I am not so sure if I can post it here. I am new to Stackoverflow. – Anony Yam Oct 10 '18 at 06:22
  • @AnonyYam - OK, is possible share input data file via fropbox, gdocs, similar ? Because output of `dfs` should be ordered dictionary, so not sure why is ordering different in output. – jezrael Oct 10 '18 at 06:24
  • @jezrael File Available here( I deleted Sheet1 , 104&105 becoz they are unrelated) : https://www.dropbox.com/s/wzchdn18aimqol4/Salesdata%20-%20NOVUM%20WEST.xlsx?dl=0 – Anony Yam Oct 10 '18 at 06:29

1 Answers1

0

You need parameter header=None for default 0 column name:

dfs = pd.read_excel(Flat_Price, 
                    sheet_name=None, 
                    usecols = "P", 
                    skiprows=6, 
                    indexcol=1, 
                    header=None)

Then is possible extract number from first level of MultiIndex, convert to integer and sorting by sort_index:

df =df.set_index([df.index.get_level_values(0).str.extract('(\d+)',expand=False).astype(int), 
                  df.index.get_level_values(1)]).sort_index()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    It works, thanks a lot. I am using python for some specific tasks without foundation, thanks for the suggestion and I will try to find out what sort_index does =] – Anony Yam Oct 10 '18 at 06:52
  • @AnonyYam - yes, but first was neccesary create iinteger values. Btw, problem here was `concat` sorting, but because there was string `Table` in each sheetname, it was sorted by default lexicographically. – jezrael Oct 10 '18 at 06:54