-2

For example, say you have one Excel file containing 10,000 rows of data. When I import that file in PyCharm or a Jupyter notebook and run it, I get an index range, also known as row labels. My Python code should be able to read those ten thousand row labels and separate/split the data into 10 different Excel files, each containing 1,000 of the rows. Another example: if there are 9,999 rows in one sheet/file, my code should divide 9,000 rows across 9 sheets and put the remaining 999 in another sheet, without any mistakes. (This is the important part of the question.)

I am asking this because my data has no unique values that my code could use to split the files with .unique.

Neville Doke
  • Do share some examples by pasting your code and data as text. And share the code you have tried in a [minimal and reproducible manner](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – greco May 04 '22 at 08:53

3 Answers

4

You could use Pandas to read your file, chunk it, then re-write it:

import pandas as pd
import math

df = pd.read_excel("/path/to/excels/file.xlsx")

n_partitions = 3
# rows per output file, so every partition (except possibly the last) is the same size
chunk_size = math.ceil(len(df) / n_partitions)

for i in range(n_partitions):
    sub_df = df.iloc[(i * chunk_size):((i + 1) * chunk_size)]
    sub_df.to_excel(f"/output/path/to/test-{i}.xlsx", sheet_name="a")

EDIT: Or, if you prefer to set the number of rows per xlsx file:

import pandas as pd

df = pd.read_excel("/path/to/excels/file.xlsx")

rows_per_file = 4

n_chunks = len(df) // rows_per_file

for i in range(n_chunks):
    start = i * rows_per_file
    stop = (i + 1) * rows_per_file
    sub_df = df.iloc[start:stop]
    sub_df.to_excel(f"/output/path/to/test-{i}.xlsx", sheet_name="a")

# write the leftover rows (e.g. the last 999 of a 9999-row file) to one more
# file, using a fresh index so the previous chunk's file is not overwritten
if n_chunks * rows_per_file < len(df):
    sub_df = df.iloc[n_chunks * rows_per_file:]
    sub_df.to_excel(f"/output/path/to/test-{n_chunks}.xlsx", sheet_name="a")

You'll need openpyxl to read/write Excel files.
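
If openpyxl is not already installed, a pip install should be enough; pandas normally picks it up automatically for .xlsx files, but the engine can also be named explicitly (a minimal sketch, with placeholder paths):

# in a shell: pip install openpyxl
import pandas as pd

df = pd.read_excel("/path/to/excels/file.xlsx", engine="openpyxl")
df.to_excel("/output/path/to/test-0.xlsx", sheet_name="a", engine="openpyxl")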

fweber
  • The 1st one is now working, and the 2nd was good too: it also worked up to the point where my 9999 rows were split into separate Excel files with 1000 rows each, but the remaining 999 rows were not written to another Excel file, because of the floor division (//), I guess. – Neville Doke May 04 '22 at 11:48
  • Even with the last condition `if stop < len(df)`? That was there to deal with those last 999 rows. – fweber May 04 '22 at 13:47
  • Ok, got it. – Neville Doke May 09 '22 at 06:20
1

The following code snippet works fine for me:

import pandas as pd
import openpyxl   # needs to be installed for pandas to write .xlsx files
import math

data = pd.read_excel(r"path_to_excel_file.xlsx")

_row_range = 200                             # rows per output file
_block = math.ceil(len(data) / _row_range)   # number of output files

for x in range(_block):
    startRow = x * _row_range
    endRow = (x + 1) * _row_range
    _data = data.iloc[startRow:endRow]
    _data.to_excel(f"file_name_{x}.xlsx", sheet_name="Sheet1", index=False)
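
As a quick sanity check against the numbers from the question (the values below are assumed for illustration, not taken from the code above): with 9999 rows and 1000 rows per file, the ceiling division gives 10 files, the last one holding the remaining 999 rows.

import math

total_rows = 9999        # example figure from the question
rows_per_file = 1000

n_files = math.ceil(total_rows / rows_per_file)              # -> 10
last_file_rows = total_rows - (n_files - 1) * rows_per_file  # -> 999
print(n_files, last_file_rows)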
1

This gets the job done as well. It assumes each output Excel file should hold 19,000 rows; edit that to suit your scenario.

import pandas as pd
import math

data = pd.read_excel(filename)
count = len(data)
rows_per_file = 19000
no_of_files = math.ceil(count / rows_per_file)
start_row = 0
end_row = rows_per_file

for x in range(no_of_files):
    new_data = data.iloc[start_row:end_row]
    new_data.to_excel(f"filename_{x}.xlsx")
    start_row = end_row               # iloc's stop bound is exclusive, so continue from here
    end_row = end_row + rows_per_file
bool3max
Bankyiie