66

I currently have this code. It works perfectly.

It loops through excel files in a folder, removes the first 2 rows, then saves them as individual excel files, and it also saves the files in the loop as an appended file.

Currently the appended file overwrites the existing file each time I run the code.

I need to append the new data to the bottom of the already existing excel sheet ('master_data.xlsx)

dfList = []
path = 'C:\\Test\\TestRawFile' 
newpath = 'C:\\Path\\To\\New\\Folder'

for fn in os.listdir(path): 
  # Absolute file path
  file = os.path.join(path, fn)
  if os.path.isfile(file): 
    # Import the excel file and call it xlsx_file 
    xlsx_file = pd.ExcelFile(file) 
    # View the excel files sheet names 
    xlsx_file.sheet_names 
    # Load the xlsx files Data sheet as a dataframe 
    df = xlsx_file.parse('Sheet1',header= None) 
    df_NoHeader = df[2:] 
    data = df_NoHeader 
    # Save individual dataframe
    data.to_excel(os.path.join(newpath, fn))

    dfList.append(data) 

appended_data = pd.concat(dfList)
appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))

I thought this would be a simple task, but I guess not. I think I need to bring in the master_data.xlsx file as a dataframe, then match the index up with the new appended data, and save it back out. Or maybe there is an easier way. Any Help is appreciated.

Martin Müsli
  • 1,031
  • 3
  • 14
  • 26
brandog
  • 1,497
  • 5
  • 20
  • 28

11 Answers11

119

UPDATE [2022-01-08]: starting from version 1.4.0 Pandas supports appending to existing Excel sheet, preserving the old contents, "out of the box"!

Good job Pandas Team!

Excerpt from the ExcelWriter documentation:

if_sheet_exists : {'error', 'new', 'replace', 'overlay'}, default 'error'
    How to behave when trying to write to a sheet that already
    exists (append mode only).
    ...
    * overlay: Write contents to the existing sheet without removing the old contents.

    .. versionadded:: 1.3.0
    .. versionchanged:: 1.4.0
       Added ``overlay`` option

For Pandas versions < 1.4.0 please find below a helper function for appending a Pandas DataFrame to an existing Excel file.

If an Excel file doesn't exist then it will be created.


UPDATE [2021-09-12]: fixed for Pandas 1.3.0+

The following functions have been tested with:

  • Pandas 1.3.2
  • OpenPyxl 3.0.7

from pathlib import Path
from copy import copy
from typing import Union, Optional
import numpy as np
import pandas as pd
import openpyxl
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def copy_excel_cell_range(
        src_ws: openpyxl.worksheet.worksheet.Worksheet,
        min_row: int = None,
        max_row: int = None,
        min_col: int = None,
        max_col: int = None,
        tgt_ws: openpyxl.worksheet.worksheet.Worksheet = None,
        tgt_min_row: int = 1,
        tgt_min_col: int = 1,
        with_style: bool = True
) -> openpyxl.worksheet.worksheet.Worksheet:
    """
    copies all cells from the source worksheet [src_ws] starting from [min_row] row
    and [min_col] column up to [max_row] row and [max_col] column
    to target worksheet [tgt_ws] starting from [tgt_min_row] row
    and [tgt_min_col] column.

    @param src_ws:  source worksheet
    @param min_row: smallest row index in the source worksheet (1-based index)
    @param max_row: largest row index in the source worksheet (1-based index)
    @param min_col: smallest column index in the source worksheet (1-based index)
    @param max_col: largest column index in the source worksheet (1-based index)
    @param tgt_ws:  target worksheet.
                    If None, then the copy will be done to the same (source) worksheet.
    @param tgt_min_row: target row index (1-based index)
    @param tgt_min_col: target column index (1-based index)
    @param with_style:  whether to copy cell style. Default: True

    @return: target worksheet object
    """
    if tgt_ws is None:
        tgt_ws = src_ws

    # https://stackoverflow.com/a/34838233/5741205
    for row in src_ws.iter_rows(min_row=min_row, max_row=max_row,
                                min_col=min_col, max_col=max_col):
        for cell in row:
            tgt_cell = tgt_ws.cell(
                row=cell.row + tgt_min_row - 1,
                column=cell.col_idx + tgt_min_col - 1,
                value=cell.value
            )
            if with_style and cell.has_style:
                # tgt_cell._style = copy(cell._style)
                tgt_cell.font = copy(cell.font)
                tgt_cell.border = copy(cell.border)
                tgt_cell.fill = copy(cell.fill)
                tgt_cell.number_format = copy(cell.number_format)
                tgt_cell.protection = copy(cell.protection)
                tgt_cell.alignment = copy(cell.alignment)
    return tgt_ws


def append_df_to_excel(
        filename: Union[str, Path],
        df: pd.DataFrame,
        sheet_name: str = 'Sheet1',
        startrow: Optional[int] = None,
        max_col_width: int = 30,
        autofilter: bool = False,
        fmt_int: str = "#,##0",
        fmt_float: str = "#,##0.00",
        fmt_date: str = "yyyy-mm-dd",
        fmt_datetime: str = "yyyy-mm-dd hh:mm",
        truncate_sheet: bool = False,
        storage_options: Optional[dict] = None,
        **to_excel_kwargs
) -> None:
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param max_col_width: maximum column width in Excel. Default: 40
    @param autofilter: boolean - whether add Excel autofilter or not. Default: False
    @param fmt_int: Excel format for integer numbers
    @param fmt_float: Excel format for float numbers
    @param fmt_date: Excel format for dates
    @param fmt_datetime: Excel format for datetime's
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param storage_options: dict, optional
        Extra options that make sense for a particular storage connection, e.g. host, port,
        username, password, etc., if using a URL that will be parsed by fsspec, e.g.,
        starting “s3://”, “gcs://”.
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('/tmp/test.xlsx', df, autofilter=True,
                           freeze_panes=(1,0))

    >>> append_df_to_excel('/tmp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('/tmp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('/tmp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)

    >>> append_df_to_excel('/tmp/test.xlsx', df, index=False,
                           fmt_datetime="dd.mm.yyyy hh:mm")

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    def set_column_format(ws, column_letter, fmt):
        for cell in ws[column_letter]:
            cell.number_format = fmt
    filename = Path(filename)
    file_exists = filename.is_file()
    # process parameters
    # calculate first column number
    # if the DF will be written using `index=True`, then `first_col = 2`, else `first_col = 1`
    first_col = int(to_excel_kwargs.get("index", True)) + 1
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    # save content of existing sheets
    if file_exists:
        wb = load_workbook(filename)
        sheet_names = wb.sheetnames
        sheet_exists = sheet_name in sheet_names
        sheets = {ws.title: ws for ws in wb.worksheets}

    with pd.ExcelWriter(
        filename.with_suffix(".xlsx"),
        engine="openpyxl",
        mode="a" if file_exists else "w",
        if_sheet_exists="new" if file_exists else None,
        date_format=fmt_date,
        datetime_format=fmt_datetime,
        storage_options=storage_options
    ) as writer:
        if file_exists:
            # try to open an existing workbook
            writer.book = wb
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
            # copy existing sheets
            writer.sheets = sheets
        else:
            # file doesn't exist, we are creating a new one
            startrow = 0

        # write out the DataFrame to an ExcelWriter
        df.to_excel(writer, sheet_name=sheet_name, **to_excel_kwargs)
        worksheet = writer.sheets[sheet_name]

        if autofilter:
            worksheet.auto_filter.ref = worksheet.dimensions

        for xl_col_no, dtyp in enumerate(df.dtypes, first_col):
            col_no = xl_col_no - first_col
            width = max(df.iloc[:, col_no].astype(str).str.len().max(),
                        len(df.columns[col_no]) + 6)
            width = min(max_col_width, width)
            column_letter = get_column_letter(xl_col_no)
            worksheet.column_dimensions[column_letter].width = width
            if np.issubdtype(dtyp, np.integer):
                set_column_format(worksheet, column_letter, fmt_int)
            if np.issubdtype(dtyp, np.floating):
                set_column_format(worksheet, column_letter, fmt_float)

    if file_exists and sheet_exists:
        # move (append) rows from new worksheet to the `sheet_name` worksheet
        wb = load_workbook(filename)
        # retrieve generated worksheet name
        new_sheet_name = set(wb.sheetnames) - set(sheet_names)
        if new_sheet_name:
            new_sheet_name = list(new_sheet_name)[0]
        # copy rows written by `df.to_excel(...)` to
        copy_excel_cell_range(
            src_ws=wb[new_sheet_name],
            tgt_ws=wb[sheet_name],
            tgt_min_row=startrow + 1,
            with_style=True
        )
        # remove new (generated by Pandas) worksheet
        del wb[new_sheet_name]
        wb.save(filename)
        wb.close()

Old version (tested with Pandas 1.2.3 and Openpyxl 3.0.5):

import os
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

Usage examples:

filename = r'C:\OCC.xlsx'

append_df_to_excel(filename, df)

append_df_to_excel(filename, df, header=None, index=False)

append_df_to_excel(filename, df, sheet_name='Sheet2', index=False)

append_df_to_excel(filename, df, sheet_name='Sheet2', index=False, startrow=25)

c:/temp/test.xlsx:

enter image description here

PS you may also want to specify header=None if you don't want to duplicate column names...

UPDATE: you may also want to check this old solution

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Hey thanks, I am to use this. (I am not actually only removing 2 rows of data, that was just a place holder for a large amount of formatting.) I need it to append to the bottom of the existing sheet, without the index line. – brandog Jun 28 '16 at 11:41
  • @brandog, then you'll need to use `header=None` and you'll have to count the current number of lines in the excel file and use it like this: `startrow=curr_count+1` – MaxU - stand with Ukraine Jun 28 '16 at 11:56
  • OH, whoops! I miss read. Yes this answers my question perfectly! Thanks – brandog Jun 28 '16 at 11:57
  • Actually, I was just working on it and realize you are correct. I need the startrow = curr_count+1, I am having trouble implementing that though, would you be able to edit the answer above to use that method? – brandog Jun 29 '16 at 07:53
  • If the excel file is very large, I only want to append , what should I do ? – Mithril Jul 31 '18 at 08:05
  • @Mithril, did you try provided solution? – MaxU - stand with Ukraine Jul 31 '18 at 08:50
  • I just wonder if your solution is efficient for large file . Because it looks like read the whole file at first. – Mithril Jul 31 '18 at 09:00
  • @Mithril, please let me know if you will find a solution which doesn’t read the whole file and allows to add data to existing sheet, not overwriting it - I couldn’t find such a solution. – MaxU - stand with Ukraine Jul 31 '18 at 09:03
  • 2
    it just overwrites the file in the specified index not appends to end of existing file. – keramat Jan 16 '19 at 15:26
  • brilliant - been searching for nearly a day for this exact solution, thanks – trock2000 Mar 02 '19 at 14:59
  • @trock2000, glad it helps :) – MaxU - stand with Ukraine Mar 02 '19 at 15:04
  • @MaxU - thank you for this function, but for some reason I get this error: AttributeError: 'list' object has no attribute 'find' -- is this an issue with openpyxl? – 0004 Mar 27 '19 at 00:09
  • Very helpful utility function, one small suggest: something on the lines of `if startrow is not None then header = False` so that the column headers are not repeated. Thanks – shanlodh May 11 '19 at 09:29
  • 1
    @shanlodh, glad it helps :) Thank you for your suggestion! I dont think it's a good idea to disable it this way, because it won't allow to add a DF with a header below existing one or if you want to keep existing header lines. But it can be easily achieved with the existing function: `append_df_to_excel(filename, df, startrow=N, header=False)` ;) – MaxU - stand with Ukraine May 11 '19 at 13:37
  • @MaxU Could you help me in this question https://stackoverflow.com/questions/57609397/appending-data-in-excel-under-matching-column-name – user1896796 Aug 22 '19 at 12:33
  • @brandog, how did you define curr_count to get it to work? – dfahsjdahfsudaf Apr 09 '20 at 00:42
  • Thank you! The old answer was what I needed! – MinionAttack Feb 03 '21 at 10:40
  • Thanks, this has the most helpful reply I've seen on this site. – JeffD Feb 27 '21 at 22:45
  • 2
    As of pandas 1.2.0 the code will create a problem (it works just fine up to 1.1.5), raising `BadZipFile` exception, because at the time when you instantiate `pd.ExcelWriter` it creates empty file with size 0 bytes and overwrites the existing file. One has to specify `mode='a'`. see https://stackoverflow.com/a/66585065/4046632 and https://stackoverflow.com/q/66471466/4046632 – buran Mar 11 '21 at 15:27
  • @buran, thank you!! I've updated the answer correspondingly – MaxU - stand with Ukraine Mar 11 '21 at 15:35
  • @MaxU: Amazing.. It worked for me finally! I wasted two days of my work time struggling with this hectic problem. Finally it is solved. I was always getting an error near 'load_workbook' command raising 'BadZipFile' error. Openpyxl always made my file corrupt. But now with your solution I don't face any such problems. Not sure about the reason, even though I am interested to know. But it worked. Saved my day. Thanks a lot Max! – Priya May 21 '21 at 07:46
  • Just out of curiosity.. What is the use of 'Truncate_sheet' here? We anyways are appending only on one sheet know. Is there any example where truncate_Sheet is needed – Priya May 21 '21 at 09:06
  • 1
    @Priya, glad it helps. :) There were changes in Pandas 1.2.0 which led to `BadZipFile` exception using the old implementation of my function, so I had to adapt it to newer Pandas versions. – MaxU - stand with Ukraine May 21 '21 at 10:16
  • 1
    @Priya, parameter `truncate_sheet=True` might be used in order to clean up the old contents of a sheet before writing a DataFrame to that sheet. – MaxU - stand with Ukraine May 21 '21 at 10:19
  • @MaxU: Thanks Max! I upvoted your answer, comments and also started following you in Linkedin :) – Priya May 21 '21 at 12:44
  • Hello, this answer has helped me a lot, but It's taking a lot of time to write the to the file as it has 1000+ entries. Is there anything I can do about it? (runs in 10 seconds without writing, with writing takes 30-40 seconds) using this https://stackoverflow.com/questions/25863381/python-fastest-way-to-write-pandas-dataframe-to-excel-on-multiple-sheets – Aytida Jul 10 '21 at 11:40
  • 1
    This solution works in pandas 1.2.3 but gives the following error in pandas 1.3.0: `ValueError: Sheet 'Sheet1' already exists and if_sheet_exists is set to 'error'.`. Looking into the sourcecode it appears that the parameter `if_sheet_exists` is new in 1.3.0. The behavior of the append mode seems to have changed to appending a new sheet instead of appending the data in the same sheet. Does anyone have a solution to this? – Chris Jul 22 '21 at 12:10
  • Thank you! what if the destination file is '.xlsb'? – Mit Jul 27 '21 at 14:59
  • I have the same problem with @Chris, any update on this? – Will Sep 04 '21 at 18:25
  • @Will. Unfortunately we have not had time to look into it. We remain at 1.2.3 for now. – Chris Sep 07 '21 at 08:10
  • `writer.sheets = dict((ws.title, ws) for ws in book.worksheets)` adding this before writing would solve the issue, FIY. I found it in another related questions. The authors stated: "ExcelWriter for some reason uses writer.sheets to access the sheet. If you leave it empty it will not know that sheet Main is already there and will create a new sheet.". – Will Sep 08 '21 at 06:24
  • @Chris, I've updated this function to make it compatible with Pandas 1.3.0+ - please check... – MaxU - stand with Ukraine Sep 12 '21 at 14:09
  • @Will, I've updated this function to make it compatible with Pandas 1.3.0+ - please check... – MaxU - stand with Ukraine Sep 12 '21 at 14:09
  • Anyone ran into `We found a problem with some content` when opening the file and is there a solution for it? – Erfan Nov 03 '21 at 22:39
  • Anyone ran into `AttributeError: property 'book' of 'OpenpyxlWriter' object has no setter ` – shishi1181 Aug 19 '23 at 02:09
25

If you aren't strictly looking for an excel file, then get the output as csv file and just copy the csv to a new excel file.

Note: this only works when you have less than 1000 columns since csv has a limit on the number of columns you can write.

df.to_csv('filepath', mode='a', index = False, header=None)

mode='a' means append.

This is a roundabout way it but works neat!

David
  • 535
  • 8
  • 13
6

Building on MaxU and others' code and comments but simplifying to only fix the bug with pandas ExcelWriter that causes to_excel to create a new sheet rather than append to an existing sheet in append mode.

As others have noted, to_excel uses the ExcelWriter.sheets property and this is not populated when by ExcelWriter.

Fix is a one liner, otherwise code is standard pandas approach as documented in to_excel.


    # xl_path is destination xlsx spreadsheet
    with pd.ExcelWriter(xl_path, 'openpyxl', mode='a') as writer:
        # fix line
        writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
        df.to_excel(writer, sheet_name)

ppoi
  • 61
  • 1
  • 1
  • Delete line: writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets). Existing sheets will be overwritten. – stansy May 28 '23 at 12:25
  • Throws an error writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets) AttributeError: can't set attribute – Aqua 4 Jun 07 '23 at 12:25
6

This worked for me

import os
import openpyxl
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows

file = r"myfile.xlsx"

df = pd.DataFrame({'A': 1, 'B': 2})

# create excel file
if os.path.isfile(file):  # if file already exists append to existing file
    workbook = openpyxl.load_workbook(file)  # load workbook if already exists
    sheet = workbook['my_sheet_name']  # declare the active sheet 

    # append the dataframe results to the current excel file
    for row in dataframe_to_rows(df, header = False, index = False):
        sheet.append(row)
    workbook.save(file)  # save workbook
    workbook.close()  # close workbook
else:  # create the excel file if doesn't already exist
    with pd.ExcelWriter(path = file, engine = 'openpyxl') as writer:
        df.to_excel(writer, index = False, sheet_name = 'my_sheet_name')
quant82
  • 61
  • 1
  • 1
3
import pandas as pd
import openpyxl

workbook = openpyxl.load_workbook("test.xlsx")
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = workbook
writer.sheets = dict((ws.title, ws) for ws in workbook.worksheets)
data_df.to_excel(writer, 'Existing_sheetname')
writer.save()
writer.close()
  • 6
    Code dumps do not make for good answers. You should explain *how* and *why* this solves their problem. I recommend reading, "[How do I write a good answer?"](//stackoverflow.com/help/how-to-answer). This can help future users learn and eventually apply that knowledge to their own code. You are also likely to have positive feedback/upvotes from users, when the code is explained. – John Conde Feb 28 '21 at 13:23
  • 1
    Agreed, some explanation would be super helpful here! – Kris Mar 19 '21 at 13:53
  • 1
    It is a misleading answer. This code only changes the first row in the excel file .... Don't share before try and make sure about your code. – ati ince Jun 23 '21 at 09:24
3

If you use ExcelWriter on the sheet every time it is going to override the previous sheet and all that will be visible is the last data sheet you appended to the workbook. Instead you can maintain a counter that is 1 initially for which you need to initialize the excel sheet and add initial data using the existing approach of

writer = pd.ExcelWriter(output_file, engine='openpyxl')

df = pd.read_excel(output_file, sheet_name='TestSheet1')

or you can use the following approach i used. to load the workbook next time you want to use it or else file not find exception if you try to load it in the first case.

USage:

from bs4 import BeautifulSoup
import requests
import pandas as pd
from openpyxl import load_workbook

urls = ["http://millenniumcricketleague.com/Home/ShowTeam.aspx?tid=22",
        "http://millenniumcricketleague.com/Home/ShowTeam.aspx?tid=40"]
path = "F:\meta_1.xlsx"
writer = pd.ExcelWriter(path,engine='openpyxl')
counter = 1
for url in urls:
    table_data = []
    final = []
    html_content = requests.get(url).text
    soup = BeautifulSoup(html_content, "lxml")
    x = soup.find_all('table')
    for table in x[1:]:
        for tr in table.find_all("tr"):
            newrow = []
            for td in tr.find_all("td"):
                newrow.append(td.text.replace('\n', ' ').strip())
            table_data.append(newrow)
    df = pd.DataFrame(table_data)
    sheetname = 'Sheet%s' % counter
    if(counter!=1):
        writer.book = load_workbook(path)
    df.to_excel(writer, sheet_name=sheetname)
    counter = counter + 1
    writer.save()

NO need to close the excelwriter. its an automatic function. Will show you a warning if you define it explicitly

0

This question has been out here a while. The answer is ok, but I believe this will solve most peoples question.

simply use glob to access the files in a specific directory, loop through them, create a dataframe of each file, append it to the last one, then export to a folder. I also included commented out code to run through this with csvs.

import os
import pandas as pd
import glob

# put in path to folder with files you want to append
# *.xlsx or *.csv will get all files of that type
path = "C:/Users/Name/Folder/*.xlsx"
#path = "C:/Users/Name/Folder/*.csv"

# initialize a empty df
appended_data = pd.DataFrame()

#loop through each file in the path
for file in glob.glob(path):
    print(file)

    # create a df of that file path
    df = pd.read_excel(file, sheet_name = 0)
    #df = pd.read_csv(file, sep=',')

    # appened it
    appended_data = appended_data.append(df)

appended_data

# export the appeneded data to a folder of your choice
exportPath = 'C:/My/EXPORT/PATH/appended_dataExport.csv'
appended_data.to_csv(os.path.join(exportPath),index=False)
brandog
  • 1,497
  • 5
  • 20
  • 28
0

Complementing to @david, if you dont care the index and you can use .csv, this function helps to append any df to an existing csv

def append_df(self, path_file, df):
    with open(path_file, 'a+') as f:
        df.to_csv(f, header=f.tell() == 0, encoding='utf-8', index=False)

Notes:

a+ create the file if it doesnot exist

f.tell() == 0 add header if the first row

Alex Montoya
  • 4,697
  • 1
  • 30
  • 31
  • Your method needs to read all existing excel and collect data in df, then add a new row into df then write to excel. If you work with huge size excel, it is a pretty undesirable method could be :( – ati ince Jun 23 '21 at 09:29
0
from openpyxl import load_workbook
wb = load_workbook(filepath)
ws = wb["Sheet1"]
df = dataframe.values.tolist()
for i in range(len(df)):
    ws.append(df[i])
wb.save(filepath)
alalei
  • 1
0

Here below is very simple working example of appending data to existing excel file.

last_row = pd.read_excel("output.xlsx").index.stop + 1 
df1 = pd.DataFrame([["Aashutosh0012", "https://aashutosh.fly.dev"]], columns=["Header A", "Header B"])  
    with pd.ExcelWriter("output.xlsx", mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
        df.to_excel(writer, startrow=last_row, index=False, header=False)

I tried many examples as listed above none of them was working, until unless you pass startrow parameter in the pd.ExcelWriter, it will overwrite the existing rows.

Below is function to create an excel file if it does not exists, else append data to existing excel file at the end.

import os
import pandas as pd

def save_to_excel(data: list, excel_file="output.xlsx"):
    headers = ["Header A", "Header B"]
    data_df = pd.DataFrame(data, columns=headers)    
    try:
        # Check if the file exists or not
        file_exists = os.path.isfile(excel_file)        
        # Try to append data to an existing Excel file or create a new one
        if file_exists:
            # get last row postion in existing excel file
            last_row = pd.read_excel(excel_file).index.stop + 1 
            with pd.ExcelWriter(excel_file, mode="a", if_sheet_exists="overlay") as writer:
                data_df.to_excel(writer, startrow = last_row, index=False, header=False)
                print(f'Data appended to file {excel_file}.')
        else:
            with pd.ExcelWriter(excel_file, mode="w") as writer:
                data_df.to_excel(writer, index=False)
                print(f'New file {excel_file} created and data saved.')                    
    except PermissionError:
        print(f'Error: Permission denied while trying to access {excel_file}.')
    except Exception as e:
        print(f'An error occurred: {e}')
Aashutosh Kumar
  • 615
  • 9
  • 13
-1

Append DataFrame to existing excel file

Use ExcelWriter to append DataFrame to an existing excel file. This is a simple approach and uses the existing library features.

with pd.ExcelWriter('existing_excel_file.xlsx',mode='a') as writer:  
    df.to_excel(writer, sheet_name='existing_sheet_name')

For detailed examples refer to pandas read Excel File with Examples

NNK
  • 1,044
  • 9
  • 24