3

1. Summarize the problem

I am trying to extract information from an excel file using a python script but I need to remove all of the strikethrough content first. The excel file often has strikethrough and non-strikethrough content in the same cell. Screenshot of example content. I would like the strikethrough content to be deleted from the cell, leaving only the non-strikethrough content behind.

2. Describe what you’ve tried

I am currently using a VBA script to remove the strikethrough content. I run this script by opening the excel file, creating a macro, running the macro. Then, I use pandas to read in the excel file as a dataframe. I would like to be able to run this all at once and avoid this manual process.

Does anyone know of a solution? for

  1. Removing all of the strikethrough content in the scenario described using python? Or;
  2. Creating a script that can add a macro (VBA script) and runs said macro on the excel file to remove the strikethrough content.

I have seen some suggested solutions using openpyxl but from what I've seen these cannot remove strikethrough content (leaving other content behind) from a cell that has strikethrough and non-strikethrough content.

Thanks everyone! This is my first stack overflow question so please let me know if more details are needed.

VBA Script for Removing Strikethrough content

Sub strikeWorksheetLoop()

 

   Dim WS_Count As Integer

   Dim I As Integer

 

   ' Set WS_Count equal to the number of worksheets in the active

   ' workbook.

   WS_Count = ActiveWorkbook.Worksheets.Count

 

   ' Begin the loop.

   For I = 1 To WS_Count

 

      ' Insert your code here.

        Dim testRange As Range, c As Range

        Set testRange = ActiveWorkbook.Worksheets(I).Range("C10").CurrentRegion

        For Each c In testRange

            c.Value = DelStrikethroughs(c)

        Next c

      ' The following line shows how to reference a sheet within

      ' the loop by displaying the worksheet name in a dialog box.

      ' MsgBox ActiveWorkbook.Worksheets(I).Name

 

   Next I

 

End Sub

 

 

Function DelStrikethroughs(Cell As Range) As String

    'Returns the text value of a cell with strikethrough characters removed

    Dim NewText As String

    Dim iCh As Long, l As Long, ch As Characters

 

    On Error Resume Next

    l = Cell.Characters.Count

    On Error GoTo 0

 

    If l = 0 Then

        NewText = Cell.Text

    Else

        For iCh = 1 To l

            Set ch = Cell.Characters(iCh, 1)

            NewText = NewText & IIf(ch.Font.Strikethrough, "", ch.Text)

        Next iCh

    End If

    DelStrikethroughs = NewText

End Function

Related SO post deleting records from an excel based on the format of the column

  • please provide clarification, is "delete" strikethrough mean removing the formatting and preserving the text or removing any text that has a a strikethrough? – Joe Thor Jun 07 '21 at 21:13
  • 2
    @JoeThor - the VBA removes any text containing a strikethrough. – BigBen Jun 07 '21 at 21:14
  • 1
    @JoeThor By "delete strikethrough" I mean remove any text that has a strikethrough from the cell. – PseudoFudge Jun 07 '21 at 22:54
  • i think this might help, but with xlrd: https://stackoverflow.com/questions/12371787/how-do-i-find-the-formatting-for-a-subset-of-text-in-an-excel-document-cell – meni181818 Jun 07 '21 at 23:01
  • 2
    Openpyxl only supports formatting for the whole cell, but as Pandas, which uses openpyxl internally, doesn't preserve text-formatting at all, you can just that. – Charlie Clark Jun 08 '21 at 08:33
  • @CharlieClark - Yea, the fact that openpyxl only supports formatting for the whole cell is the root of my issue. The problem with using pandas to read the excel file is that it effectively ignores the text-formatting of the entire cell so the resulting information is not in the format I want (it has both the regular content and the content that was striked still remaining). – PseudoFudge Jun 08 '21 at 20:38
  • Then you can solve the problem by using openpyxl directly. – Charlie Clark Jun 09 '21 at 12:53

0 Answers0