1. Summarize the problem
I am trying to extract information from an excel file using a python script but I need to remove all of the strikethrough content first. The excel file often has strikethrough and non-strikethrough content in the same cell. Screenshot of example content. I would like the strikethrough content to be deleted from the cell, leaving only the non-strikethrough content behind.
2. Describe what you’ve tried
I am currently using a VBA script to remove the strikethrough content. I run this script by opening the excel file, creating a macro, running the macro. Then, I use pandas to read in the excel file as a dataframe. I would like to be able to run this all at once and avoid this manual process.
Does anyone know of a solution? for
- Removing all of the strikethrough content in the scenario described using python? Or;
- Creating a script that can add a macro (VBA script) and runs said macro on the excel file to remove the strikethrough content.
I have seen some suggested solutions using openpyxl but from what I've seen these cannot remove strikethrough content (leaving other content behind) from a cell that has strikethrough and non-strikethrough content.
Thanks everyone! This is my first stack overflow question so please let me know if more details are needed.
VBA Script for Removing Strikethrough content
Sub strikeWorksheetLoop()
Dim WS_Count As Integer
Dim I As Integer
' Set WS_Count equal to the number of worksheets in the active
' workbook.
WS_Count = ActiveWorkbook.Worksheets.Count
' Begin the loop.
For I = 1 To WS_Count
' Insert your code here.
Dim testRange As Range, c As Range
Set testRange = ActiveWorkbook.Worksheets(I).Range("C10").CurrentRegion
For Each c In testRange
c.Value = DelStrikethroughs(c)
Next c
' The following line shows how to reference a sheet within
' the loop by displaying the worksheet name in a dialog box.
' MsgBox ActiveWorkbook.Worksheets(I).Name
Next I
End Sub
Function DelStrikethroughs(Cell As Range) As String
'Returns the text value of a cell with strikethrough characters removed
Dim NewText As String
Dim iCh As Long, l As Long, ch As Characters
On Error Resume Next
l = Cell.Characters.Count
On Error GoTo 0
If l = 0 Then
NewText = Cell.Text
Else
For iCh = 1 To l
Set ch = Cell.Characters(iCh, 1)
NewText = NewText & IIf(ch.Font.Strikethrough, "", ch.Text)
Next iCh
End If
DelStrikethroughs = NewText
End Function
Related SO post deleting records from an excel based on the format of the column