I am working on a test project using C# in Visual Studio 2019 Community edition.
I have a book in .rtf format. The chapter numbers are in Times New Roman 16 font.
Each chapter's verses are numbered in Arial 12 font.
I want to programmatically remove each chapter's verse numbers (the Arial 12 numbers) while leaving the chapter numbers (the Times New Roman 16 numbers) intact. For example, in "Chapter 1 1 blah blah, 2 blah, 3 blah", the 12 pt numbers 1, 2 and 3 should be removed, but the chapter number should stay.
I am creating a text-to-speech app that will read the .rtf book. I don't want each verse number to be read, just the chapter number, followed by the text.
Can anyone suggest how to iterate through the document and remove the numbers in that font using Word Interop, Regex, or some other method?
Here is a sample of what I have tried so far, without success.
    public string LocateFont(string myBook)
    {
        if (myBook.ToString() == "xxxxxx")
        {
            if (rtbox1.Text != "")
            {
                Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();
                Document myDoc = wordApp.Documents.Open(rtbox1.Text);
                Microsoft.Office.Interop.Word.Range range = myDoc.Range(0, myDoc.Content.End);
                Regex reNum = new Regex(@"^\d+$");
                bool isNumeric = reNum.Match(rtbox1.Text).Success;
                if (isNumeric.Equals(true) & range.Find.Font.Name == "Arial" & range.Font.Size.Equals("12"))
                {
                    range.Font.Equals("");
                }
            }
        }
        return rtbox1.Text.ToString();
    }
OK, after reworking the code, I was finally able to get Visual Studio to recognize macropod's code suggestion, as shown below.
    private void tsbtnDelFont_Click(object sender, EventArgs e)
    {
        Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
        Microsoft.Office.Interop.Word.Document doc = new Document();
        doc.Application.Documents.Add(rtbox1.Text);
        word.ScreenUpdating = false;
        Microsoft.Office.Interop.Word.Range range = word.ActiveDocument.Content;
        var tempVar = range.Find;
        tempVar.ClearFormatting();
        tempVar.Font.Size = 12;
        tempVar.Font.Name = "Arial";
        tempVar.Text = "<[0-9]@>";
        tempVar.Replacement.Text = "";
        tempVar.MatchWildcards = true;
        tempVar.Wrap = WdFindWrap.wdFindContinue;
        tempVar.Execute(Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll);
        object filename = @"Path to my.rtf";
        doc.SaveAs2(ref filename);
        word.ScreenUpdating = true;
    }
The code compiles, and I populate the RichTextBox with my .rtf document. But when I run the code that is supposed to remove the Arial 12-point numbers, I receive a COM error saying that the RichTextBox string is longer than 255 characters.
Can anyone suggest how to get past this impasse?
After researching this issue further, I discovered that there is an inherent 255-character limit, which cannot be easily overcome. I found an interesting article that states:
Even find strings constructed using wildcards are limited to 255 characters.
My document contains over 40,000 words, which greatly exceeds 255 characters, and Word's own Find cannot overcome this.
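One thing worth checking before working around the limit: the limit quoted above applies to the find string, and the find string used here, `<[0-9]@>`, is only 8 characters long. A possible cause of the error (an assumption, not confirmed in this post) is the call `Documents.Add(rtbox1.Text)`, since the first argument of `Documents.Add` is a template name rather than document content. The sketch below opens the file from disk with `Documents.Open` instead. The class name, method name and `rtfPath` parameter are hypothetical, and the code assumes a reference to the Microsoft.Office.Interop.Word assembly and an installed copy of Word.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class VerseNumberRemover
{
    // rtfPath is a hypothetical parameter: the full path of the .rtf file on disk.
    public static void RemoveArial12Numbers(string rtfPath)
    {
        var word = new Word.Application();
        Word.Document doc = null;
        try
        {
            word.ScreenUpdating = false;

            // Open the existing file rather than passing its text to Documents.Add.
            doc = word.Documents.Open(rtfPath);

            Word.Find find = doc.Content.Find;
            find.ClearFormatting();
            find.Replacement.ClearFormatting();
            find.Font.Name = "Arial";
            find.Font.Size = 12;
            find.Format = true;              // include the font settings in the search
            find.Text = "<[0-9]@>";          // whole runs of digits (Word wildcard syntax)
            find.Replacement.Text = "";
            find.MatchWildcards = true;
            find.Wrap = Word.WdFindWrap.wdFindContinue;

            // Name the Replace argument; the first positional argument is FindText.
            find.Execute(Replace: Word.WdReplace.wdReplaceAll);

            doc.Save();
        }
        finally
        {
            // Close the document and quit Word so no hidden instance is left running.
            doc?.Close(SaveChanges: false);
            word.Quit();
        }
    }
}
```

Whether this removes only the verse numbers still depends on the verse numbers really being stored as Arial 12 in the file, which this sketch does not verify.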
The article Find & Replace (w\ Long Strings) offers a VBA solution for overcoming this 255-character limitation. However, I tried unsuccessfully to incorporate macropod's code within the article's solution. If macropod, or someone else, after reading the article, can marry his code with the solution offered in the article, that would be great.
However, until then, I used Find and Replace with the special character code ^# (any digit).
This removed the Arial 12-point verse numbers. However, it also removed the chapter numbers, which I will need to put back.
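Because a `^#` search matches every digit regardless of font, the chapter numbers are lost along with the verse numbers. A way to tell them apart without Word is to check the font of each number individually. The following is a minimal sketch, assuming the book is already loaded into a WinForms `RichTextBox` (as `rtbox1` appears to be); the class and method names are hypothetical, and it has not been run against the actual book.

```csharp
using System;
using System.Drawing;
using System.Text.RegularExpressions;
using System.Windows.Forms;

static class VerseNumberStripper
{
    // Hypothetical helper: deletes every run of digits whose font is Arial 12 pt
    // and leaves numbers in any other font (e.g. Times New Roman 16) untouched.
    // Returns the count of numbers removed.
    public static int RemoveArial12Numbers(RichTextBox box)
    {
        int removed = 0;
        MatchCollection matches = Regex.Matches(box.Text, @"\b\d+\b");

        // Walk backwards so earlier match offsets stay valid after each deletion.
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Match m = matches[i];
            box.Select(m.Index, m.Length);

            // SelectionFont is null when the selection mixes several fonts.
            Font font = box.SelectionFont;
            if (font != null
                && font.Name == "Arial"
                && Math.Abs(font.SizeInPoints - 12f) < 0.1f)
            {
                box.SelectedText = "";   // delete just this number
                removed++;
            }
        }
        return removed;
    }
}
```

A possible call sequence would be `rtbox1.LoadFile(path, RichTextBoxStreamType.RichText);` followed by `VerseNumberStripper.RemoveArial12Numbers(rtbox1);`, where `path` is the location of the .rtf file. The space after each removed number is left in place, which should not matter for text-to-speech. If the body text is also Arial 12, numbers inside the verses (such as "40 days") would be removed too.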
Here is the VBA macro I created inside the rich text document, based on your suggestion. However, when I run this code, it does nothing at all. It doesn't remove any of the Arial 12-point verse numbers.
    Sub RemoveNumericVerses()
    '
    ' RemoveNumericVerses Macro
    '
    '
    Application.ScreenUpdating = False
    With ActiveDocument.Range.Find
        .ClearFormatting
        .Font.Size = 12
        .Font.Name = "Arial"
        .Text = "<[0-9]@>"
        .Replacement.Text = ""
        .MatchWildcards = True
        .Wrap = wdFindContinue
        .Execute Replace:=wdReplaceAll
    End With
    Application.ScreenUpdating = True
    End Sub
I'm still open to suggestions, but creating and executing the aforementioned VBA macro does not work.
Joel Coehoorn, here is the excerpt that you asked for. If the formatting carries over, you should see the chapter numbers in Times New Roman 16 font, and the verses in Arial 12 font.
31 There they buried Abraham and his wife Sarah. There they buried Isaac and his wife Rebekʹah, and there I buried Leʹah. 32 The field and the cave that is in it were purchased from the sons of Heth.” 33 Thus Jacob finished giving these instructions to his sons. Then he drew his feet up onto the bed and breathed his last and was gathered to his people. Chapter 50 1 Joseph then threw himself on his father and wept over him and kissed him. 2 After that Joseph commanded his servants, the physicians, to embalm his father. So the physicians embalmed Israel, 3 and they took the full 40 days for him, for this is the full period for the embalming, and the Egyptians continued to shed tears for him 70 days.