
How do I convert text extracted from Word to a plain string? The data is stored in the database like this:

  2,2 kW, 1500/1800, 400-440V, 50/60Hz, IP55, Iso.F
 {\rtf1\fbidis\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fprq2\fcharset0 Arial;}{\f1\fswiss\fprq2\fcharset0 Calibri;}{\f2\fnil\fcharset0 Arial;}}
\viewkind4\uc1\pard\ltrpar\f0\fs20 8APE100L-4K-IE3\par
2,2 KW,   4-polig,    230/400V,   50Hz,   B5/A250,   IP55\par
\f1\fs22  \f0\fs20\par
Neutrales Zusatztypenschild mit folgenden Angaben:\par
2,2 kW, 400-440V/Y, 50 Hz,   1465min-1\par
2,2 kW, 400-440V/Y, 60 Hz,   1760min-1\f2\fs20\par
}

I want to convert it to be like this:

[image of the desired output; not available]

  • You would have to find out what specific standard Word is using, whether it's proprietary, and whether there is an open source C# library that can help you with that. Is it really necessary? Can't you copy & paste the text out of the Word file, or save it in another format such as .txt? – Andreas Bonini Sep 24 '21 at 12:45
  • Maybe have a look here: https://stackoverflow.com/questions/15065053/read-a-word-document-using-c-sharp – Legit007 Sep 24 '21 at 12:46
  • That's not Word, that's RTF. Word is a *ZIP* package containing XML with a well known format. What you posted is neither a ZIP file nor XML – Panagiotis Kanavos Sep 24 '21 at 12:47
  • @ThomasBonini the Word format is well defined (ZIP containing XML files), with an SDK and several open source libraries. There's no need for copy pasting. As for text, that loses all formatting. That's like asking people to replace HTML with plain text. In any case, that's not Word – Panagiotis Kanavos Sep 24 '21 at 12:49
  • WinForms and WPF had a RichText control from the very first version. You didn't explain what you want to do with that RTF document (Display? Edit? Extract the text?), so the RTF control may be what you need. Worst case, you can use the RTF control to load the document and read the plain text – Panagiotis Kanavos Sep 24 '21 at 12:51
  • What are you trying to do? What kind of application are you building? In WPF you can use the `FlowDocument` and `TextRange` classes to load RTF without displaying anything. – Panagiotis Kanavos Sep 24 '21 at 12:54
  • If you search NuGet for `RTF` or `Docx` you'll find libraries that can handle both formats. – Panagiotis Kanavos Sep 24 '21 at 13:34
  • It's impossible to post a good answer without knowing what you want. Perhaps what you want is already available in your stack (WinForms, WPF). Perhaps you need one of the 100+ libraries that appear when you search for RTF in NuGet. – Panagiotis Kanavos Sep 24 '21 at 13:37

1 Answer

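            // Requires Microsoft Word to be installed on the machine.
            var application = new Microsoft.Office.Interop.Word.Application();

            // Open the file read-only instead of creating a new document from it.
            var document = application.Documents.Open(FileName: @"C:\path", ReadOnly: true);
            Console.WriteLine(document.Range().Text);

            // Quit Word without saving so no background process is left running.
            application.Quit(SaveChanges: false);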

You will need to add a reference to the COM library "Microsoft Word 16.0 Object Library" in your project (16.0 is the version number, which should match your installed version of Microsoft Word).

If you already have a different method for reading the Word file, just try `.Range().Text`.

Hadi Hoteit
  • That's like killing a mosquito with a howitzer. Besides, that's RTF, not Word – Panagiotis Kanavos Sep 24 '21 at 13:18
  • It's for `.docx` documents, the library name contains the word `Word` inside of it. And in addition I tried the code snippet locally and it works fine on reading Word files and printing them in readable format. – Hadi Hoteit Sep 24 '21 at 13:22
  • This requires that that exact Word version is actually installed and that the application (which the OP didn't specify) can use it (web applications are explicitly not supported, for instance) – Hans Kesting Sep 24 '21 at 13:25
  • Visual Studio installs dependencies compatible with the software you use, so he should only have one version of that library, the one compatible with his Microsoft Word. But you are right, I should've mentioned the version part, I'll edit my post. – Hadi Hoteit Sep 24 '21 at 13:31
  • The question is about RTF, not Word. Both WinForms and WPF have `RichTextBox` controls so they don't need to install anything extra. And even for `docx`, using Word is the last resort. The Open XML SDK can be used to open such files, even if it's not the friendliest library – Panagiotis Kanavos Sep 24 '21 at 13:33
  • Finally, searching for `docx` in NuGet returns 275 results. Some of these libraries are commercial, some are not that good, but none of them requires installing Word – Panagiotis Kanavos Sep 24 '21 at 13:36
  • I just noticed the `RTF` tag on the post. I saw "Word" in the body text before and assumed he's reading data from a Word file. I'm not exactly sure whether the `Word` library would work the same on `RTF`, I don't have much information on that. If you're sure it doesn't work, let me know and I'll delete my answer. – Hadi Hoteit Sep 24 '21 at 13:41
  • I added the `RTF` tag because that content is *not* Word. Word files are binary, specifically ZIP files containing XML. The content even contains `{\rtf1\fbidis\ansi\...` which specifies the codepage, language, fonts used etc – Panagiotis Kanavos Sep 24 '21 at 14:27
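
The comments above suggest loading the RTF into the WinForms `RichTextBox` control and reading its plain text. A minimal sketch of that idea follows. It assumes a Windows project that references `System.Windows.Forms` (for example .NET Framework, or a `net6.0-windows` target with `UseWindowsForms` enabled). The `RtfConverter` class, its `ToPlainText` method, and the shortened sample string are hypothetical names and data made up for illustration; they do not come from the thread.

```csharp
using System;
using System.Windows.Forms;

public static class RtfConverter
{
    // Hypothetical helper: loads an RTF string into an off-screen
    // RichTextBox and returns the text without any formatting codes.
    public static string ToPlainText(string rtf)
    {
        using (var box = new RichTextBox())
        {
            // The Rtf setter parses the markup; invalid RTF is expected
            // to raise an ArgumentException.
            box.Rtf = rtf;

            // Text exposes the same content as plain text.
            return box.Text;
        }
    }
}

public static class Program
{
    [STAThread]
    public static void Main()
    {
        // Shortened sample in the style of the data in the question.
        // A verbatim string keeps the RTF backslashes intact.
        string rtf = @"{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}}"
                   + @"\pard\f0\fs20 8APE100L-4K-IE3\par "
                   + @"2,2 kW, 400-440V/Y, 50 Hz\par}";

        Console.WriteLine(RtfConverter.ToPlainText(rtf));
    }
}
```

The control is never shown, so no form or message loop is needed. If the application is not a Windows desktop app (a web application, for instance), one of the RTF libraries on NuGet mentioned in the comments is probably the better fit.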