I have the code shown below. My `inputText` variable holds a string that also contains newline characters; assume it contains something like "Test\nTest". I need to remove the \n newline characters from the final output.

currentText =
        Encoding.UTF8.GetString(Encoding.Convert(
            Encoding.Default,
            Encoding.UTF8,
            Encoding.Default.GetBytes(inputText)));
Amedee Van Gasse
Sachin jeev
  • `currentText = currentText.Replace("\n", string.Empty);`? – Lasse V. Karlsen Jul 01 '16 at 10:50
  • If the string contains "\\n", then it should not do anything. – Sachin jeev Jul 01 '16 at 10:55
  • Why is this question tagged as an iText question? The answer has nothing to do with iText! – Bruno Lowagie Jul 01 '16 at 11:06
  • This is a very common code snippet that is copied and pasted all over from an iText text extraction sample somewhere, but it is wrong and it will break! [In .Net, once you have a string, **you have a string**, and it is Unicode, **always**.](http://stackoverflow.com/a/10191879/231316) – Chris Haas Jul 01 '16 at 11:13
  • @Bruno that weird conversion is often seen in iTextSharp text extraction samples. Not the official ones but others, copied on and on without someone asking why it's there. – mkl Jul 01 '16 at 11:22
  • To quote Mr. T: *I pity the fools who don't read the official documentation* ;-) – Bruno Lowagie Jul 01 '16 at 12:10
  • @Chris I went through the link. What would you suggest for my problem? – Sachin jeev Jul 01 '16 at 12:20
  • The code snippet in the question is wrong. Any fixing of mangled strings should be done *before* it becomes a string. – Lasse V. Karlsen Jul 01 '16 at 12:37
  • You can either fix the content at the byte level before it becomes a string or you can hack something together once you've got a string. However, the code you posted is pretty much guaranteed to break things and should either just be removed or replaced with just `currentText = inputText`. Don't think of .Net strings in terms of bytes, that's a mistake. You can _ask_ .Net to convert a string into a byte array using a certain encoding but that's a conversion. Without seeing your actual string we can't really help you. – Chris Haas Jul 01 '16 at 13:31
  • Chris, I have code like this: `string outputText = PdfTextExtractor.GetTextFromPage(pdfReader, intPage + 1, strategy);`. How can I eliminate the newline characters from the result? – Sachin jeev Jul 05 '16 at 10:52
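
The point made in the comments, that the conversion in the question does nothing useful and can damage text, can be checked with a small sketch. The helper name `RoundTrip` and the use of `Encoding.ASCII` as a stand-in for `Encoding.Default` are assumptions made here for illustration (the real value of `Encoding.Default` depends on the platform); the three calls are the same ones used in the question.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Same three steps as the snippet in the question, with the
    // "legacy" encoding passed in instead of Encoding.Default
    // (hypothetical helper, for illustration only).
    public static string RoundTrip(string input, Encoding legacy)
    {
        // string -> bytes in the legacy encoding
        // (characters the encoding cannot represent are replaced here)
        byte[] legacyBytes = legacy.GetBytes(input);

        // legacy bytes -> UTF-8 bytes
        byte[] utf8Bytes = Encoding.Convert(legacy, Encoding.UTF8, legacyBytes);

        // UTF-8 bytes -> string
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

With `Encoding.ASCII`, `RoundTrip("Test\nTest", Encoding.ASCII)` gives back the input unchanged, newline included, while `RoundTrip("café", Encoding.ASCII)` gives `"caf?"` because `é` has no ASCII representation. In neither case does the round trip remove anything the asker wanted removed.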

1 Answer

You can always remove '\n' characters from the currentText string using Replace, but if you would rather remove '\n' from the array of bytes going into the Convert method, before constructing a string object, you can filter it like this:

// Requires: using System.Linq; using System.Text;
currentText = Encoding.UTF8.GetString(Encoding.Convert(
    Encoding.Default,
    Encoding.UTF8,
    Encoding.Default
        .GetBytes(inputText)
        .Where(b => b != '\n')   // drop line-feed bytes (0x0A) before decoding
        .ToArray()
    )
);
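
For comparison, the plain `Replace` route mentioned at the start of this answer can be sketched as follows. The class and method names are hypothetical, and the sketch assumes only line-feed characters need to go; if the text may also contain carriage returns, a second `Replace` for `"\r"` would be needed.

```csharp
using System;

public static class NewlineStripper
{
    // Removes every line-feed character (U+000A) from an existing string.
    // The C# literal "\n" is that single character, so the two-character
    // sequence backslash + 'n' (written "\\n" in source) is left untouched.
    public static string RemoveLineFeeds(string text)
    {
        return text.Replace("\n", string.Empty);
    }
}
```

Calling `NewlineStripper.RemoveLineFeeds("Test\nTest")` yields `"TestTest"`, while `NewlineStripper.RemoveLineFeeds("Test\\nTest")` returns its input unchanged. The same call could be applied to the string returned by `PdfTextExtractor.GetTextFromPage`.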
Sergey Kalinichenko