From what I can see, every single string in C# is compiled as Unicode.

aUnableToParseL:                        // DATA XREF: test.main__Main+38o
unicode <Unable to parse local IPv4s.>,0

Is there any way to tell the compiler to compile all the strings into the ASCII format (1 byte per character)?

mexikanoZ
  • Check out [http://stackoverflow.com/questions/5348844/how-to-convert-a-string-to-ascii](http://stackoverflow.com/questions/5348844/how-to-convert-a-string-to-ascii) – Kamran Shahid Sep 10 '16 at 16:07
  • It's not what I'm looking for. I'm trying to find a solution for the final product, which is an exe. Normally the compiler stores strings in memory in Unicode format. I'm looking for a way to store them in ASCII format, so I can eventually save several thousand bytes. – mexikanoZ Sep 10 '16 at 16:10
  • "so I can eventually save several thousand bytes" -- what platform are you using that's so constrained that you care about a couple of KB in the _executable image_? – Roger Lipscombe Sep 10 '16 at 16:15
  • To answer your main question: the answer seems to be no. You can't change the way the compiler stores strings. Here's a link to another similar question: http://stackoverflow.com/questions/6550254/default-c-sharp-string-encoding – Tommaso Scalici Sep 10 '16 at 16:17
  • Roger Lipscombe: Does it really matter? If I'm able to shrink my code, optimize it, speed it up or whatever else that could come to mind, then I'm doing it. @TommasoScalici - thank you. Looks like it'll be better to get back to good old C.. – mexikanoZ Sep 10 '16 at 16:18
  • You can't do that. If you are bent on shaving a few thousand bytes off your executable, then don't store the strings in it to begin with; put them in an ASCII-encoded file and read them as needed. I'm really not sure why this would be necessary, but you don't have any other option; it's not a compiler option you can modify. – InBetween Sep 10 '16 at 16:18
  • @InBetween - that's another possible way to go, but I also have to display several messages in the console. If I can write to the console from byte arrays, that would be great. Though it's also about plugins like MySQL, which require all your queries to be stored in strings. – mexikanoZ Sep 10 '16 at 16:20
  • Why would you want to do that? Simply read the strings from the ASCII encoded file and print them out normally with `Console.Write(string s)` – InBetween Sep 10 '16 at 16:23
  • Well, @mexikanoZ, managed languages aren't the best choice when it comes to memory savings and performance; this is well known. For that kind of scenario C/C++ is a more reasonable choice. – Tommaso Scalici Sep 10 '16 at 16:24
  • @InBetween - it's also about optimization. Reading strings from files would be slower than accessing them directly from memory. Then again, we have the problem of doubling the work that has to be done at the assembly level, which slows the CPU down. The best approach would be to store the strings in byte arrays and work on them in that form wherever possible. So thank you for your previous idea. – mexikanoZ Sep 10 '16 at 16:26
  • This sounds like an [XY problem](http://meta.stackexchange.com/questions/66377/); you want to “eventually save several thousand bytes” and have decided “to compile all the strings into the ASCII” to do that. Stop; it is extremely unlikely that will help. Tell us where this “save several thousand bytes” requirement is coming from and we may be able to help. Depending on your data, compressing Unicode is far more likely to save bytes than using ASCII, but we don't know what your data is. – Dour High Arch Sep 10 '16 at 16:31
  • Long story short: I've written a game server for an old MMORPG. So far, I've been able to reduce RAM and CPU usage to a minimum. If you want the details, it's 50 MB of RAM and around 0.28% CPU usage per tick with 100 players online. Now let me ask you a simple question: which do you think would be easier and faster for the CPU to work on, the string "Hello world" stored as "48 65 6c 6c 6f 20 77 6f 72 6c 64" or as "48 00 65 00 6c 00 6c 00 6f 00 20 00 77 00 6f 00 72 00 6c 00 64 00"? The first option, of course. Sorry; when it comes to optimizations I simply turn into a psychopath. – mexikanoZ Sep 10 '16 at 16:43
  • In .NET and C#, all strings are encoded as UTF-16, with one exception: https://codeblog.jonskeet.uk/2014/11/07/when-is-a-string-not-a-string/ – M.Hassan Sep 10 '16 at 16:43
  • Please see the marked duplicate for a discussion that addresses your specific question. I.e. there is a specific way and reason for how strings are stored in .NET programs. The marked duplicate includes some details that offer alternatives if the language-required mechanism doesn't work for you. If after all that, you still are unable to achieve your goals, post a new question in which you include a good [mcve], explain _precisely_ what broader goal it is you're trying to solve, and specifically why you're having trouble solving it. – Peter Duniho Sep 10 '16 at 17:45
  • On "what would be faster": all things considered, I'd expect the Unicode representation to be faster on Windows, as all actual Windows methods work with Unicode strings and ASCII text has to be converted to Unicode at some level during a call. I would definitely not bet on the ASCII version being unconditionally faster (unless you are using an 8-bit CPU like the Intel 8080). Create a decent synthetic test representing your code and measure; an interesting question could result from such a test... – Alexei Levenkov Sep 10 '16 at 20:11
  • Answering your question, you *can* actually write raw bytes to the console if you are really bent on it. You just need to use reflection to get hold of the underlying output stream and write to it directly with `Write(byte[] bytes, int offset, int length)`. – InBetween Sep 11 '16 at 21:58
  • `Console.OutputEncoding = Encoding.ASCII; var _out = Console.Out.GetType().GetField("_out", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(Console.Out); var stream = _out.GetType().GetProperty("BaseStream").GetValue(_out) as Stream; var bytes = (new ASCIIEncoding()).GetBytes("Hello world!\r\n"); stream.Write(bytes, 0, bytes.Length);` – InBetween Sep 11 '16 at 22:06
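
The snippet in the last comment reads a non-public field (`_out`) through reflection, so it depends on runtime internals. Below is a minimal sketch of the same idea that uses only public API, assuming `Console.OpenStandardOutput()` is acceptable for this use. The class name `RawConsoleDemo`, the helper `WriteAscii`, and the message are hypothetical.

```csharp
using System;
using System.IO;

static class RawConsoleDemo
{
    // Hypothetical message kept as single-byte ASCII values
    // instead of a UTF-16 string literal.
    static readonly byte[] Greeting =
    {
        (byte)'H', (byte)'e', (byte)'l', (byte)'l', (byte)'o', (byte)' ',
        (byte)'w', (byte)'o', (byte)'r', (byte)'l', (byte)'d', (byte)'!',
        (byte)'\r', (byte)'\n'
    };

    // Writes the bytes unchanged to any stream; no string or
    // encoding step is involved.
    public static void WriteAscii(Stream target, byte[] message)
    {
        target.Write(message, 0, message.Length);
        target.Flush();
    }

    static void Main()
    {
        // Console.OpenStandardOutput() returns the process's standard output stream.
        Stream stdout = Console.OpenStandardOutput();
        WriteAscii(stdout, Greeting);
    }
}
```

Whether this is measurably faster than `Console.Write` is something to benchmark rather than assume.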

2 Answers

As others mentioned, you can save the ASCII text in a file and even add it to the project as a resource so that it is included in the executable. You can go even further and compress the ASCII text before adding it as a resource.
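
A minimal sketch of the compress-and-restore round trip, assuming gzip via `System.IO.Compression.GZipStream`. The class name, the helper names, the message table, and the resource name in the comment are hypothetical; only the first message comes from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class CompressedTextDemo
{
    // Encodes text as ASCII (1 byte per character) and gzip-compresses it.
    public static byte[] Compress(string text)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(ascii, 0, ascii.Length);
            }
            // ToArray() is still valid after the GZipStream has closed the buffer.
            return buffer.ToArray();
        }
    }

    // Decompresses a gzip stream and decodes the ASCII bytes into a UTF-16 string.
    public static string Decompress(Stream compressed)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Hypothetical message table; a real one would be produced at build time.
        string messages = string.Join("\n", new[]
        {
            "Unable to parse local IPv4s.",
            "Unable to parse local IPv6s.",
            "Unable to parse remote IPv4s."
        });

        byte[] packed = Compress(messages);

        // In a real project the stream would come from an embedded resource, e.g.
        // typeof(CompressedTextDemo).Assembly.GetManifestResourceStream("MyApp.messages.gz"),
        // where "MyApp.messages.gz" is a hypothetical resource name.
        string restored = Decompress(new MemoryStream(packed));

        Console.WriteLine(restored == messages); // True
        Console.WriteLine("UTF-16: {0} bytes, gzip: {1} bytes",
            messages.Length * 2, packed.Length);
    }
}
```

For very short inputs the gzip header can outweigh the savings, so it is worth printing both sizes for your real data before committing to this.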

This can decrease memory consumption, but it will also increase CPU usage and make execution a bit slower. Almost all of the methods in your program probably accept the .NET UTF-16 System.String, so the ASCII bytes will have to be decoded into a System.String, which costs CPU time.
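
To make the trade-off concrete, here is a small sketch that compares the two encodings for the "Hello world" example from the comments and shows the decode step that a `System.String` API forces. The class name is hypothetical.

```csharp
using System;
using System.Text;

static class EncodingSizeDemo
{
    static void Main()
    {
        const string text = "Hello world";

        byte[] ascii = Encoding.ASCII.GetBytes(text);   // 1 byte per character
        byte[] utf16 = Encoding.Unicode.GetBytes(text); // 2 bytes per character (UTF-16LE)

        Console.WriteLine(ascii.Length); // 11
        Console.WriteLine(utf16.Length); // 22

        // Any API that takes a System.String forces this decode,
        // which allocates a new UTF-16 string at run time.
        string decoded = Encoding.ASCII.GetString(ascii);
        Console.WriteLine(decoded == text); // True
    }
}
```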

This is the usual (though not universal) trade-off in optimization: faster and bigger, or smaller and slower.

There are ways to avoid the conversion, for example by calling Win32 API functions that accept ANSI strings (see "What is the difference between the `A` and `W` functions in the Win32 API?").

I am not sure whether any conversion still takes place if you change Console.OutputEncoding (see "Print ASCII line art characters in C# console application").

Slai

You can store strings as byte arrays, and then convert them to char arrays or strings as needed.

Simply use the ASCIIEncoding class in the System.Text namespace. The relevant methods are:

 char[] GetChars(byte[] bytes);
 string GetString(byte[] bytes);

Of course, char and string are UTF-16, so you're not really gaining much: you'll eventually pay the memory "price" you're trying to avoid, be it in executable size or in runtime memory.
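
A minimal sketch of this pattern, with a hypothetical three-byte message and class name. It calls both methods named above through `Encoding.ASCII`, which is an `ASCIIEncoding` instance, and prints the size of the decoded data to show where the memory cost reappears.

```csharp
using System;
using System.Text;

static class AsciiMessages
{
    // Hypothetical message stored as raw ASCII bytes: "Hi!"
    public static readonly byte[] Hi = { 0x48, 0x69, 0x21 };

    static void Main()
    {
        string asString = Encoding.ASCII.GetString(Hi);
        char[] asChars = Encoding.ASCII.GetChars(Hi);

        Console.WriteLine(asString);                      // Hi!
        Console.WriteLine(Hi.Length);                     // 3 bytes as stored
        Console.WriteLine(asChars.Length * sizeof(char)); // 6 bytes once decoded
    }
}
```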

Bottom line: I'm not really sure why you'd want to do this. It seems pointless, especially in a managed environment; if you are at this level of resource optimization, you're probably better off in a completely different environment.

InBetween