1

I know in C++, you can check the length of the string, but in C, not so much.

Is it possible, knowing the size of a text file, to know how many characters are in it?

Is it one byte per character or are other headers secretly stored whether or not I set them?

I would like to avoid performing a null check on every character as I iterate through the file for performance reasons.

Thanks.

mczarnek
  • 1
    Does this answer your question? [How do you determine the size of a file in C?](https://stackoverflow.com/questions/8236/how-do-you-determine-the-size-of-a-file-in-c) – Nate Eldredge Mar 04 '21 at 00:19
  • 1
    Depends how you define character? If it's any UTF encoding, nope, no way to know character count. If it's ASCII or some other one-byte-per-character encoding, size of file will mostly tell you (unless you need to collapse CRLF to LF for count purposes). That said, standard C has no solution, you're stuck using system APIs to get an idea. You wouldn't be doing a `NULL` check though; the C APIs either give you lines (in which case, okay, `NUL` checks tell you where a string ends), while the character driven APIs return `EOF`, and the block based APIs return the number of bytes read. – ShadowRanger Mar 04 '21 at 00:19
  • 2
    Even if you determine the size, you can't safely skip the check, as the file contents could be changed by some other program on the system, or there could be an I/O error. – Nate Eldredge Mar 04 '21 at 00:20
  • I'm in full control of this file, so not really worried about anyone else changing it. Theoretically possible, but unlikely. – mczarnek Mar 04 '21 at 00:20
  • 2
    Note also that on systems such as Windows that use CRLF line endings, the number of characters you can read from a text file won't equal its size in bytes. – Nate Eldredge Mar 04 '21 at 00:21
  • 3
    @NateEldredge The linked question is a bit of a trap, as reading the answers closely reveals that there's no portable method (neither for text nor binary files), besides opening the file and reading every character – M.M Mar 04 '21 at 00:26

2 Answers

4

You can open the file and read all the characters and count them.

Besides that, there's no fully portable method to check how long a file is -- neither on disk, nor in terms of how many characters will be read. This is true for text files and binary files.
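
A minimal sketch of the read-and-count approach, using only standard C. The helper name `count_chars` is made up for illustration, and the file is opened in text mode, so on systems that translate line endings the result may differ from the on-disk size:

```c
#include <assert.h>
#include <stdio.h>

/* Counts the characters that a text-mode read of `path` yields.
 * Returns the count, or -1 if the file cannot be opened or a read fails.
 * count_chars is a hypothetical helper name, not a standard function. */
long long count_chars(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");   /* text mode: CRLF may be read as one '\n' */
    if (fp == NULL)
        return -1;

    long long count = 0;
    int c;                         /* int, not char, so EOF stays distinguishable */
    while ((c = getc(fp)) != EOF)
        count++;

    /* getc returns EOF both at end of file and on error; tell them apart. */
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : count;
}
```

Calling `count_chars("notes.txt")` (a placeholder file name) would return the number of characters read, or -1 on failure.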

[How do you determine the size of a file in C?](https://stackoverflow.com/questions/8236/how-do-you-determine-the-size-of-a-file-in-c) goes over some of the pitfalls. Perhaps one of the solutions there will suit a subset of systems that you run your code on; or you might like to use a POSIX or operating system call.
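
For the POSIX route, a sketch along these lines should work on systems that provide `stat`. It is not portable standard C, `file_size_posix` is a hypothetical helper name, and the size it reports is the byte count on disk, which is not necessarily the number of characters a text-mode read delivers:

```c
#include <assert.h>
#include <stdio.h>      /* perror */
#include <sys/stat.h>   /* POSIX, not part of standard C */

/* Returns the size of `path` in bytes as reported by the file system,
 * or -1 on failure. file_size_posix is a hypothetical helper name. */
long long file_size_posix(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);              /* e.g. the file does not exist */
        return -1;
    }
    if (!S_ISREG(st.st_mode))      /* only trust st_size for regular files */
        return -1;
    return (long long)st.st_size;
}
```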


As mentioned in the comments, if the intent behind the question is to read characters and process them on the fly, then you still need to check for read errors even if you know the file size, because reading can fail.
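
If the worry is a per-character check in the hot loop, one option is to read in blocks with `fread`, which reports how many bytes it delivered, and to check `ferror` once after the loop. This is only a sketch: `count_bytes_blockwise` is a made-up name and the 4096-byte buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>

/* Reads `path` in fixed-size blocks and returns the total number of bytes
 * read, or -1 on open or read failure. count_bytes_blockwise is a
 * hypothetical helper name; the buffer size is arbitrary. */
long long count_bytes_blockwise(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");  /* binary mode: no newline translation */
    if (fp == NULL)
        return -1;

    unsigned char buf[4096];
    long long total = 0;
    size_t n;
    /* fread returns the number of bytes it stored in buf, so the loop
     * needs no per-character end-of-data test. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        /* process buf[0] .. buf[n - 1] here */
        total += (long long)n;
    }

    int failed = ferror(fp);       /* a short read can mean EOF or an error */
    fclose(fp);
    return failed ? -1 : total;
}
```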

M.M
  • 3
    "because reading can fail" - and files can change while you're reading them :-) – paxdiablo Mar 04 '21 at 00:33
  • @paxdiablo only if using inferior operating systems that allow reading a file which is open for writing ;) – M.M Mar 04 '21 at 00:34
  • 1
    @M.M Those toy operating systems are only needed by those who need extra protection... ;-) – Andrew Henle Mar 04 '21 at 01:33
  • @M.M: I know you're joking, but curious: Are there common OSes that do this? I know Windows' `CreateFile` has a concept of `FILE_SHARE` permissions (and the standard C APIs implemented in terms of it can be stingy on sharing), but it's possible to read a file that's independently open for writing as long as both handles were opened with appropriate sharing flags. Are there common OSes where it's impossible to open a read handle to a file already opened for write & vice-versa? I can't think of any off the top of my head, but I'm blessed to only have to consider Windows & Linux most of the time. – ShadowRanger Mar 04 '21 at 18:38
0

Characters (of type char) are single-byte values, as defined in the C standard: sizeof(char) is always 1, and CHAR_BIT gives the number of bits in that byte. A NUL character is also a character, and so it, too, takes up a single byte.

Thus, if you are working with an ASCII text file, the file size will be the number of bytes and therefore equivalent to the number of characters.

If you are asking how long individual strings are inside the file, then you will indeed need to look for NUL and other extended character bytes and calculate string lengths on that basis. You might not be able to safely assume that there is only one NUL character and that it is at the end of the file, depending on how that file was made. There can also be newlines and other extended characters you would want to exclude. You have to decide on a character set and do counting from that set.

Further, if you are working with a file containing multibyte characters in, say, a Unicode encoding such as UTF-8, then this will be a different answer. You would use different functions to read a text file using a multibyte encoding.
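
As an illustration of how the count diverges from the byte size, here is a byte-level sketch that counts code points in a file assumed to contain valid UTF-8. It does not use the wide-character functions alluded to above, `count_utf8_code_points` is a hypothetical name, and code points are not always the same thing as user-perceived characters (combining marks, for example):

```c
#include <assert.h>
#include <stdio.h>

/* Counts Unicode code points in a file ASSUMED to hold valid UTF-8.
 * Each code point contains exactly one byte that is not a continuation
 * byte (continuation bytes have the bit pattern 10xxxxxx), so counting
 * the non-continuation bytes is enough. Returns -1 on open or read failure.
 * count_utf8_code_points is a hypothetical helper name. */
long long count_utf8_code_points(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");  /* binary mode: see the raw bytes */
    if (fp == NULL)
        return -1;

    long long count = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if ((c & 0xC0) != 0x80)    /* skip continuation bytes */
            count++;
    }

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : count;
}
```

No validation is done here, so malformed input would simply be miscounted rather than rejected.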

So the answer will depend on what type of encoding your text file uses, and whether you are calculating characters or string lengths, which are two different measures.

Alex Reynolds
  • 2
    The second paragraph is not correct: a character read from a text file may be a translation of multiple bytes on disk, a common example being Windows line endings – M.M Mar 04 '21 at 00:32
  • A Windows line ending is made up of two characters (`\r` and `\n`), and so takes up two bytes. Again, you have to carefully define what a character means. – Alex Reynolds Mar 04 '21 at 00:34
  • 1
    Maybe you could clarify the second paragraph, at the moment it seems to be saying that the number of bytes received by the program will equal the number of bytes in the file (which is not true) – M.M Mar 04 '21 at 00:37
  • @M.M Are there any other platforms on which the CRLF -> LF translation is performed, apart from Windows? Can you give another example, apart from 0x1a (again, Windows niceties)? – vmt Mar 04 '21 at 02:34