8

GCC 4.8, 5.1, 6.2 and Clang 3.8.1 on Ubuntu 16.10 with -std=c11, -std=c++11, -std=c++14, and -std=c++17 all exhibit this weird behaviour when using fgetws(buf, (int) bufsize, stdin) after setlocale(LC_ALL, "any_THING.utf8");.

Example program:

#include <locale.h>
#include <wchar.h>
#include <stdlib.h>
#include <stdio.h>

int main(const int argc, const char* const * const argv) {
  (void) argc;

  setlocale(LC_ALL, argv[1]);

  const size_t len = 3;

  wchar_t *buf = (wchar_t *) malloc(sizeof (wchar_t) * len),
         *stat = fgetws(buf, (int) len, stdin);

  wprintf(L"[%ls], [%ls]\n", stat, buf);

  free(buf);

  return EXIT_SUCCESS;
}

Casting malloc is just for C++-compat.

Compile it like this: cc -std=c11 fg.c -o fg.

Run it with argv[1] = "C" and echo 10 bytes to STDIN under Valgrind and we find...

$ python3 -c 'print("5" * 10)' | \
  valgrind --leak-check=full --track-origins=yes --show-leak-kinds=all ./f C
==1775== Memcheck, a memory error detector
==1775== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1775== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==1775== Command: ./f C
==1775== 
[55], [55]
==1775== 
==1775== HEAP SUMMARY:
==1775==     in use at exit: 0 bytes in 0 blocks
==1775==   total heap usage: 5 allocs, 5 frees, 25,612 bytes allocated
==1775== 
==1775== All heap blocks were freed -- no leaks are possible
==1775== 
==1775== For counts of detected and suppressed errors, rerun with: -v
==1775== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The program works perfectly, and there are no memory errors.

If it is run with a UTF-8 locale as argv[1], then we get the right output, but a memory error at 0x18 and a fatal segmentation fault.

$ python3 -c 'print("5" * 10)' | \
  valgrind --leak-check=full --track-origins=yes --show-leak-kinds=all ./f en_US.utf8
==1934== Memcheck, a memory error detector
==1934== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1934== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==1934== Command: ./f en_US.utf8
==1934== 
[55], [55]
==1934== Invalid read of size 8
==1934==    at 0x4EAF575: _IO_wfile_sync (wfileops.c:534)
==1934==    by 0x4EB6DB1: _IO_default_setbuf (genops.c:523)
==1934==    by 0x4EB2FC8: _IO_file_setbuf@@GLIBC_2.2.5 (fileops.c:459)
==1934==    by 0x4EB79B5: _IO_unbuffer_all (genops.c:921)
==1934==    by 0x4EB79B5: _IO_cleanup (genops.c:966)
==1934==    by 0x4E73282: __run_exit_handlers (exit.c:96)
==1934==    by 0x4E73339: exit (exit.c:105)
==1934==    by 0x4E593F7: (below main) (libc-start.c:325)
==1934==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==1934== 
==1934== 
==1934== Process terminating with default action of signal 11 (SIGSEGV)
==1934==  Access not within mapped region at address 0x18
==1934==    at 0x4EAF575: _IO_wfile_sync (wfileops.c:534)
==1934==    by 0x4EB6DB1: _IO_default_setbuf (genops.c:523)
==1934==    by 0x4EB2FC8: _IO_file_setbuf@@GLIBC_2.2.5 (fileops.c:459)
==1934==    by 0x4EB79B5: _IO_unbuffer_all (genops.c:921)
==1934==    by 0x4EB79B5: _IO_cleanup (genops.c:966)
==1934==    by 0x4E73282: __run_exit_handlers (exit.c:96)
==1934==    by 0x4E73339: exit (exit.c:105)
==1934==    by 0x4E593F7: (below main) (libc-start.c:325)
==1934==  If you believe this happened as a result of a stack
==1934==  overflow in your program's main thread (unlikely but
==1934==  possible), you can try to increase the size of the
==1934==  main thread stack using the --main-stacksize= flag.
==1934==  The main thread stack size used in this run was 8388608.
==1934== 
==1934== Process terminating with default action of signal 11 (SIGSEGV)
==1934==  Access not within mapped region at address 0x18
==1934==    at 0x4EAF575: _IO_wfile_sync (wfileops.c:534)
==1934==    by 0x4EB6DB1: _IO_default_setbuf (genops.c:523)
==1934==    by 0x4EB2FC8: _IO_file_setbuf@@GLIBC_2.2.5 (fileops.c:459)
==1934==    by 0x4EB79B5: _IO_unbuffer_all (genops.c:921)
==1934==    by 0x4EB79B5: _IO_cleanup (genops.c:966)
==1934==    by 0x4FAA93B: __libc_freeres (in /lib/x86_64-linux-gnu/libc-2.24.so)
==1934==    by 0x4A276EC: _vgnU_freeres (vg_preloaded.c:77)
==1934==    by 0x1101: ???
==1934==    by 0x3805234F: ??? (mc_malloc_wrappers.c:483)
==1934==    by 0x51FA8BF: ??? (in /lib/x86_64-linux-gnu/libc-2.24.so)
==1934==  If you believe this happened as a result of a stack
==1934==  overflow in your program's main thread (unlikely but
==1934==  possible), you can try to increase the size of the
==1934==  main thread stack using the --main-stacksize= flag.
==1934==  The main thread stack size used in this run was 8388608.
==1934== 
==1934== HEAP SUMMARY:
==1934==     in use at exit: 35,007 bytes in 149 blocks
==1934==   total heap usage: 233 allocs, 84 frees, 46,936 bytes allocated
==1934== 
==1934== 11 bytes in 1 blocks are still reachable in loss record 1 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6396B: new_composite_name (setlocale.c:167)
==1934==    by 0x4E63F91: setlocale (setlocale.c:378)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 32 bytes in 1 blocks are still reachable in loss record 2 of 24
==1934==    at 0x4C2EB55: calloc (vg_replace_malloc.c:711)
==1934==    by 0x4EF288B: __wcsmbs_load_conv (wcsmbsload.c:168)
==1934==    by 0x4EF2B83: get_gconv_fcts (wcsmbsload.h:75)
==1934==    by 0x4EF2B83: __wcsmbs_clone_conv (wcsmbsload.c:223)
==1934==    by 0x4EAFC58: _IO_fwide (iofwide.c:124)
==1934==    by 0x4EAB1A4: _IO_getwline_info (iogetwline.c:58)
==1934==    by 0x4EAAC4A: fgetws (iofgetws.c:53)
==1934==    by 0x10883D: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 42 bytes in 1 blocks are still reachable in loss record 3 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BAE0: _nl_make_l10nflist (l10nflist.c:166)
==1934==    by 0x4E6BE94: _nl_make_l10nflist (l10nflist.c:295)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 50 bytes in 1 blocks are still reachable in loss record 4 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BAE0: _nl_make_l10nflist (l10nflist.c:166)
==1934==    by 0x4E6BE94: _nl_make_l10nflist (l10nflist.c:295)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 56 bytes in 1 blocks are still reachable in loss record 5 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BC70: _nl_make_l10nflist (l10nflist.c:241)
==1934==    by 0x4E6BE94: _nl_make_l10nflist (l10nflist.c:295)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 92 bytes in 2 blocks are still reachable in loss record 6 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BAE0: _nl_make_l10nflist (l10nflist.c:166)
==1934==    by 0x4E6BE94: _nl_make_l10nflist (l10nflist.c:295)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 104 bytes in 1 blocks are still reachable in loss record 7 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BC70: _nl_make_l10nflist (l10nflist.c:241)
==1934==    by 0x4E6BE94: _nl_make_l10nflist (l10nflist.c:295)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 132 bytes in 12 blocks are still reachable in loss record 8 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4EC5C49: strndup (strndup.c:43)
==1934==    by 0x4E64AB4: _nl_find_locale (findlocale.c:315)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 132 bytes in 12 blocks are still reachable in loss record 9 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4EC5BF9: strdup (strdup.c:42)
==1934==    by 0x4E63BCE: setlocale (setlocale.c:369)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 144 bytes in 2 blocks are still reachable in loss record 10 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BC70: _nl_make_l10nflist (l10nflist.c:241)
==1934==    by 0x4E6BE94: _nl_make_l10nflist (l10nflist.c:295)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 208 bytes in 1 blocks are still reachable in loss record 11 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E631C9: __gconv_lookup_cache (gconv_cache.c:372)
==1934==    by 0x4E5B34B: __gconv_find_transform (gconv_db.c:752)
==1934==    by 0x4EF296A: __wcsmbs_getfct (wcsmbsload.c:91)
==1934==    by 0x4EF296A: __wcsmbs_load_conv (wcsmbsload.c:186)
==1934==    by 0x4EF2B83: get_gconv_fcts (wcsmbsload.h:75)
==1934==    by 0x4EF2B83: __wcsmbs_clone_conv (wcsmbsload.c:223)
==1934==    by 0x4EAFC58: _IO_fwide (iofwide.c:124)
==1934==    by 0x4EAB1A4: _IO_getwline_info (iogetwline.c:58)
==1934==    by 0x4EAAC4A: fgetws (iofgetws.c:53)
==1934==    by 0x10883D: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 208 bytes in 1 blocks are still reachable in loss record 12 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E630EB: __gconv_lookup_cache (gconv_cache.c:372)
==1934==    by 0x4E5B34B: __gconv_find_transform (gconv_db.c:752)
==1934==    by 0x4EF2A0D: __wcsmbs_getfct (wcsmbsload.c:91)
==1934==    by 0x4EF2A0D: __wcsmbs_load_conv (wcsmbsload.c:189)
==1934==    by 0x4EF2B83: get_gconv_fcts (wcsmbsload.h:75)
==1934==    by 0x4EF2B83: __wcsmbs_clone_conv (wcsmbsload.c:223)
==1934==    by 0x4EAFC58: _IO_fwide (iofwide.c:124)
==1934==    by 0x4EAB1A4: _IO_getwline_info (iogetwline.c:58)
==1934==    by 0x4EAAC4A: fgetws (iofgetws.c:53)
==1934==    by 0x10883D: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 365 bytes in 12 blocks are still reachable in loss record 13 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BAE0: _nl_make_l10nflist (l10nflist.c:166)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 461 bytes in 12 blocks are still reachable in loss record 14 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BAE0: _nl_make_l10nflist (l10nflist.c:166)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 672 bytes in 12 blocks are still reachable in loss record 15 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BC70: _nl_make_l10nflist (l10nflist.c:241)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 826 bytes in 24 blocks are still reachable in loss record 16 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BAE0: _nl_make_l10nflist (l10nflist.c:166)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 1,024 bytes in 1 blocks are still reachable in loss record 17 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4EA7381: _IO_file_doallocate (filedoalloc.c:101)
==1934==    by 0x4EA890C: _IO_wfile_doallocate (wfiledoalloc.c:70)
==1934==    by 0x4EAD159: _IO_wdoallocbuf (wgenops.c:390)
==1934==    by 0x4EAF39C: _IO_wfile_overflow (wfileops.c:441)
==1934==    by 0x4EACA12: __woverflow (wgenops.c:226)
==1934==    by 0x4EACA12: _IO_wdefault_xsputn (wgenops.c:331)
==1934==    by 0x4EAF7A0: _IO_wfile_xsputn (wfileops.c:1033)
==1934==    by 0x4E925EB: vfwprintf (vfprintf.c:1320)
==1934==    by 0x4EABA98: wprintf (wprintf.c:32)
==1934==    by 0x10885D: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 1,248 bytes in 12 blocks are still reachable in loss record 18 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BC70: _nl_make_l10nflist (l10nflist.c:241)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 1,600 bytes in 1 blocks are still reachable in loss record 19 of 24
==1934==    at 0x4C2CA6F: malloc (vg_replace_malloc.c:298)
==1934==    by 0x4C2EDEF: realloc (vg_replace_malloc.c:785)
==1934==    by 0x4E6B692: extend_alias_table (localealias.c:397)
==1934==    by 0x4E6B692: read_alias_file (localealias.c:319)
==1934==    by 0x4E6B8B0: _nl_expand_alias (localealias.c:203)
==1934==    by 0x4E648D7: _nl_find_locale (findlocale.c:161)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 1,728 bytes in 24 blocks are still reachable in loss record 20 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E6BC70: _nl_make_l10nflist (l10nflist.c:241)
==1934==    by 0x4E6BDC6: _nl_make_l10nflist (l10nflist.c:285)
==1934==    by 0x4E64A05: _nl_find_locale (findlocale.c:218)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 2,048 bytes in 1 blocks are still reachable in loss record 21 of 24
==1934==    at 0x4C2ED5F: realloc (vg_replace_malloc.c:785)
==1934==    by 0x4E6B61C: read_alias_file (localealias.c:331)
==1934==    by 0x4E6B8B0: _nl_expand_alias (localealias.c:203)
==1934==    by 0x4E648D7: _nl_find_locale (findlocale.c:161)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 3,344 bytes in 12 blocks are still reachable in loss record 22 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4E64F09: _nl_intern_locale_data (loadlocale.c:95)
==1934==    by 0x4E64F09: _nl_load_locale (loadlocale.c:266)
==1934==    by 0x4E649B9: _nl_find_locale (findlocale.c:234)
==1934==    by 0x4E63B7B: setlocale (setlocale.c:340)
==1934==    by 0x108806: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 4,096 bytes in 1 blocks are still reachable in loss record 23 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4EA7381: _IO_file_doallocate (filedoalloc.c:101)
==1934==    by 0x4EA890C: _IO_wfile_doallocate (wfiledoalloc.c:70)
==1934==    by 0x4EB6875: _IO_doallocbuf (genops.c:398)
==1934==    by 0x4EAE493: _IO_wfile_underflow (wfileops.c:197)
==1934==    by 0x4EAC431: _IO_wdefault_uflow (wgenops.c:213)
==1934==    by 0x4EAB0E5: _IO_getwline_info (iogetwline.c:65)
==1934==    by 0x4EAAC4A: fgetws (iofgetws.c:53)
==1934==    by 0x10883D: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== 16,384 bytes in 1 blocks are still reachable in loss record 24 of 24
==1934==    at 0x4C2CB3F: malloc (vg_replace_malloc.c:299)
==1934==    by 0x4EA88D8: _IO_wfile_doallocate (wfiledoalloc.c:79)
==1934==    by 0x4EB6875: _IO_doallocbuf (genops.c:398)
==1934==    by 0x4EAE493: _IO_wfile_underflow (wfileops.c:197)
==1934==    by 0x4EAC431: _IO_wdefault_uflow (wgenops.c:213)
==1934==    by 0x4EAB0E5: _IO_getwline_info (iogetwline.c:65)
==1934==    by 0x4EAAC4A: fgetws (iofgetws.c:53)
==1934==    by 0x10883D: main (in /home/cat/projects/c/misc/fgetws/f)
==1934== 
==1934== LEAK SUMMARY:
==1934==    definitely lost: 0 bytes in 0 blocks
==1934==    indirectly lost: 0 bytes in 0 blocks
==1934==      possibly lost: 0 bytes in 0 blocks
==1934==    still reachable: 35,007 bytes in 149 blocks
==1934==         suppressed: 0 bytes in 0 blocks
==1934== 
==1934== For counts of detected and suppressed errors, rerun with: -v
==1934== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)

My question boils down to: is this a bug in libc6 or libstdc++6? Or does fgetws after setting a UTF-8 locale exhibit some sort of undefined behaviour (according to glibc docs or the C standard), or is my code somehow wrong?

Note that by Valgrind's stack trace it seems like it may be a bug in Valgrind, but the program segfaults when not run under Valgrind or when run with AddressSanitizer (libasan) instead.

cat
  • 3,888
  • 5
  • 32
  • 61
  • 1
    I can confirm the SIGSEGV without running Valgrind. It only happens when there are more than nine of the `5`'s in the input string (as given in the example), though. – The Vee Dec 05 '16 at 14:56
  • 1
    FWIW seems to work fine on OS X. – MK. Dec 05 '16 at 14:57
  • @TheVee Yeah. I'm trying to figure out the correlation because the magic number goes up to 50 or something when `bufsize` is 22. – cat Dec 05 '16 at 15:01
  • @MK. With GCC or Clang, and with OSX's libc or glibc? – cat Dec 05 '16 at 15:17
  • 1
    Must be a bug. `_IO_wfile_sync (wfileops.c:534)` fails to a null pointer on a line `fp->_IO_read_ptr = fp->_IO_read_base + nread;` due to `fp = NULL` while four lines prior to that it was not `NULL`. The only thing inbetween is a function call that calculates some multi-byte difference and is not even given a pointer to `fp` to overwrite. – The Vee Dec 05 '16 at 15:19
  • No repro on a couple of linux systems ([here's one](http://coliru.stacked-crooked.com/a/2cc67b157a138338)) and cygwin. Looks like a glibc bug in your specific version to me. – n. m. could be an AI Dec 05 '16 at 15:33
  • @n.m. Coilru's valgrind doesn't work but are Coilru's segfaults silent? – cat Dec 05 '16 at 15:36
  • I am now absolutely certain it is a GLIBC bug, please report it if you can. [This line](https://code.woboq.org/userspace/glibc/libio/wfileops.c.html#531) calls [this function](https://code.woboq.org/userspace/glibc/libio/iofwide.c.html#do_length) with a negative `delta` (`= max`), proceeded by allocating a "negative-length" array on the stack and overwriting values stored there. By sheer luck the only observable effect of the misbehaviour is that line 534 receives a `NULL` pointer (the other overwritten values were unimportant zeroes). – The Vee Dec 05 '16 at 15:46
  • clang, not actually sure how to tell which libc for sure, but I assume OSX's libc because why not. – MK. Dec 05 '16 at 16:13
  • Even a silent segfault would prevent the right side of `&&` from executing. – n. m. could be an AI Dec 05 '16 at 16:15
  • Does the problem occur with larger `len` like `len = 80;`? – chux - Reinstate Monica Dec 05 '16 at 16:44
  • @chux Yes, the errors get worse (a syscall segfault) as bufsize increases, above 100 and beyond – cat Dec 05 '16 at 16:47
  • 2
    If my answer is correct, then more difference between end of string (101) and read pointer (2) ⇒ more serious stack overwriting ⇒ more serious problems. – The Vee Dec 05 '16 at 16:48

1 Answers1

5

Luckily my system produces the same error and the same backtrace as yours (same files, same line numbers) so I have been able to investigate a bit.

Here's what causes the segmentation failure: after the return from main, all the internal structures are freed, one of them being stdin. Up to the point of _IO_wfile_sync the same first goes off for stdout which does not cause any problems, so it is intended to happen. The difference is that for stdin, the delta at line 508 is zero for stdout, leading to skipping of most of the function's code, but nonzero for stdin. At this point, fp->_wide_data->_IO_read_end points (understandably) to the end of the input string L"5555555555\n", while fp->_wide_data->_IO_read_ptr is at the third character (after reading two), and the difference is -9.

Now if you ask me storing a negative difference in some type called _IO_ssize_t smells, and surely enough this does cause trouble. Line 531 calls the function do_length which expects an argument max for a buffer size and receives -9 (or, presumably, 2^word_size - 9). Among the first lines in this function is the declaration

wchar_t to_buf[max];

which results in increasing the stack pointer instead of decreasing it, and data that should have been safely stored there (among them the pointer fp of _IO_wfile_sync(), as it ends up stored in a register rbx) is overwritten at the first opportunity.

After the return from the function fp is overwritten with something that does not make sense (NULL, in my case), even though it has never been exposed to it, and dereferencing it on line 534 causes a SIGSEGV, as the backtrace tells us.

I haven't read enough of the code to make an educated guess on whether line 508 should have maybe said

delta = fp->_wide_data->_IO_read_end - fp->_wide_data->_IO_read_ptr;

instead of the opposite, or if -delta should have been passed on for max, or if it is unexpected behaviour that the _ptr points before the _end, but certainly anything that results in passing a negative value to a variable length array is not OK. As both of the files referenced here are part of glibc, I think it's safe to assume that would be the right place to direct a bug report to. This goes along well with the negative confirmations from non-glibc systems.

PS. This does not happen for non-UTF locales because the call leading to do_length is only executed for variable-length encodings (it's wrapped in an if-else on line 518). If it's a 8-bit or fixed 16- or 32-bit UCS (supposedly), delta only gets multiplied by a constant. If the encoding can have varying byte lengths per character, the corresponding calculation must look inside the buffer to figure out how many characters it represents, or construct the representation in order to determine how many characters it will take.

The Vee
  • 11,420
  • 5
  • 27
  • 60
  • Nice investigative skills! If you're up for it, since you're evidently better at this than me, *you* should probably file the report with all these details. – cat Dec 05 '16 at 16:50
  • Also, any clues as to why this only happens with UTF8 locales? It doesn't happen with non-C non-UTF8 locales either, like eg `wa_BE@euro`, yet these names seem non-UTF8-specific. – cat Dec 05 '16 at 16:54
  • @cat I think that would be explained by this comment on line 519: `/* It is easy, a fixed number of input bytes are used for each wide character. */`. `delta` is still negative, but the control goes there instead to the line 530 for `wa_BE@euro`. That would happen for all encodings which use fixed length `clen` for characters, I don't know what encoding that specific locale uses but I think it would be the case there. UTF-8 needs this special treatment because character representations have varying lengths. – The Vee Dec 05 '16 at 17:09
  • (Added to the answer) – The Vee Dec 05 '16 at 17:25
  • What's your system and libc6 version? Ubuntu 16 series with GLibC 2.24? – cat Dec 06 '16 at 13:14
  • 1
    I have a Fedora 24 with glibc 2.23. (But save possible vendor patches, I don't think this will depend on the distribution.) – The Vee Dec 06 '16 at 13:19
  • 2
    It's kind of a mess because I'm not used to SourceWare BugZilla but here's the report, with a fix: https://sourceware.org/bugzilla/show_bug.cgi?id=20938 – cat Dec 08 '16 at 01:46
  • 1
    You really put a lot of work into that, @cat, thanks! I was trying to find time for it but didn't really have it so far (a poor excuse, when I had time for discussing questions on SO, I know). – The Vee Dec 08 '16 at 01:51
  • Not to worry -- in the world of free software, it's okay to be lazy :) – cat Dec 08 '16 at 03:41
  • Hey, look, another very closely related bug in glibc stdio's `wchar_t` handling, where a negative pointer subtraction results in a huge value: https://sourceware.org/bugzilla/show_bug.cgi?id=20632. I'm beginning to think I shouldn't bother using `wchar_t`... – cat Dec 08 '16 at 22:55
  • Well, it's been a year and 8 months, I think I'll write on the OSS Security mailing list about this since it's still a problem in glibc 2.27 – cat Aug 24 '18 at 06:49