
I am maintaining legacy Oracle Pro*C (C/C++) code that processes a text file and updates the DB. The code prepares a SELECT statement that looks something like this:

SELECT 'ERROR_ID=§' || ERROR_ID ||  '§'  || ' AND ' ...

After executing the SELECT statement, the code receives the data in a char array, as shown below:

ERROR_ID=§ASI:10§ AND 

Later, the section sign (§) is replaced with a single quote, as shown below:

if((char)file_str.arr[k]=='§')                    
{                                                 
    strncpy((char*)&file_str.arr[k],"'",1);         
} 

Basically, the code fetches the primary keys from the DB (old primary keys) and compares them with the primary keys present in the text file (new keys). It uses a simple strcmp to compare these primary keys.

Now I have an issue. Even though the old and new primary keys match, the log file shows question marks in place of the single quotes:

ERROR_ID=?ASI:10? AND FORM_ID=?064956?  - old key
ERROR_ID='ASI:10' AND FORM_ID='064956'  - new key

My guess is that it fails because the code uses the section sign (§), which is a non-ASCII character.

Please suggest.

Update: The same binary is deployed in different environments. In some environments the section sign (§) gets replaced with '?' marks, and in others it works fine. Question: Is there any environment setting that affects this? If yes, what should I look for?

OS on all the environments is: SunOS 5.10

Paul Floyd
NJMR
  • Show some [MCVE] please. Read also http://utf8everywhere.org/ – Basile Starynkevitch Mar 22 '18 at 08:14
  • This is an accident waiting to happen. Pay attention to the text encoding that is used by your text editor. If it doesn't match what the compiler assumes then these non-ascii glyphs do turn into what-the-heck question marks. – Hans Passant Mar 22 '18 at 08:15
  • What does the source file look like? UTF-8, UTF-16, Windows-1252? – Martin Bonner supports Monica Mar 22 '18 at 08:58
  • Also, why the cast to `char` / `char*`. Why not `file_str.arr[k] = '\'';` (rather than `strncpy`) – Martin Bonner supports Monica Mar 22 '18 at 09:00
  • ... or `std::replace(file_str.arr, file_str.arr+length, '§', '\'');` to do the whole thing in one go. – Martin Bonner supports Monica Mar 22 '18 at 09:03
  • @HansPassant: If the text editor converted every occurrence of § to ? marks, then the code would replace ? with '. I don't see what the issue might be. The target system is UNIX; I ported the code to Visual Studio, but I am unable to reproduce the issue there. – NJMR Jul 04 '18 at 09:52
  • @HansPassant: I have to prove that this error is caused by the non-ASCII char in the source code, and I am unable to prove it. – NJMR Jul 04 '18 at 09:55
  • _"The same binary is deployed in different environments."_ Does it mean (1) _the same source files were compiled to produce a unique binary which was deployed on different environments_ or (2) _the same source files were compiled on different environments to produce one binary by environment which were deployed on their respective environments_? – YSC Sep 26 '18 at 11:22
  • I'm asking because `'§'` is a [multi-character literal](https://en.cppreference.com/w/cpp/language/character_literal) (on most targets), has type `int` and an implementation-defined value. – YSC Sep 26 '18 at 11:24
  • @YSC: No, the unique binary is deployed on every environment. We are compiling the code on only one environment and deploying the same binary on other environments. – NJMR Sep 26 '18 at 11:46
  • Is [Can C++ variables in cpp file defined as Special Symbols β](https://stackoverflow.com/q/52586368/2410359) useful? – chux - Reinstate Monica Oct 02 '18 at 02:11
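
The comments above point out that a `'§'` literal depends on how the source file is encoded. A minimal sketch of one way to sidestep that is to match the section sign by byte value instead of by literal. The function name `replace_section_sign` is hypothetical, and the sketch assumes the buffer holds either ISO-8859-1 (where § is the single byte 0xA7) or UTF-8 (where § is the two bytes 0xC2 0xA7). It is not the original code's method.

```c
#include <stddef.h>

/* Hypothetical helper: replace every section sign in buf with an ASCII
 * single quote, in place, and return the new length.
 * Assumption: the data is ISO-8859-1 (0xA7) or UTF-8 (0xC2 0xA7).
 * In ISO-8859-1 data the pair 0xC2 0xA7 would be misread as one sign. */
size_t replace_section_sign(unsigned char *buf, size_t len)
{
    size_t in = 0;   /* read position  */
    size_t out = 0;  /* write position */

    while (in < len) {
        if (buf[in] == 0xC2 && in + 1 < len && buf[in + 1] == 0xA7) {
            buf[out++] = '\'';      /* UTF-8 encoded section sign */
            in += 2;
        } else if (buf[in] == 0xA7) {
            buf[out++] = '\'';      /* ISO-8859-1 section sign */
            in += 1;
        } else {
            buf[out++] = buf[in++]; /* copy any other byte unchanged */
        }
    }
    return out;
}
```

Because the comparison uses numeric byte values, the result no longer depends on the encoding the compiler assumes for the source file. It still cannot recover a character that has already been turned into `?` before it reaches the buffer.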

2 Answers


Your executable probably doesn't set any locale, so your program runs with the locale set by the environment. I suggest setting the locale from inside your code, for example with `setlocale(LC_ALL, "C");`, before anything else runs.
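
A minimal sketch of that suggestion, assuming only the standard `setlocale` from `<locale.h>`. The helper names `env_locale_name` and `pin_c_locale` are hypothetical. Whether the Oracle client library is affected by this call is not something this answer confirms.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: adopt the locale named by the environment
 * (LANG, LC_ALL, ...) and return its name, or NULL if that locale
 * is not installed on this machine. Useful for logging what each
 * environment actually provides. */
const char *env_locale_name(void)
{
    return setlocale(LC_ALL, "");
}

/* Hypothetical helper: pin the process to the "C" locale so that
 * behavior no longer varies with the environment. Returns the
 * locale name on success, or NULL on failure. */
const char *pin_c_locale(void)
{
    const char *name = setlocale(LC_ALL, "C");
    if (name == NULL) {
        fprintf(stderr, "could not set the C locale\n");
    }
    return name;
}
```

Calling `env_locale_name()` first and logging the result in each environment shows whether the environments really differ. Calling `pin_c_locale()` at the very start of `main` then applies the fix suggested here.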

Markus Schumann
  • This is a good comment. It could also be that the machines have different locales installed (you can check with `locale -a`). – Paul Floyd Sep 27 '18 at 10:40

I think the approach is incorrect. I think you should

  1. Read the text file, translating to wchar_t and making sure you have the right locale for the text file. Whether the file uses a fixed-width character set (byte code pages, UCS-2) or a multibyte character set (UTF-8, others), you will wind up with a fixed-width character set in wchar_t.
  2. Construct the SQL as a wstring. Now '§' should be correct regardless of the input file.
  3. Execute the SQL.
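
The first two steps could be sketched roughly as follows in plain C. The helper names `to_wide` and `quote_section_signs` are hypothetical. The sketch assumes that `LC_CTYPE` has already been set to match the file's encoding and that `wchar_t` values are Unicode code points on the target, which is platform- and locale-dependent and should be verified.

```c
#include <stdlib.h>
#include <wchar.h>

/* Assumption: wchar_t holds Unicode code points, so the section
 * sign is U+00A7. */
#define SECTION_SIGN ((wchar_t)0x00A7)

/* Hypothetical helper: decode a multibyte string according to the
 * current LC_CTYPE locale into dst (capacity cap wide chars).
 * Returns the number of wide chars written, excluding the
 * terminator, or (size_t)-1 on an invalid sequence or zero cap. */
size_t to_wide(const char *src, wchar_t *dst, size_t cap)
{
    size_t n;

    if (cap == 0) {
        return (size_t)-1;
    }
    n = mbstowcs(dst, src, cap - 1);
    if (n == (size_t)-1) {
        return n;
    }
    dst[n] = L'\0'; /* mbstowcs does not terminate when it fills dst */
    return n;
}

/* Hypothetical helper: replace each section sign in a wide string
 * with a single quote; returns how many were replaced. */
size_t quote_section_signs(wchar_t *s)
{
    size_t count = 0;

    for (; *s != L'\0'; ++s) {
        if (*s == SECTION_SIGN) {
            *s = L'\'';
            ++count;
        }
    }
    return count;
}
```

Once the text is in `wchar_t`, the section sign is one code unit whatever the input encoding was, so the replacement becomes a plain element comparison.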
SJHowe