
Good day. Below is a PowerBuilder script that checks the file encoding and reads the data from the file. Once the data is assigned to a string variable (ls_Encoding), it is passed to the object function of_TabDelimited() (its script is also shown below) to convert the pipe-delimited data to tab-delimited.

Inside the object function of_TabDelimited(), the application hangs, and I can't figure out the cause. The data passed in is about 2,800 rows. However, of_TabDelimited() works fine when the data has fewer rows (e.g. 100 rows). I can't see any processing limitation in of_TabDelimited(), because it only loops and calls Replace() inside the loop.

Can anybody help me find and correct the error? I appreciate any help.

//Get the data from the file
ll_FileNum = FileOpen(ls_sourcepath, StreamMode!, Read!, LockWrite!, Replace!)
ll_FileLength = FileLength(ls_sourcepath)
eRet = FileEncoding(ls_sourcepath)
IF NOT ISNULL(ll_FileLength) OR ll_FileLength > 0 THEN  
    IF eRet = EncodingANSI! THEN 
        ll_bytes = FileReadEx(ll_FileNum, lbl_data)     
        ls_Encoding = String(lbl_data, EncodingUTF8!)   
        FileClose(ll_FileNum)
    END IF
ELSEIF ISNULL(ll_FileLength) OR ll_FileLength = 0 THEN 
    lb_Return = FALSE
END IF

Integer li_start, li_end
//Convert the data to TabDelimited
ls_Return = of_TabDelimited(ls_Encoding) //Application hang-up on this function.
IF NOT ISNULL(ls_Return) OR LEN(ls_Return) > 0 THEN 
    //Parse To Table
END IF


Function        : of_TabDelimited
Return Type     : String
Argument Type   : as_encoding

Long ll_start=1
String ls_old, ls_new
ls_old = "|"
ls_new = "~t"

// Find the first occurrence of old_str.
ll_start = POS(as_encoding, ls_old, ll_start)

// Only enter the loop if you find old_str.
DO WHILE ll_Start > 0
    // Replace old_str with new_str.
    as_encoding = Replace(as_encoding, ll_start, Len(ls_old), ls_new)

    // Find the next occurrence of old_str.
    ll_start = POS(as_encoding, ls_old, ll_start + Len(ls_new))
LOOP    

RETURN as_encoding

Below is a new function that I've added to the application and tested. This script seems to work fine and can process a large number of rows. It was taken from the PFC function of_GlobalReplace() in the object n_cst_string. The script in of_TabDelimited() was also taken from of_GlobalReplace(); the difference is that here the Len() of ls_old and ls_new is computed once, before the loop, and stored in variables.

String ls_Source, ls_Old, ls_New
Long ll_Start, ll_Oldlen, ll_Newlen
ls_Old = "|"
ls_New = "~t"

//Script taken from n_cst_string - of_GlobalReplace
//Get the string lengths
ll_OldLen = Len(ls_Old)
ll_NewLen = Len(ls_New)

//Search for the first occurrence of as_Old
ll_Start = Pos(as_source, ls_Old)

Do While ll_Start > 0
    // replace as_Old with as_New
    as_Source = Replace(as_Source, ll_Start, ll_OldLen, ls_New)

    // find the next occurrence of as_Old
    ll_Start = Pos(as_source, ls_Old, (ll_Start + ll_NewLen))
Loop

Return as_Source
RedHat
  • How big is your file? Are there many columns? What could be the max line length? – Seki Nov 03 '15 at 09:45
  • Seki, the file size is 254KB; it contains 2,800 rows, and the columns run through column V when the data is copied into Excel. There is no fixed max line length, but for now I'm working with a file of 1,500 lines/rows. – RedHat Nov 03 '15 at 10:05
  • I thought it could be related to the use of `long` instead of `unsigned long` to store the positions, which could turn negative while processing big files, but 1) `Pos()` does return a `long` value and 2) even a signed `long` allows processing a 2GB file, which does not seem to be your case. – Seki Nov 03 '15 at 12:09
  • As a side comment, you should compute the `Len()` of both strings `ls_old` and `ls_new` outside the loop, as I am pretty sure PB does not deduce that it is a constant value. It should improve performance a bit. – Seki Nov 03 '15 at 12:12
  • The replace functionality pretty much matches what's in PFC's of_GlobalReplace(), so I suspect you're solid. Any chance this is a performance issue? To test, in the loop I'd do a FileOpen / FileWrite / FileClose with the timestamp and the value of ll_start. If it's moving, you'll see it. If it's stopped, you'll see where in the file it's stuck. (BTW, what version of PB?) – Terry Nov 03 '15 at 16:31
  • @Seki, yes, I have rewritten the script in a new function and stored the Len() of both strings ls_old and ls_new in variables. The application can now process a large number of rows. – RedHat Nov 04 '15 at 02:18
  • @Terry, yes, you are correct: the scripts in both of_TabDelimited() and of_ReplaceData() were taken from the PFC's of_GlobalReplace(). It is now working. The PB version is 10.2. – RedHat Nov 04 '15 at 02:20
  • @Terry: declare the local external `Subroutine OutputDebugString (String lpszOutputString) library "kernel32.dll" alias for "OutputDebugStringW"`, use it in the loop, and watch the results with [DebugView](https://technet.microsoft.com/en-us/sysinternals/bb896647.aspx). It is much more efficient than writing to a file. PB's file capabilities are terrible, and much worse in a loop. – Seki Nov 04 '15 at 09:52
  • @Seki: I will explore that. – RedHat Nov 04 '15 at 10:25
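
The tracing idea from the comments could look roughly like the sketch below. It reuses the loop from of_TabDelimited() and the external declaration quoted in Seki's comment. The counter `ll_count` and the interval of 1,000 replacements are assumptions added for illustration; they are not part of the original script.

```powerscript
// Declared under Local External Functions of the object
// (declaration as given in the comment above):
// Subroutine OutputDebugString (String lpszOutputString) Library "kernel32.dll" Alias For "OutputDebugStringW"

Long ll_count   // hypothetical progress counter, not in the original script

DO WHILE ll_start > 0
    as_encoding = Replace(as_encoding, ll_start, Len(ls_old), ls_new)

    // Emit a trace line every 1,000 replacements (assumed interval)
    ll_count ++
    IF Mod(ll_count, 1000) = 0 THEN
        OutputDebugString("of_TabDelimited: ll_start = " + String(ll_start))
    END IF

    ll_start = Pos(as_encoding, ls_old, ll_start + Len(ls_new))
LOOP
```

If the traced value of ll_start keeps increasing in DebugView, the loop is still progressing and the problem is likely speed rather than a true hang; if it stops changing, the last value shows where in the data the loop is stuck.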

1 Answer


I reviewed the PFC's of_GlobalReplace() function and copied the script from there, but removed the case-sensitivity check. I tested the function below, and it works fine when processing a large amount of data from the file.

String ls_Source, ls_Old, ls_New
Long ll_Start, ll_Oldlen, ll_Newlen
ls_Old = "|"
ls_New = "~t"

//Script taken from n_cst_string - of_GlobalReplace
//Get the string lengths
ll_OldLen = Len(ls_Old)
ll_NewLen = Len(ls_New)

//Search for the first occurrence of as_Old
ll_Start = Pos(as_source, ls_Old)

Do While ll_Start > 0
    // replace as_Old with as_New
    as_Source = Replace(as_Source, ll_Start, ll_OldLen, ls_New)

    // find the next occurrence of as_Old
    ll_Start = Pos(as_source, ls_Old, (ll_Start + ll_NewLen))
Loop

Return as_Source
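
For completeness, a minimal sketch of calling this function from the original script. It assumes the function is named of_ReplaceData(), as mentioned in the comments, and takes the source string as its only argument. The check uses AND rather than the OR from the original script, because `NOT IsNull(ls_Return) OR Len(ls_Return) > 0` is also true for an empty string; that change is a suggestion, not something from the original code.

```powerscript
String ls_Return

// Convert the pipe-delimited data to tab-delimited
// (of_ReplaceData is the assumed name of the function above)
ls_Return = of_ReplaceData(ls_Encoding)

// Continue only when a non-null, non-empty string came back
IF NOT IsNull(ls_Return) AND Len(ls_Return) > 0 THEN
    // Parse to table
END IF
```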
RedHat