2

Someone kindly dumped the data out of a number of tables in SQL*Plus.

Is there a nice awk or similar script to turn it into CSV or something more easily parsed to load into another system. Sadly getting it re-run is not an option. They used

SQL> set pages 10000 lines 10000

followed by SELECT * from the table

There are the column_names, ---- 's and then data lines. It looks like the structure is spaces or tabs between column names and --- ---- with the number of --- probably being field length. The following is the columns,---'s and first 2 lines from one of the tables.

CM D ORDR_REF           LINE_NUM SUPP     BYR       LINE_REVN    TXT_NUM L L T G ACCPT_US A PERF ITEM             MANUF    PART_NO      EC_ CMDTY          CLSFCTN    RCPT_CNT DESCR                    ST IN STORE    EAN       QUOM QTY_ON_ORDR QTY_OUTSTG   QTY_ADVD QTY_ADVD_OUTSTG   QTY_RECV QTY_REJECT     QTY_CR QTY_INVCE_OUTSTG  QTY_INVCD QTY_INVCE_HELD QTY_CR_OUTSTG  QTY_CRDTD QTY_CR_HELD DLVRY_SI DATE_DUE  DATE_ACK  DATE_XPCT DATE_XPED XPED_USR XP LEASE            CMMT_DATE A A   MIN_AUTH   ACT_AUTH CURR_AUTH_SEQ_NUM TAX TAX_DATE  HA PUOM    DSCNT_1    DSCNT_2    DSCNT_3  ENTRD_PRC PRC    MIN_PRC P  ENTRD_VAL MIN_ENTRD_VAL  UNIT_COST VAL_ON_ORDR   VAL_RECV VAL_OUTSTG  VAL_ACCRU VAL_INVCE_OUTSTG  VAL_INVCD VAL_INVCE_HELD VAL_CR_OUTSTG  VAL_CRDTD VAL_CR_HELD VAL_REJECT VAL_CR    VAL_TAX MIN_ORDR_VAL MIN_VAL_TAX L S CNTRCT_REF       CNTRCT_LINE_NUM C GL_TRA AIRCRFT_RE AIRL FLGHT_    LEG_NUM   SRVC_QTY   RATE_PRC   CHRG_VAL UPDT_DATE UPDT_TIME USR_DATA                                                                                                      L VAT_NON_REC_VALUE VAT_REC_VALUE PEV_LINE_COST A
-- - -------------------- ---------- ------------ -------- ---------- ---------- - - - - -------- - ---- -------------------- ------------ -------------------- --- ---------------------- ---------- -------- ---------------------------------------- -- -- -------- ------------- ---- ----------- ---------- ---------- --------------- ---------- ---------- ---------- ---------------- ---------- -------------- ------------- ---------- ----------- -------- --------- --------- --------- --------- -------- -- -------------------- --------- - - ---------- ---------- ----------------- --- --------- -- ---- ---------- ---------- ---------- ---------- --- ---------- - ---------- ------------- ---------- ----------- ---------- ---------- ---------- ---------------- ---------- -------------- ------------- ---------- ----------- ---------- ---------- ---------- ------------ ----------- - - -------------------- --------------- - ------ ---------- ---- ------ ---------- ---------- ---------- ---------- --------- --------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - ----------------- ------------- ------------- -
AR O PO415966              1 040960   LFOSTER       0          0 2 2          Y                                                        Stirrers and cleaning tabs - ivan                            0          0      0       0      0      0      0            0          0          0         0          0       0 CIVIC    01-APR-20 01-JAN-00 01-APR-20 01-JAN-00                      31-MAR-20 0 0          0      0         0 01  01-JAN-00 ER           0          0      0      0          0 1      75.51         0      75.51   75.51      75.51      0          0        0      75.51          0     0      0           0      0      0       15.1        0       0   0                    0   022704                     0      0      0          0 03-APR-20 01-JAN-00                                                                                                       2         0          15.1     75.51
AR O PO415967              1 015552   LFOSTER       0          0 2 2          Y                                                        extras to PO414840 - Sam                                 0          0      0       0      0      0      0            0          0          0         0          0       0 CIVIC    01-APR-20 01-JAN-00 01-APR-20 01-JAN-00                      31-MAR-20 0 0          0      0         0 01  01-JAN-00 ER           0          0      0      0          0 1     60         0     60      60     60      0          0        0     60          0     0      0           0      0      0         12        0       0   0                    0   022705                     0      0      0          0 01-APR-20 01-JAN-00                                                                                                       2         0        12        60
Ronald Duncan
  • 131
  • 1
  • 5
  • Just to be clear, I know about the SQLPlus function to dump out to CSV (https://stackoverflow.com/questions/21107279/mysql-query-to-csv), and it would have been nice if this had been used. It is a 3rd party system where we needed to go through 2 different organisations to get the data, so very difficult to get re-run. – Ronald Duncan Aug 05 '22 at 10:53

2 Answers2

0

You may not need to go externally , but use the csv options natively in sql itself.

Something like this? MySQL Query to CSV

VipinKG
  • 11
  • 3
  • Thanks, I do not have access to the system so getting re-run with set markup csv on spool myfile.csv is not possible They went in with putty, and just recorded the putty session and sent it over. It took over a month to get them to dump the tables this way. Hence the parsing question. – Ronald Duncan Aug 05 '22 at 10:50
  • Hi Ronaload, While Your Data heading Seems right for CSV, the data inside doesn't seems a good fit. Cz there are space with in the data values and there are no text separators visible. Still can attempt to try with TAB Separator . You may want to try `awk -F'\t' '{print NF}'` and see the number of fields (NF) returning is same for all lines . if thats good then it may be good for the AWK based processing. – VipinKG Aug 05 '22 at 17:54
0

Figuring out the format, was a challenge since it is a mix of tabs and spaces, however it aligned with standard UNIX tabs so expand converted the tabs into spaces correctly aligned, and then the awk script did the rest.

The column names were white space delimited and then using the --- lines gave the column width to use FIELDWIDTHS for the data.

A couple of bash 1 lines to expand and then process.

    for f in *.txt;do expand "$f" > "${f%.txt}.fix"; done

Then call awk to convert into delimited format

    for f in *.fix;do awk -f parse.awk "$f" > "${f%.fix}.del"; done

The awk script (parse.awk) uses a couple of tricks $1=$1 read everything in using default input FS for line 1 and FIELDWIDTHS after line 2. The next print outputs using the output file separator (OFS) which is set to ¬ in BEGIN, you can use what you like. It was not in the data so no need to escape , etc.

NR==1 prints out the columns

NR==2 gets the field lengths by measuring the --- --- -----

NR>2 Processes the rest of the file to the end using fixed width set in 2


    BEGIN {
        OFS = "¬"
    }
    NR==1 {
        $1=$1
        print
    }
    NR==2 {
        fw=""
        for(i=1;i<=NF;i++) {
            fw = fw length($i)+1 " "
        }
        FIELDWIDTHS= fw
    }
    NR>2 {
        $1=$1
        print 
    }

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Ronald Duncan
  • 131
  • 1
  • 5
  • It is reasonably tested since I had about 100 meg of text data across 4 file formats, with between 80 and 140 columns per table. – Ronald Duncan Aug 08 '22 at 21:14