0

I am using PostgreSQL 9.6.1 on my Windows 7 laptop to compile and analyze large datasets from different sources. One of my customers noticed that in the final reports I provide them, some of the individuals from her state are being lumped in with other states.

For this report, I create the final table with:

CREATE UNLOGGED TABLE LPIS_IssuanceDetail (
  ID SERIAL PRIMARY KEY,
  Zone TEXT DEFAULT NULL,
  State TEXT DEFAULT NULL,
  LastName TEXT DEFAULT NULL,
  FirstName TEXT DEFAULT NULL,
  Email TEXT DEFAULT NULL,
  UPN TEXT DEFAULT NULL,
  LincPassUsed TEXT DEFAULT NULL,
  EmployeeID TEXT DEFAULT NULL,
  EmploymentType TEXT DEFAULT NULL,
  NonEmployeeCategory TEXT DEFAULT NULL,
  EmploymentStatus TEXT DEFAULT NULL,
  ISAComplete TEXT DEFAULT NULL,
  ISACompletionDate TIMESTAMP WITHOUT TIME ZONE,
  LincPassStatus TEXT DEFAULT NULL,
  ERO TEXT DEFAULT NULL,
  Sponsored TEXT DEFAULT NULL,
  Enrolled TEXT DEFAULT NULL,
  Adjudicated TEXT DEFAULT NULL,
  ShipToSite TEXT DEFAULT NULL,
  ValidSite TEXT DEFAULT NULL,
  CardExpiration DATE,
  CertExpiration DATE,
  LastEnrollment DATE,
  EnrollmentExpiration DATE,
  NewEnrollment TEXT DEFAULT NULL,
  Sponsor TEXT DEFAULT NULL,
  ContractEnd DATE,
  ContractID TEXT DEFAULT NULL,
  ContractPOC TEXT DEFAULT NULL
);

I then populate this table with data from the master data table with:

INSERT INTO LPIS_IssuanceDetail (
  Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
  EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
  ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
  ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
  CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
)
SELECT
  Zone, StateName, MAS_LastName, MAS_FirstName, MAS_Email, MAS_UPN,
  LincPassUsed, MAS_EmployeeID, MAS_Category, MAS_OrgRelType,
  MAS_EmploymentStatus, ISAComplete, ISA_CompletionDate, MAS_IssuanceStatus,
  MAS_FedEmerResponse, Sponsored, Enrolled, Adjudicated, MAS_ShipToCityState,
  MAS_ValidShipToSite, MAS_CertExpireDate, MAS_LastEnrollmentDate, MAS_EnrollExpireDate,
  MAS_CardExpireDate, MAS_NewEnrollment, MAS_Sponsor, MAS_PeriodofPerformanceEndDate,
  MAS_ContractID, MAS_ContractPOC
FROM LPIS_MasterData
ORDER BY Zone, StateName, MAS_LastName, MAS_FirstName;

And sure enough, as I scroll down the table, I find single records interspersed out of sequence like this sample where one record from Maine is out of place:

id     | zone | state   | lastname | firstname
11849  | 3    | Georgia | Roberts  | George
11850  | 3    | Georgia | Smith    | Dan
11922  | 3    | Maine   | Edwards  | John
11851  | 3    | Georgia | Snowden  | Ed
11852  | 3    | Georgia | Williams | Casey

As a test, I dumped just the first four columns to a separate table, like so:

CREATE UNLOGGED TABLE LPIS_DetailTest (
  ID SERIAL PRIMARY KEY,
  Zone TEXT DEFAULT NULL,
  State TEXT DEFAULT NULL,
  LastName TEXT DEFAULT NULL,
  FirstName TEXT DEFAULT NULL
);

INSERT INTO LPIS_DetailTest (
  Zone, State, LastName, FirstName
)
SELECT
  Zone, State, LastName, FirstName
  FROM LPIS_IssuanceDetail
    ORDER BY Zone, State, LastName, FirstName;

And all of the rows are in the expected order:

id     | zone | state   | lastname | firstname
11849  | 3    | Georgia | Roberts  | George
11850  | 3    | Georgia | Smith    | Dan
11851  | 3    | Georgia | Snowden  | Ed
11852  | 3    | Georgia | Williams | Casey
11853  | 3    | Georgia | Spaid    | Dennis

Why would this smaller table sort correctly using the same exact ORDER BY clause as the larger table where some of the rows are out of sequence?

The database and all the tables are set up as UTF8.

I have looked at everything, and cannot figure out why the ORDER BY clause is producing such strange results. What else can I check?

Edit: Additional Information

In my script, immediately following the INSERT INTO ... SELERCT ... statement, I use COPY to dump the data to a CSV file, like so:

-- Export data to CSV files
COPY LPIS_IssuanceDetail (
    Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
    EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
    ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
    ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
    CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
)
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv'
WITH (
    FORMAT CSV,
  DELIMITER ',',
    NULL '',
    HEADER TRUE,
  QUOTE '"',
    ENCODING 'UTF8'
);

Then when I import this CSV file into a spreadsheet for final presentation, I must manually sort the data on the ID column, then remove that column.

New Question: Is there any option that I can use in the INSERT INTO statement that would strictly preserve the order of the rows to follow what is specified in the ORDER BY clause?

Michael Sheaver
  • 2,059
  • 5
  • 25
  • 38
  • 2
    "*as I scroll down the table*" - and how is the result for that "scrolling down" generated? If that select has no `order by` the order of rows is not defined. Just because you _insert_ the rows in a specific order does not mean a `select` will return them in that order. The ***only*** (really: the **only**) way to get a consistent order is to use an order by when _selecting_ the rows. The `order by` you use in the source of your insert statement is essentially useless. –  Dec 15 '16 at 15:25
  • @a_horse_with_no_name, I suspected that this was the case, and it puts me in a small quandary. Immediately following the SELECT statement, I use a COPY ... TO .... to dump the processed dataset to a CSV file, and the syntax of the COPY does not support ORDER BY. – Michael Sheaver Dec 22 '16 at 12:36
  • Does this answer your question? [SQL best practice to deal with default sort order](https://stackoverflow.com/questions/1793147/sql-best-practice-to-deal-with-default-sort-order) – philipxy Feb 25 '21 at 23:39

1 Answers1

1

If you want the data in the CSV file sorted, use copy with a select statement:

COPY (select Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
    EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
    ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
    ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
    CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
    from LPIS_IssuanceDetail 
    ORDER BY Zone, State, LastName, FirstName
)
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv'
WITH (FORMAT CSV, DELIMITER ',',  NULL '', HEADER TRUE, QUOTE '"', ENCODING 'UTF8');
  • I must say that this solution was surprisingly simple! After seeing your answer, I went back to the PostgreSQL page for the COPY statement, and sure enough, buried in the syntax, is an entry for ( query ) in there, and of course I missed it! Your help was most appreciated! – Michael Sheaver Dec 22 '16 at 20:26