I am using PostgreSQL 9.6.1 on my Windows 7 laptop to compile and analyze large datasets from different sources. One of my customers noticed that in the final reports I provide them, some of the individuals from her state are being lumped in with other states.
For this report, I create the final table with:
CREATE UNLOGGED TABLE LPIS_IssuanceDetail (
ID SERIAL PRIMARY KEY,
Zone TEXT DEFAULT NULL,
State TEXT DEFAULT NULL,
LastName TEXT DEFAULT NULL,
FirstName TEXT DEFAULT NULL,
Email TEXT DEFAULT NULL,
UPN TEXT DEFAULT NULL,
LincPassUsed TEXT DEFAULT NULL,
EmployeeID TEXT DEFAULT NULL,
EmploymentType TEXT DEFAULT NULL,
NonEmployeeCategory TEXT DEFAULT NULL,
EmploymentStatus TEXT DEFAULT NULL,
ISAComplete TEXT DEFAULT NULL,
ISACompletionDate TIMESTAMP WITHOUT TIME ZONE,
LincPassStatus TEXT DEFAULT NULL,
ERO TEXT DEFAULT NULL,
Sponsored TEXT DEFAULT NULL,
Enrolled TEXT DEFAULT NULL,
Adjudicated TEXT DEFAULT NULL,
ShipToSite TEXT DEFAULT NULL,
ValidSite TEXT DEFAULT NULL,
CardExpiration DATE,
CertExpiration DATE,
LastEnrollment DATE,
EnrollmentExpiration DATE,
NewEnrollment TEXT DEFAULT NULL,
Sponsor TEXT DEFAULT NULL,
ContractEnd DATE,
ContractID TEXT DEFAULT NULL,
ContractPOC TEXT DEFAULT NULL
);
I then populate this table with data from the master data table with:
INSERT INTO LPIS_IssuanceDetail (
Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
)
SELECT
Zone, StateName, MAS_LastName, MAS_FirstName, MAS_Email, MAS_UPN,
LincPassUsed, MAS_EmployeeID, MAS_Category, MAS_OrgRelType,
MAS_EmploymentStatus, ISAComplete, ISA_CompletionDate, MAS_IssuanceStatus,
MAS_FedEmerResponse, Sponsored, Enrolled, Adjudicated, MAS_ShipToCityState,
MAS_ValidShipToSite, MAS_CertExpireDate, MAS_LastEnrollmentDate, MAS_EnrollExpireDate,
MAS_CardExpireDate, MAS_NewEnrollment, MAS_Sponsor, MAS_PeriodofPerformanceEndDate,
MAS_ContractID, MAS_ContractPOC
FROM LPIS_MasterData
ORDER BY Zone, StateName, MAS_LastName, MAS_FirstName;
And sure enough, as I scroll down the table, I find single records interspersed out of sequence like this sample where one record from Maine is out of place:
id | zone | state | lastname | firstname
11849 | 3 | Georgia | Roberts | George
11850 | 3 | Georgia | Smith | Dan
11922 | 3 | Maine | Edwards | John
11851 | 3 | Georgia | Snowden | Ed
11852 | 3 | Georgia | Williams | Casey
As a test, I dumped just the first four columns to a separate table, like so:
CREATE UNLOGGED TABLE LPIS_DetailTest (
ID SERIAL PRIMARY KEY,
Zone TEXT DEFAULT NULL,
State TEXT DEFAULT NULL,
LastName TEXT DEFAULT NULL,
FirstName TEXT DEFAULT NULL
);
INSERT INTO LPIS_DetailTest (
Zone, State, LastName, FirstName
)
SELECT
Zone, State, LastName, FirstName
FROM LPIS_IssuanceDetail
ORDER BY Zone, State, LastName, FirstName;
And all of the rows are in the expected order:
id | zone | state | lastname | firstname
11849 | 3 | Georgia | Roberts | George
11850 | 3 | Georgia | Smith | Dan
11851 | 3 | Georgia | Snowden | Ed
11852 | 3 | Georgia | Williams | Casey
11853 | 3 | Georgia | Spaid | Dennis
Why would this smaller table sort correctly using the same exact ORDER BY
clause as the larger table where some of the rows are out of sequence?
The database and all the tables are set up as UTF8.
I have looked at everything, and cannot figure out why the ORDER BY
clause is producing such strange results. What else can I check?
Edit: Additional Information
In my script, immediately following the INSERT INTO ... SELERCT ...
statement, I use COPY to dump the data to a CSV file, like so:
-- Export data to CSV files
COPY LPIS_IssuanceDetail (
Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
)
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv'
WITH (
FORMAT CSV,
DELIMITER ',',
NULL '',
HEADER TRUE,
QUOTE '"',
ENCODING 'UTF8'
);
Then when I import this CSV file into a spreadsheet for final presentation, I must manually sort the data on the ID column, then remove that column.
New Question: Is there any option that I can use in the INSERT INTO statement that would strictly preserve the order of the rows to follow what is specified in the ORDER BY clause?