I have been given a data file in a format I have never seen before. The data do not appear to be in columns but rather in one long row. I can open the file in Notepad and read the data, so the file does not appear to be encrypted.
When I open the data file in Notepad, the row of data wraps back to the left side of the Notepad window, I guess when the data reach the maximum number of characters Notepad allows in a single row, and then the data continue on a new row. There might be 10,000 rows of data when I open the file in Notepad. The data in one of these rows are not aligned with the data in the rows above or below it.
Here are some example data:
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 1304 3 0 0
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 0205 0 3 0
40001 1 5 GGGG 2998 HURG SU111111 95 1.0 F1 4 0805 0 2 0
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 1205 0 2 0
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 1505 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999 1.0 F3 4 2003 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999 1.0 F3 4 2303 2 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999 1.0 F3 4 2703 3 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999
Notice that when I paste the example data here, representing one row in Notepad, the columns are 'magically' aligned.
I have found that I can open the data file in Excel, and the data are aligned there as well. However, I need to assign the column boundaries manually in Excel, and Excel does not let me place a column boundary beyond roughly character position 123.
Below is SAS code to read the data file, although this code does not work correctly; I suspect it skips some of the data rows. Notice that the variable TT covers character positions 125-207, but most rows contain only 120 characters, while some rows contain more. I suspect this difference in the number of characters among rows is why SAS cannot read this data file correctly.
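To check the "120 characters in most rows" observation directly, one could tabulate how many records have each length. The sketch below is a hypothetical helper (`record_lengths` is not part of any library); it assumes records end with LF, optionally preceded by CR, and it reads raw bytes so that no editor's display rules get in the way.

```python
from collections import Counter


def record_lengths(data: bytes) -> Counter:
    """Count how many records have each length, in bytes.

    Assumes each record ends with LF (b"\\n"), optionally preceded by
    CR (b"\\r"). The terminator itself is not counted in the length.
    """
    lengths = Counter()
    records = data.split(b"\n")
    # A file that ends with a terminator leaves one empty piece at the end.
    if records and records[-1] == b"":
        records.pop()
    for rec in records:
        if rec.endswith(b"\r"):
            rec = rec[:-1]  # strip the CR half of a CRLF pair
        lengths[len(rec)] += 1
    return lengths
```

Usage would be something like `record_lengths(open(path, "rb").read())`, where `path` is the file named in the FILENAME statement; a result dominated by 120 with a tail of longer lengths would confirm that the records are ragged.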
option linesize = 210 ;
option pagesize = 30 ;
FILENAME myinput 'C:/Users/markm/simple SAS programs/mydata.new' ;
DATA mydata ;
INFILE myinput ;
INPUT
AA 2-9
BB 12-17
CC 18-22
DD $ 24-27
EE 30-33
FF $ 35-38
GG $ 40-47
HH 53-56
II 59-64
JJ $ 66-68
KK $ 70-71
LL 72-78
MM 79-85
NN $ 87-90
OO 91-95
PP 97-104
QQ 105-110
RR 112-120
SS $ 122-123
TT $ 125-207 ;
RUN ;
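The column ranges in the INPUT statement above are 1-based and inclusive, so a range `a-b` corresponds to the Python slice `rec[a-1:b]`. By default, when an INPUT field runs past the end of the current record, SAS continues reading on the next record (its FLOWOVER behavior), which could look like skipped rows. The sketch below is only an analogy, not SAS itself: a hypothetical `read_fixed` helper that pads short records with blanks so a field beyond the end of a record comes back empty instead of borrowing text from the following record. Only a subset of the column ranges is copied into `COLUMNS`.

```python
# A few column ranges copied from the INPUT statement (1-based, inclusive).
COLUMNS = {
    "AA": (2, 9),
    "DD": (24, 27),
    "RR": (112, 120),
    "SS": (122, 123),
    "TT": (125, 207),
}


def read_fixed(record: str, columns: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Slice one record into fields using 1-based inclusive column ranges.

    Short records are padded with spaces first, so a field lying past the
    end of the record is returned as "" rather than spilling into the next
    record.
    """
    width = max(end for _, end in columns.values())
    padded = record.ljust(width)
    return {
        name: padded[start - 1:end].strip()
        for name, (start, end) in columns.items()
    }
```

For a 120-character record, `read_fixed(rec, COLUMNS)["TT"]` is an empty string, while `AA` still picks up columns 2 through 9.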
If I move the cursor to the right one character at a time along the first row of data in Notepad, using the right-arrow key, I have to press the right-arrow key twice to move past character position 120.
All of this suggests to me that the file contains hidden characters that mark the end of each line of data.
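One way to test the hidden-character hypothesis is to count the line terminators in the raw bytes. By convention, Windows text files end lines with CR LF, Unix-style files with LF alone, and classic Mac OS files with CR alone; older versions of Notepad only start a new line at CR LF, so an LF-only file would appear there as one long wrapped row. The hypothetical `count_terminators` helper below is a sketch under that assumption; it does not identify the file's origin on its own.

```python
def count_terminators(data: bytes) -> dict[str, int]:
    """Count CRLF pairs, lone LFs, and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every LF or CR that is not part of a CRLF pair is a "lone" terminator.
    lf_only = data.count(b"\n") - crlf
    cr_only = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf_only, "CR": cr_only}
```

If the counts come back as, say, thousands of `LF` and zero `CRLF`, the file uses Unix-style line endings, which would be consistent with Notepad showing one long row while Vim and Excel split it into lines.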
I opened the data file in Vim hoping to see these hidden characters, but did not see anything. Vim did align the columns correctly when I opened the file, so Vim must be recognizing these hidden end-of-line characters.
How can I see these end-of-line characters myself? I suspect Vim has an option to reveal hidden characters.
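Independent of any editor setting, the raw bytes can be printed with control characters made visible. The hypothetical `show_invisibles` helper below mimics the caret notation that `cat -A` uses on Unix: tab as `^I`, CR as `^M`, and LF as `$` followed by a real line break. Any other non-printable byte is shown as a two-digit hex code in angle brackets, which is an arbitrary choice made for this sketch.

```python
def show_invisibles(data: bytes, limit: int = 300) -> str:
    """Render the first `limit` bytes with control characters made visible."""
    out = []
    for b in data[:limit]:
        if b == 0x09:            # tab
            out.append("^I")
        elif b == 0x0D:          # carriage return (CR)
            out.append("^M")
        elif b == 0x0A:          # line feed (LF): mark it, then break the line
            out.append("$\n")
        elif 0x20 <= b < 0x7F:   # printable ASCII passes through unchanged
            out.append(chr(b))
        else:                    # anything else, e.g. a byte-order mark
            out.append(f"<{b:02x}>")
    return "".join(out)
```

Printing `show_invisibles(open(path, "rb").read())` would show whether rows end in `^M$` (CR LF), plain `$` (LF only), or something else, and whether the columns are separated by spaces or by `^I` tabs.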
How can I determine the application that created this data file?
How can I modify the above SAS code to read this data file correctly?