I previously asked here how to open ASCII files in R. Someone suggested that I extract the character lengths of the fields from the master file and use read.fwf. I went ahead and digitized the master file into a .csv and then tried to read my .RAW file, but I am getting an error.
This is my code:
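# Read the digitized master file, which describes the field layout
d86 <- read.csv("path//86_MasterFile.csv")
# Convert the field-length column to a numeric vector of widths
d86$char <- as.numeric(as.vector(d86$X4))
# Read the fixed-width raw file using those widths
t <- read.fwf("path/f990.86.raw", widths = d86$char)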
This yields the error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 91 did not have 755 elements
The error persists if I instead replace the NAs with zeros: d86[is.na(d86)] <- 0. This is what my vector of character lengths looks like:
> print(d86$char)
[1] 4 35 9 14 2 1 2 2 2 5 2 1 1 4 1 1 1 9 2 1 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[39] 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[77] 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[115] 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[153] 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 0 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[191] 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 6 6 6 0 3 3 9 2 10 0 0 0 0
[229] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[267] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[305] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[343] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[381] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[419] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[457] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[495] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[533] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[571] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[609] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[647] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[685] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[723] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
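As a sanity check on a vector like this, something along the following lines could count the NA and zero entries and total the positive widths. It is only a sketch: `w` is a made-up toy vector standing in for d86$char, and the variable names are hypothetical.

```r
# Toy widths vector with the same kinds of entries as mine (hypothetical values,
# not the real d86$char).
w <- c(4, 35, 9, NA, 12, 12, 0, 0, 1)

n_na   <- sum(is.na(w))              # entries that failed to convert to a number
n_zero <- sum(w == 0, na.rm = TRUE)  # entries with zero width
usable <- w[!is.na(w) & w > 0]       # fields that actually occupy characters

cat("NA entries:", n_na, "\n")
cat("zero entries:", n_zero, "\n")
cat("fields with positive width:", length(usable), "\n")
cat("characters covered per record:", sum(usable), "\n")
```

The last figure is the record length the layout implies, which can then be compared with the actual length of the lines in the .RAW file.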
And my csv file looks like:
> head(d86)
V1 V2 V3 V4 V5 V6 V7 V8 char
1 1 E001 SampleCode63Number1 1 4 N G 4
2 2 E002 NameofOrganization 5 35 A E 35
3 3 E003 EmployeeIdentificationNo. 40 9 N E,F 9
4 4 E004 DocumentLocatorNumber 49 14 N F 14
5 5 E005 SampleCode63 63 2 N G 2
6 6 E006 RejectCode65 65 1 N G 1
Additional info:
The data is available here. The problem does seem to be line 91, but I see nothing particularly different about it, either in the master file or in the data. See:
d86[91,]
V1 V2 V3 V4 V5 V6 V7 V8 char
91 91 E091 Supplies-ColumnC940 940 12 12
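Since the message comes from scan, my assumption is that "line 91" may refer to a record of the data being read rather than to row 91 of the master file. A minimal sketch of how the record lengths could be compared with the total of the widths is below. The file contents and the `widths` vector are toy stand-ins, not my real data, and the names are hypothetical.

```r
# Hypothetical stand-in for the real file: three fixed-width records,
# the second of which is shorter than the layout expects.
raw_path <- tempfile(fileext = ".raw")
writeLines(c("AAAA0001xy", "BBBB02", "CCCC0003zz"), raw_path)

widths <- c(4, 4, 2)  # toy layout, not the real master file

lines    <- readLines(raw_path)
line_len <- nchar(lines, type = "chars")  # length of each record
expected <- sum(widths)                   # length the layout implies

# Indices of records whose length differs from the layout total
bad <- which(line_len != expected)
print(data.frame(line     = bad,
                 length   = line_len[bad],
                 expected = rep(expected, length(bad))))
```

With the real file, the same comparison would use the path to f990.86.raw and the positive entries of d86$char.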
This is what the .RAW file looks like:
HEIGHTS COMMUNITY CONGRESS 237242552174900020040583198705OH4411803NN NNN23724255211N 59101 0 11250 70351 0 0 3616 0 0 0 0 0 0 0 0 0 0 0 0 147269 0 147269 0 0 0 0 221236 151518 34814 18916 0 205248 15988 22190 0 38178 0 0 0 0 0 0 0 117398 83971 23924 9503 1524 983 348 193 0 0 0 0 6325 4170 1574 581 0 1000 650 350 0 0 0 0 0 37683 32703 940 4040 3951 2309 1576 66 5980 4071 540 1369 5382 3146 2236 0 2529 1745 605 179 3401 2528 340 533 1729 1593 29 107 9215 8195 281 739 0 0 0 0 2315 1158 877 280 6816 4296 1194 1326 205248 151518 34814 18916 0 0 0 0 744 0 0 0 0 0 0 0 0 0 43108 56969 2071 0 0 3248 86982 63032 3854 0 0 0 0 21000 64792 24854 22190 38178 63032 0 111631 362894 0 0 49930 185705 3739 16728 0 0 0 0 0 0 0 0 165300 565327 115370 379622 1653C 547 30053 54.9441 06-DEC-88861788040201
COLUMBUS JEWISH HOME/AGED-HERITAGE 314417962174900023850183498706OH4320903NN NNY31441796211N 2299106 0 0 2299106 3728572 0 301315 0 0 0 0 0 13226 0 13226 200 0 200 13426 0 0 0 0 0 0 0 6342419 2953412 1262136 25534 0 4241082 2101337 11838413 -34281 13905469 0 0 0 76000 0 76000 0 2492174 1960602 531572 0 44821 16081 28740 0 87123 48395 38728 0 235735 181956 53779 0 0 30355 0 30355 0 1575 0 1575 0 234343 165594 68749 0 13115 0 13115 0 9575 0 9575 0 161423 0 161423 0 53876 22954 30922 0 0 0 0 0 4942 3913 1029 0 6149 0 6149 0 2736 0 2736 0 210362 147253 63109 0 576778 406664 144580 25534 4241082 2953412 1262136 25534 1677891 0 3728572 0 507860 4953660 892988 4984972 0 0 0 0 0 35580 0 0 0 0 3094010 4050070 14225289 18519140 546822 0 0 0 16779 4050070 2601876 4613671 11623413 13905469 18519140 0 2374909 4114739 0 0 3613146 13779702 0 0 0 0 0 0 0 0 0 0 5988055 17894441 2374909 4114739 59881C 585 2204 3.7731 31 06-DEC-88861788050201
COMMUNITY INTERFAITH, INC. 356061414174900020070283398709IN4620503NY5080NNY35606141411N 127893 0 0 127893 604477 0 23556 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 755926 739057 31793 0 0 770850 -14924 66165 209155 260396 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 213077 213077 0 0 59709 59709 0 0 498064 466271 31793 0 770850 739057 31793 0 604477 0 604477 0 167524 0 15960 0 0 0 36297 42899 116805 5939 80000 50500 0 0 2010730 288309 2543882 2692064 47112 0 0 0 2279539 105017 2477717 2431668 66165 260396 2692064 0 58038 75124 0 0 567197 3637593 11452 48496 0 0 0 0 0 0 2590 6268 639277 3767481 72080 129888 6393C 557 3556 6.3840 06-DEC-88861788030201
HARPER-GRACE HOSPITALS 237321231174900023840083598612MI4820103NN YNY23732123107N 2881438 112819 0 2994257 207954826 2395 2036913 6312598 2777760 2279162 498598 0 62652366 59537158 3115208 261070 360092 -99022 3016186 34142 32665 1477 53552506 14195646 39356860 3470983 265645093 160446101 96961567 0 0 257407668 8237425 145605070 -995921 152846574 3000000 0 0 665258 197582 467676 0 123744591 90952275 32792316 0 0 0 0 0 13635510 10294810 3340700 0 9255248 6867394 2387854 0 0 137118 0 137118 0 285626 11996 273630 0 24915176 21725932 3189244 0 3085688 77142 3008546 0 340700 0 340700 0 10231200 2414563 7816637 0 5185990 1913630 3272360 0 171132 28408 142724 0 533282 301304 231978 0 222358 109623 112735 0 8187264 0 8187264 0 14570894 4516977 10053917 0 39240633 18034465 21206168 0 257407668 160446101 96961567 0 0 0 207954826 3470983 1750129 19222592 38180683 9648547 0 0 4792358 3007118 2342637 7805578 59615768 48156649 9969252 367971 137937683 45518226 298716333 325692305 33670729 3000000 9680225 0 85620906 40873871 153111263 172845731 145605070 152846574 325692305 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0C 973 973 1.0041 06-DEC-88861788040204
MOUNTIAN CHRISTIAN ACADEMY, INC. 311072335174900020091783198706KY4164903NN NNN31107233506Y 255261 198318 0 453579 0 0 0 0 3706 0 3706 0 0 0 0 0 0 0 0 0 0 0 60687 44386 16301 0 473586 490131 0 0 0 490131 -16545 1292352 -1239409 36398 0 0 0 0 0 0 0 307346 307346 0 0 0 0 0 0 0 0 0 0 21975 21975 0 0 0 0 0 0 0 0 0 0 0 46348 46348 0 0 20563 20563 0 0 0 0 0 0 0 0 0 0 12906 12906 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6754 6754 0 0 12244 12244 0 0 61991 61991 0 0 490131 490131 0 0 0 0 0 0 755 0 0 0 0 0 0 0 0 0 0 0 0 0 326052 0 1366644 326808 223147 0 0 0 67262 0 74292 290410 1292352 36398 326808 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0C 547 30053 54.9440 06-DEC-88861788030202
INDIANA HISTORICAL SOCIETY 350876384174900023810183498709IN4620203NY NNN35087638411N 230271 0 250542 480813 32112 96162 0 2404792 0 0 0 0 0 0 0 19293732 15995785 3297947 3297947 0 84010 -84010 39052 271432 -232380 10720 6006156 1794200 572224 0 0 2366424 3639732 34752568 0 38392300 42625 0 0 0 0 0 0 999788 799973 199815 0 99809 65974 33835 0 95222 78572 16650 0 71332 57490 13842 0 0 8750 0 8750 0 4280 0 4280 0 24762 13982 10780 0 11819 0 11819 0 33158 0 33158 0 230142 157191 72951 0 29245 26788 2457 0 0 0 0 0 25624 20039 5585 0 14804 14804 0 0 0 0 0 0 47834 0 47834 0 627230 516762 110468 0 2366424 1794200 572224 0 0 0 32112 10720 153671 0 0 0 11202 0 0 106000 90000 2260 27856267 31343815 0 0 485194 6546654 35011398 38632796 240496 0 0 0 0 0 258830 240496 34752568 38392300 38632796 0 225094 653359 93540 364279 95542 371697 2282400 7472554 0 0 0 0 250542 1026314 2184 7111 2949302 9895314 2853760 9523617 29493C 585 2204 3.7716 06-DEC-88861788040201