I am attempting to parse a fixed-width .txt file with readr's read_fwf. There are ~1.5 million observations, and approximately 550 of them are missing the final 25 of 60 variables. This omission causes the last variable that these observations do have ('description' in the example below) to be parsed imperfectly, and leaves the data frame without these partially filled rows.
For example,
library(readr)
library(dplyr)

df_baseline <- read_fwf(file = file, fwf_widths(fwf_widths, fwf_names),
                        col_types = col_types, trim_ws = TRUE) %>%
  mutate_all(na_if, "")
Warning: 1148 parsing failures.
row     col          expected    actual      file
300495  description  240 chars   102         '/path/to/my/file/filename.txt'
300495  NA           59 columns  31 columns  '/path/to/my/file/filename.txt'
500245  description  240 chars   56          '/path/to/my/file/filename.txt'
500245  NA           59 columns  31 columns  '/path/to/my/file/filename.txt'
500333  description  240 chars   33          '/path/to/my/file/filename.txt'
See problems(...) for more details.
col_types is a string of 60 'c' symbols in a row, so that all columns are read in as character. fwf_widths and fwf_names are the appropriate specifications for the column widths and column names.
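To make the setup concrete, here is a minimal sketch of how these three objects might be built. The uniform width of 10 and the var1..var60 names are hypothetical placeholders, not the real layout of my file; only the 60-character col_types string matches what is described above.

```r
library(readr)

n_cols <- 60

# One "c" per column, so every column is read in as character.
col_types <- strrep("c", n_cols)

# Hypothetical placeholders: the real file has its own widths and names.
fwf_widths <- rep(10, n_cols)
fwf_names <- paste0("var", seq_len(n_cols))

# R skips non-function objects when resolving a call, so readr's
# fwf_widths() is still found even though a vector shares its name.
spec <- fwf_widths(fwf_widths, fwf_names)
```

The resulting spec is what gets passed as the second argument to read_fwf in the call above.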
I understand that by having missing values in the final column of the df, I am violating the "fixed-width" nature of the document.
My questions:
1) Is there a way to get read_fwf to retain these partially filled rows?
2) If not, how can I read in this .txt file, given that 99% of it can be parsed as a normal FWF?
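One possible workaround for (2), sketched under assumptions rather than confirmed on the real file: read the raw lines, right-pad the short ones with spaces to the full record width, and only then hand them to read_fwf. The three-column layout, the names, and the toy lines below are hypothetical stand-ins for the real 60-column specification.

```r
library(readr)

# Hypothetical toy layout: three columns of widths 3, 2 and 5.
widths <- c(3, 2, 5)
col_names <- c("id", "code", "description")

# Toy input (in practice: raw_lines <- read_lines(file)).
# The second line is cut off inside its last field; the third has no last field.
raw_lines <- c("001AAhello", "002BBhi", "003CC")

# Right-pad every line with spaces up to the full record width.
full_width <- sum(widths)
padded <- formatC(raw_lines, width = full_width, flag = "-")

# Write the padded lines to a temporary file and parse that as a regular FWF.
tmp <- tempfile(fileext = ".txt")
write_lines(padded, tmp)

df <- read_fwf(tmp, fwf_widths(widths, col_names),
               col_types = strrep("c", length(widths)), trim_ws = TRUE)
```

Because every padded line now has the full width, the short rows should no longer trigger the "expected N chars" and "expected N columns" failures; the fields they lacked come back empty or NA instead.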