Problems reading in table with unclear line-end symbol

Question

I am currently trying to read in a .txt file.

I searched here and found "Error in reading in data set in R"; however, it did not solve my problem.

The data are political contributions listed by the Federal Election Commission of the U.S. at ftp://ftp.fec.gov/FEC/2014/webk14.zip

Upon inspection of the .txt, I realized that the data is oddly structured. In particular, the end of each line is not separated at all from the first cell of the next line (not by a "|", not by a space).

Strangely enough, import via Excel and Access seems to work just fine. However, R import does not work.

To avoid the error `Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 90 did not have 27 elements`, I use the following command:

webk14 <- read.table(header = FALSE, fill = TRUE, colClasses = "character",
                     sep = "|", file = "webk14.txt",
                     stringsAsFactors = FALSE, dec = ".",
                     col.names = c("cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn",
                                   "cmte_filing_freq", "ttl_receipts", "trans_from_aff",
                                   "indv_contrib", "other_pol_cmte_contrib", "cand_contrib",
                                   "cand_loans", "ttl_loans_received", "ttl_disb",
                                   "tranf_to_aff", "indv_refunds", "other_pol_cmte_refunds",
                                   "cand_loan_repay", "loan_repay", "coh_bop", "coh_cop",
                                   "debts_owed_by", "nonfed_trans_received",
                                   "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
                                   "nonfed_share_exp", "cvg_end_dt"))

This does not result in an error. However, the results a) have a different line count than the Excel import and b) fail to separate the columns correctly (which is probably the reason for a)).

I would like to avoid a detour via Excel and import directly into R. Any ideas what I am doing wrong?

Just use `read.delim` instead of `read.table`. – Andrey Kolyadin, Jul 26 '17 at 09:51
This worked very well - thanks! – deca, Jul 26 '17 at 20:47

score 0 · Answer 1 · answered Jul 26 '17 at 10:03

It might be related to symbols inside the values (for example a `#` in a committee name, which `read.table` treats as the start of a comment by default), so turn off interpretation of these using comment.char = "", which gives you:

webk14 <- read.table(header = FALSE, fill = TRUE, colClasses = "character",
                     comment.char = "", sep = "|", file = "webk14.txt",
                     stringsAsFactors = FALSE, dec = ".",
                     col.names = c("cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn",
                                   "cmte_filing_freq", "ttl_receipts", "trans_from_aff",
                                   "indv_contrib", "other_pol_cmte_contrib", "cand_contrib",
                                   "cand_loans", "ttl_loans_received", "ttl_disb",
                                   "tranf_to_aff", "indv_refunds", "other_pol_cmte_refunds",
                                   "cand_loan_repay", "loan_repay", "coh_bop", "coh_cop",
                                   "debts_owed_by", "nonfed_trans_received",
                                   "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
                                   "nonfed_share_exp", "cvg_end_dt"))
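
A possible explanation for why `read.delim` succeeds where `read.table` does not: `read.table` defaults to quote = "\"'" and comment.char = "#", whereas `read.delim` defaults to quote = "\"" and comment.char = "". An apostrophe or a `#` inside a committee name would therefore be read by `read.table` as the start of a quoted string or a comment, which can merge or truncate lines. Whether webk14.txt actually contains such characters is an assumption here, not something confirmed above. The sketch below uses a made-up two-line sample with only four hypothetical columns and switches off both mechanisms explicitly:

```r
# Made-up sample in a pipe-delimited layout; NOT real FEC data.
sample_lines <- c(
  "C001|FRIENDS OF O'BRIEN|Q|100.00",      # apostrophe inside a field
  "C002|COMMITTEE #1 FOR CHANGE|Q|250.00"  # hash sign inside a field
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Shortened, hypothetical column list for the sample above.
cols <- c("cmte_id", "cmte_nm", "cmte_tp", "ttl_receipts")

# quote = "" disables quote handling entirely;
# comment.char = "" disables comment handling.
fixed <- read.delim(tmp, header = FALSE, sep = "|",
                    quote = "", comment.char = "",
                    colClasses = "character", col.names = cols)

print(fixed)
```

If this is indeed the cause, adding quote = "" to the `read.table` call above alongside comment.char = "" should have the same effect on the real file, with the full 27-name col.names vector.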
