
I read in a .txt file that uses "|," as the separator, so I used strsplit():

lob<-readLines("lob_lobbying.txt")
lob<-strsplit(lob, "|,", fixed=TRUE)

However, the output is a large list of length 1213906. Each element is a character vector, but some of them have length 16 and others length 17.
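One plausible reason for the 16 vs 17 split (a guess, since the file itself is not shown): strsplit() drops a trailing empty field, so a line whose last field is empty comes back one element short. The two lines below are made-up stand-ins, not data from lob_lobbying.txt:

```r
# Hypothetical toy lines; the second one ends with an empty field
lines <- c("a|,b|,c", "d|,e|,")
parts <- strsplit(lines, "|,", fixed = TRUE)
# strsplit() returns c("d", "e") for the second line: the empty last field is dropped
lengths(parts)
```

If that is what is happening, the shorter rows are missing their last field, so padding them at the end is the right fix.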

I want to extract each of these rows and bind them into a data frame.

For example, when I do

X <- rbind(lob[[1]], lob[[2]], lob[[3]])
df<-as.data.frame(X)

X is exactly the output I want (I can then call as.data.frame() on it and it works perfectly). However, since the list has 1213906 elements, I need to automate this process.

But as soon as I try do.call(rbind, lob), it does not work. I get

"number of columns of result is not a multiple of vector length (arg 5)"

I think this is because some of these character vectors have length 16 and others length 17.
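That diagnosis fits what rbind() does with vectors of unequal length: it recycles the shorter one to the width of the longest and only warns, so the result is silently wrong rather than an error. A toy illustration with made-up rows:

```r
# Two hypothetical rows of different lengths
rows <- list(c("a", "b", "c"), c("d", "e"))
# Capture the warning text instead of printing it
msg <- tryCatch(do.call(rbind, rows), warning = function(w) conditionMessage(w))
# Suppress the warning to see the recycled result: the second row becomes d, e, d
m <- suppressWarnings(do.call(rbind, rows))
m
```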

Is there a way (a loop, for example) to handle each of these vectors depending on whether it has 16 or 17 elements, and then bind them?
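Taken literally, "bind them based on their length" can be done without an explicit loop by grouping on lengths() and binding each group separately. This is a sketch on a made-up list; note that it yields one table per length, which still has to be aligned afterwards, so padding to a common length (as in the answer) is usually simpler:

```r
# Hypothetical toy list: two 3-element rows and one 2-element row
lob <- list(c("a", "b", "c"), c("d", "e"), c("f", "g", "h"))
# split() on a list groups its elements by the factor lengths(lob) ("2", "3")
by_len <- split(lob, lengths(lob))
# Bind each same-length group into its own data frame
tables <- lapply(by_len, function(g) as.data.frame(do.call(rbind, g), stringsAsFactors = FALSE))
tables
```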

camille
    [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. Right now it's hard to do more than guess as to what your data looks like and how to proceed. – camille Oct 08 '19 at 20:00

1 Answer


Instead of indexing [[1]], [[2]], ... by hand, we can pad each list element that is shorter than the maximum length with NA at the end, and then use do.call:

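# pad every element with NA up to the longest length (e.g. 16 -> 17)
lob <- lapply(lob, `length<-`, max(lengths(lob)))
# bind into a character matrix once, then convert to a data frame
df1 <- as.data.frame(do.call(rbind, lob), stringsAsFactors = FALSE)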
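A small check of the padding step on made-up data (two toy vectors, not rows from the real file). Assigning a larger length with `length<-` extends a vector with NA, so the short row gets NA in its last column:

```r
# Hypothetical toy list: one 3-field row, one 2-field row
lob <- list(c("a", "b", "c"), c("d", "e"))
# `length<-` pads the shorter vector with NA up to the requested length
lob <- lapply(lob, `length<-`, max(lengths(lob)))
# do.call(rbind, ...) gives a character matrix; as.data.frame() names the columns V1, V2, V3
df1 <- as.data.frame(do.call(rbind, lob), stringsAsFactors = FALSE)
df1
```

Binding everything into one matrix with a single rbind() call and converting once is typically much faster than rbind.data.frame() over a million list elements.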

Also, instead of using readLines() and then splitting with strsplit(), the file can be read with read.table() using the sep argument and fill = TRUE:

df1 <- read.table("lob_lobbying.txt", sep="|", fill = TRUE, stringsAsFactors = FALSE)
akrun
  • So df1 <- read.table("lob_lobbying.txt", sep="|", fill = TRUE, stringsAsFactors = FALSE) does not work, because the number of observations is reduced significantly. I get the warning "EOF within quoted string", and I think it's because R is not recognizing the | properly... – Maria D. Perez Oct 08 '19 at 21:25
  • @MariaD.Perez The EOF is a different issue. – akrun Oct 08 '19 at 21:29
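A sketch for the EOF warning, under an assumption that is not confirmed in the thread: that each field is wrapped in pipes and fields are separated by commas (|a|,|b|,...), which the "|," separator hints at. read.table()'s default quote characters are both " and ', so an apostrophe inside a field (a name like O'Neil) opens a quoted string that never closes, a common cause of "EOF within quoted string" and of rows being merged. Treating | as the quote character avoids that. Two more read.table() details may matter here: with fill = TRUE the column count is guessed from the first five lines unless col.names is given, and the default comment.char = "#" would truncate any field containing #. The toy lines and the 3-column count are hypothetical; the real file would need its own count (17 per the question):

```r
# Hypothetical lines in the assumed |field|,|field| layout; the second has an empty last field
txt <- c("|a|,|O'Neil|,|c|",
         "|d|,|e|,")
df1 <- read.table(text = txt,
                  sep = ",",                      # fields separated by commas
                  quote = "|",                    # pipes wrap fields, so ' is plain text
                  fill = TRUE,                    # pad short rows
                  col.names = paste0("V", 1:3),   # fix the column count explicitly
                  comment.char = "",              # do not treat # as a comment
                  stringsAsFactors = FALSE)
df1
# For the real file this would become something like (untested assumption):
# read.table("lob_lobbying.txt", sep = ",", quote = "|", fill = TRUE,
#            col.names = paste0("V", 1:17), comment.char = "", stringsAsFactors = FALSE)
```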