Dealing with readLines() function in R

Question

I've been having a very hard time with R lately.

I'm not an expert user but I'm trying to use R to read a plain text (.txt) file and capture each line of it. After that, I want to deal with those lines and make some breaks and changes in the text.

Here is the code I'm using:

fileName <- "C:/MyFolder/TEXT_TO_BE_PROCESSED.txt"
con <- file(fileName,open="r")
line <- readLines(con)
close(con)

It reads the text and the line breaks perfectly. But I don't understand how the created object line works.

The object line created with this code has class character and length 57. If I type line[1], it shows exactly the text of the first line. But if I type

length(line[1])

it returns 1.

I would like to know how I can transform this string of length == 1, which in fact contains 518 characters, into a vector of length == 518.

Does anyone know what I'm doing wrong?

I don't necessarily need to use the readLines() function. I did some research and also found the function scan(), but I ended up in the same situation: an immutable string of 518 characters with length == 1.

Hope I've been clear enough about my doubt. Sorry for the bad English.

`readLines` returns "A character vector of length the number of lines read." (from `?readLines`). That's why each line is length 1. Have you tried `read.csv` or `read.table` for this? — Rich Scriven, Apr 11 '14 at 00:42
Please provide some of the data and what you expect as a result. It sounds like you just need `strsplit` — Rich Scriven, Apr 11 '14 at 01:09
Try `nchar(line[1])`; it gives you the number of characters in the first element of `line` (i.e., the first line of your file). `length(line)` tells you the number of lines retrieved from the file; by giving it `length(line[1])`, you're asking for the number of elements in a slice of `line`, a slice that happens to have a single element in it (which may be a string of 518 characters or whatever). — r2evans, Apr 11 '14 at 04:39
@r2evans `nchar(line[1])` returns the number of characters in the string, but I want to know how to access those characters individually. The `strsplit` function does not satisfy my needs. The best way to describe what I want to do is to say that I want to read every element of `line` (i.e.: `line[1]`, `line[2]`, ... , `line[n]`) character by character (blank or not) and make some rearrangements. — user3521631, Apr 11 '14 at 12:25
Without a better idea of what exactly you want to break a string into, my guidance is merely `?substr` and `?regexp`. — r2evans, Apr 11 '14 at 22:51
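
To make the distinction drawn in these comments concrete, here is a minimal sketch. The vector `line` below is made-up sample data standing in for the result of `readLines()`, not the asker's file. `length()` counts elements (lines), `nchar()` counts the characters inside each element, and `substr()` / `substring()` pull characters out by position.

```r
# Hypothetical stand-in for the result of readLines(): two "lines" of text
line <- c("first line", "second, longer line")

n_lines <- length(line)        # number of elements (lines)
n_slice <- length(line[1])     # a one-element slice, so always 1
n_chars <- nchar(line[1])      # number of characters in the first line
n_chars_all <- nchar(line)     # vectorised: one count per line

# Access individual characters by position
third_char <- substr(line[1], 3, 3)
positions <- seq_len(nchar(line[1]))
all_chars <- substring(line[1], positions, positions)  # one element per character, blanks included
```

`substring()` is vectorised over its start and stop arguments, which is why passing the same sequence of positions twice yields one single-character string per position.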

score 5 · Answer 1 · answered Apr 11 '14 at 01:35


First, you can condense that code into a single line; the other three lines only set up and tear down objects that you don't need.

line <- readLines("C:/MyFolder/TEXT_TO_BE_PROCESSED.txt")

Then, if you want to know how many space-separated words there are per line:

words <- sapply(line,function(x) length(unlist(strsplit(x,split=" "))))

If you leave out the call to length in the above, you get the words from each line as character vectors (returned as a list whenever the lines differ in word count).
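
As a rough illustration of both variants, here is a sketch on made-up sample lines (not the asker's file). It calls `strsplit()` directly on the whole vector instead of going through `sapply()`, and uses `lengths()`, which should be available in reasonably recent versions of R.

```r
# Hypothetical sample lines standing in for readLines() output
line <- c("the quick brown fox", "jumps over", "the lazy dog")

# One character vector of words per line, kept as a list
word_list <- strsplit(line, split = " ")

# Number of space-separated words in each line
word_counts <- lengths(word_list)
```

Because `strsplit()` is already vectorised over its input, this always returns a list, whatever the word counts happen to be.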

answered Apr 11 '14 at 01:35

JeremyS


I've tried this solution. Leaving out length, it returns a variable "words" that is a list of 57. If I type words[1], it returns the whole first line split word by word. But I can't access a specific word the way I want, for example: words[1][2]. – user3521631 Apr 11 '14 at 12:20

Then you need to look up the difference between `[` and `[[`. To get the first word of the first list entry, you want `words[[1]][1]` – JeremyS Apr 17 '14 at 03:10
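
A small sketch of that difference, using a made-up list `words` in place of the real 57-element one: `[` returns a sub-list, while `[[` extracts the element itself.

```r
# Hypothetical list of words per line, shaped like the output of strsplit()
words <- list(c("hello", "world"), c("foo", "bar", "baz"))

sub_list <- words[1]          # still a list, of length 1
first_line <- words[[1]]      # the character vector of words itself
second_word <- words[[1]][2]  # second word of the first line

# words[1] is a one-element list, so asking for its second entry
# yields a list holding NULL rather than a word
not_a_word <- words[1][2]
```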

Rich Scriven · Accepted Answer · 2014-04-11T03:51:24.513

5

Suppose txt is the text from line 1 of your data that you read in with readLines.
If you want to split it into separate strings, each of which is a word, you can use strsplit, splitting at the whitespace between words.

> txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")
> txt  ## character vector of length 1
[1] "aA bB cC dD eE fF gG hH iI jJ"
> length(txt)
[1] 1
> newTxt <- unlist(strsplit(txt, split = "\\s"))  ## split the string at the spaces
> newTxt  ## now the text is a character vector of length 10
[1] "aA" "bB" "cC" "dD" "eE" "fF" "gG" "hH" "iI" "jJ"
> length(newTxt)
[1] 10

edited Apr 11 '14 at 03:51

answered Apr 11 '14 at 03:41

Rich Scriven


Thanks, but that is not exactly what I need. I don't want to split the vector into words. For me the blank spaces are really important, and I want each blank space to count as a character too. The final product I'm looking for, in your example, would be a vector of 29 characters. – user3521631 Apr 11 '14 at 12:14

Okay, then instead of `split = "\\s"`, use `split = ""` – Rich Scriven Apr 11 '14 at 19:00
The solution suggested by @Richard Scriven solved my problem. I'm really grateful. Changing the split argument was what I needed to get it done. – user3521631 Apr 13 '14 at 21:12
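
For reference, a minimal sketch of the `split = ""` variant that resolved the question, reusing the `txt` example from the answer. The names `chars` and `rebuilt` are introduced here only for illustration.

```r
txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")

# Splitting on the empty string yields one element per character, blanks included
chars <- strsplit(txt, split = "")[[1]]

n_chars <- length(chars)        # same value as nchar(txt)
n_blanks <- sum(chars == " ")   # the spaces are kept as elements of their own

# After any rearrangement, the characters can be glued back into one string
rebuilt <- paste(chars, collapse = "")
```

Applied to the whole result of `readLines()`, `strsplit(line, split = "")` returns a list with one such character vector per line, so individual characters are reached with `[[` followed by `[`.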

score 1 · Answer 3 · edited Jun 04 '16 at 16:31


How about:

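con <- file(fileName, open = 'r')
text <- readLines(con)[[1]]  # first element of the character vector, i.e. the first line
close(con)                   # release the connection once the lines have been read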

to get the text of the first line of the file.
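
If only the first line is needed, a possible alternative is the `n` argument of `readLines()`, which limits how many lines are read. The sketch below writes a small temporary file with made-up content so that it runs on its own; the file and its contents are assumptions, not the asker's data.

```r
# Write a small temporary file so the sketch is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

# Given a path instead of a connection, readLines() opens and closes the file itself
first <- readLines(tmp, n = 1)

# Clean up the temporary file
unlink(tmp)
```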

edited Jun 04 '16 at 16:31

Tunaki


answered Jun 04 '16 at 16:11

Thys Potgieter

