Data importing Delimiter issue in R

Question

I am trying to import a text file into R, and put it into a data frame, along with other data.

My delimiter is "|", and here is a sample of my data:

|Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light load and had 3 seats to myself. A very enthusiastic and friendly crew as usual on this transpacific route that I take several times a year. Arrived 20 min ahead of schedule. The expected high level of service from our flag carrier, Air Canada. Altitude Elite member. |We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toronto our flight was excellent. Due to the rush in Toronto one of our carry ones was placed to go in the cargo hold. When we arrived in Winnipeg it stayed in Toronto, they were most helpful and kind at the Winnipeg airport, and we received 3 phone calls the following day in regards to the misplaced bag and it was delivered to our home. We are very thankful and more than appreciative of the service we received what a great end to a wonderful holiday. |Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage whatsoever, and not even any room under the seats. Ridiculous. Crew were poor, not friendly. One older male member of staff was quite attitudinal, acting as though he was doing everyone a huge favour by serving them. A reasonable dinner but breakfast was a measly piece of banana loaf. That's it! The worst airline breakfast I have had.

As you can see, there are many "|" characters, but as the screenshot shows, when I imported the data into R, it was split only once instead of about 152 times.

How do I get each individual piece of text in a different column inside the data frame? I would like a data frame of length 152, not 2.

EDIT: The code lines are:

myData <- read.table("C:/Users/Norbert/Desktop/research/Important files/Airline Reviews/Reviews/air_can_Review.txt", sep = "|", quote = NULL, comment = '', fill = TRUE, header = FALSE)

length(myData)
[1] 2
class(myData)
[1] "data.frame"
str(myData)
'data.frame':   1244 obs. of  2 variables:
 $ V1: Factor w/ 1093 levels "","'delayed' on departure (I reference flights between March 2014 and January 2015 in this regard: Denver, SFO,",..: 210 367    698 853 1 344 483 87 757 52 ...
 $ V2: Factor w/ 154 levels ""," hotel","5/9/2014, LHR to Vancouver, AC855. 23/9/2014, Vancouver to LHR, AC854. For Economy the leg room was OK compared to",..: 1 1 1 1 78 1 1 1 1 1 ...

myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue", stringsAsFactors = FALSE)
str(myDataFrame)
'data.frame':   531 obs. of  3 variables:
 $ text       : chr  "BRU-YUL, May 26th, A330-300. Departed on-time, landed 30 minutes late due to strong winds, nice flight, food" "excellent, cabin-crew smiling and attentive except for one old lady throwing meal trays like boomerangs. Seat-" "pitch was very generous, comfortable seat,  IFE a bit outdated but selection was Okay. Air Canadas problem is\nthat the new pro"| __truncated__ "" ...
 $ otherVar2  : num  1 1 1 1 1 1 1 1 1 1 ...
 $ otherVar2.1: chr  "blue" "blue" "blue" "blue" ...

length(myDataFrame)
[1] 3

Take a look [here](http://stackoverflow.com/questions/24679042/problems-with-reading-a-txt-file-eof-within-quoted-string). You may need to add two more arguments in read.table(): `quote=NULL, comment=''` — Parfait, Jun 01 '15 at 03:42
@Parfait It worked, but only in that the warning message disappeared. The length of the data frame is still 2, when it should be 152 — Uther Pendragon, Jun 01 '15 at 03:49
@Frank, those are the delimiters and there are at least 3-4 ...and the sample text is just a small part of the file. — Uther Pendragon, Jun 01 '15 at 05:03
Odd. I used your sample data and it all worked, giving four columns for three pipes ("|"). Try your own sample here. Other options: add a carriage return at the very end of your text file; try breaking up the larger file; or, if needed, share the file via a Dropbox link. — Parfait, Jun 02 '15 at 01:34
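
One possible explanation for the two columns, offered as a guess rather than a diagnosis: read.table() works out the number of columns from the first five lines of input, and the str() output in the question suggests the file is hard-wrapped, so each physical line holds at most one or two fields. The sketch below uses a made-up temporary file (the file contents and the names `path`, `fields_per_line`, and `reviews` are hypothetical) to count fields per line and then split on the pipe regardless of line breaks.

```r
# Hypothetical stand-in for the real file: two reviews hard-wrapped over three lines.
path <- tempfile(fileext = ".txt")
writeLines(c("|First review, which wraps",
             "onto a second line. |Second review,",
             "also wrapped."), path)

# Number of "|"-separated fields found on each physical line.
fields_per_line <- count.fields(path, sep = "|", quote = "", comment.char = "")
print(fields_per_line)

# Join the physical lines back together, then split on the literal pipe.
raw <- paste(readLines(path), collapse = " ")
reviews <- trimws(strsplit(raw, "|", fixed = TRUE)[[1]])

# Drop the empty piece that comes before the leading "|".
reviews <- reviews[nzchar(reviews)]
print(length(reviews))
```

If the per-line counts on the real file never exceed 2, that would be consistent with read.table() settling on two columns.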

Ken Benoit · Answer 1 · 2015-06-01T19:14:15.347

A better way to read in the text is to use scan() and then put it into a data frame with your other variables (here I just made some up). Note that I took your text above and pasted it into a file called sample.txt, after removing the leading "|".

myData <- scan("sample.txt", what = "character", sep = "|")
myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue",
                          stringsAsFactors = FALSE)
str(myDataFrame)
## 'data.frame':    3 obs. of  3 variables:
##  $ text       : chr  "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__ "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__ "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__
##  $ otherVar2  : num  1 1 1
##  $ otherVar2.1: Factor w/ 1 level "blue": 1 1 1

The two otherVar2 columns (the second shows up as otherVar2.1 because both were given the same name) are just placeholders for your own variables, as you said you wanted a data.frame with other variables. I chose a numeric variable and a text variable; because each is specified as a single value, it gets recycled for all observations in the dataset (3 in the example).
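
For anyone who wants to reproduce this without the original file, here is a minimal self-contained sketch. The temporary file and its three short texts are made up, and the placeholder columns are given distinct names (`otherVar1`, `otherVar2`) purely for illustration. Passing `quote = ""` is an extra precaution, not part of the code above, so that apostrophes in the reviews are not treated as quote characters.

```r
# Hypothetical sample file: three short texts separated by "|" on one line.
path <- tempfile(fileext = ".txt")
writeLines("Review one.|Review two.|Review three.", path)

# Read every "|"-separated piece as one character string.
reviews <- scan(path, what = "character", sep = "|", quote = "", quiet = TRUE)

# Single values are recycled to match the number of texts.
df <- data.frame(text = reviews,
                 otherVar1 = 1,
                 otherVar2 = "blue",
                 stringsAsFactors = FALSE)
str(df)
```

Each text ends up in its own row, with the two placeholder values repeated alongside it.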

I realize that your question asks how to get each text in a different column, but that is not a good way to use a data.frame, since data.frames are designed to hold variables in columns. (With one text per column, you cannot add other variables.)

If you really want to do that, you have to coerce the data after transposing it, as follows:

myDataFrame <- as.data.frame(t(data.frame(text = myData, stringsAsFactors = FALSE)), stringsAsFactors = FALSE)
str(myDataFrame)
## 'data.frame':    1 obs. of  3 variables:
##  $ V1: chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__
##  $ V2: chr "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__
##  $ V3: chr "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__
length(myDataFrame)
## [1] 3
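
A slightly shorter route to the same one-row layout, sketched here with made-up texts (the names `reviews` and `wide` are hypothetical): calling t() on the character vector directly gives a one-row matrix, which as.data.frame() then turns into one column per text.

```r
# Made-up sample texts standing in for the scanned reviews.
reviews <- c("Review one.", "Review two.", "Review three.")

# t() turns the vector into a 1 x 3 character matrix, one review per column.
wide <- as.data.frame(t(reviews), stringsAsFactors = FALSE)

dim(wide)    # one row, one column per review
names(wide)  # default column names V1, V2, V3
```

The caveat above still applies: with one text per column, there is no natural place for other variables.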

"Measly banana loaf"? Definitely economy class.

What does otherVar2 stand for in the line that you coded? What should it represent? @Ken Benoit — Uther Pendragon, Jun 01 '15 at 18:15
I think you misunderstood what I actually want to accomplish. I am trying to put each review in a different column, but your code puts all the text in 1 column... I know how to do that too, but I want to split the text at the delimiter and put the next review in a new column... I edited the code in the question, with the outputs. @Ken Benoit — Uther Pendragon, Jun 01 '15 at 18:21
No, I understood, was trying to suggest gently that you should not use a data.frame that way. See edits. — Ken Benoit, Jun 01 '15 at 19:03
