I'm having a problem reading a text file into R. The text file has 8 columns and a header, and looks exactly like this:

ID          1990    1991    1992    1993    1994    1995    1996
A           36.88   45.48   52.46   111.31  138.45  121.09  122.62
B           19.11   27.97   37.14   47.68   60.78   35.84   38.64
C           56.21   74.94   92.3    118.62  138.13  104.65  113.98
D           30.48   51.54   61.57   99.87   80.9    84.97   99.34

When I run the following, I get an error:

> extra<- read.table("extrab.txt", header=T, sep="\t")
Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<ff><fe>I'

So I tried adding fileEncoding:

> extra<- read.table("extrab.txt", header=T, sep="\t", fileEncoding="UCS-2LE")

This ran without an error, but I ended up with a data frame with a single variable: everything from ID to 1996 was treated as one column. Is there a way to solve this?

I'm adding a few more lines on this problem, because I found a different error when I tried to import the file through R error text file import

halo09876

2 Answers


As per this SO question, the error you're getting seems to be related to file encoding.

Option 1:

You likely just need to figure out the right file encoding to use.

Example:

extra <- read.table("extrab.txt", header=TRUE, sep="\t", fileEncoding="latin1")
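
For what it's worth, the bytes <ff><fe> in the error message are the byte order mark (BOM) of a little-endian UTF-16 file, so peeking at the first bytes of the file can help pick the fileEncoding value. Here is a minimal sketch of that idea. Note that guess_bom_encoding and ascii_to_utf16le are hypothetical helpers written only for this illustration, and the temporary file merely stands in for extrab.txt:

```r
# Hypothetical helper (not part of base R): guess an encoding from a leading BOM.
guess_bom_encoding <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(b) >= length(sig) && identical(b[seq_along(sig)], as.raw(sig))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8-BOM")
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")  # may also be UTF-32LE; not checked here
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  NA_character_                                       # no BOM found
}

# Hypothetical helper: encode an ASCII-only string as UTF-16LE bytes with a BOM.
ascii_to_utf16le <- function(s) {
  src <- charToRaw(s)
  out <- raw(2L * length(src))                        # zero-filled
  out[seq(1L, by = 2L, length.out = length(src))] <- src
  c(as.raw(c(0xFF, 0xFE)), out)
}

# Demo data only: a tiny tab-separated file standing in for extrab.txt.
demo <- tempfile(fileext = ".txt")
writeBin(ascii_to_utf16le("ID\t1990\t1991\nA\t36.88\t45.48\n"), demo)

enc <- guess_bom_encoding(demo)
extra <- read.table(demo, header = TRUE, sep = "\t", fileEncoding = enc)
str(extra)
```

If the helper returns NA, the file has no BOM and the encoding has to be worked out some other way.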

Option 2:

You can try opening the file in Notepad (or any other text editor) and then using "Save As" with a common encoding such as ANSI, Unicode or UTF-8.

In Windows Notepad, notice there's an "Encoding" dropdown in the Save As dialog. ANSI should work fine.
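
The same re-saving step can probably be done from R as well, by reading the lines through a connection that declares the source encoding and writing them back out as UTF-8. This is only a sketch: it assumes the source really is UTF-16LE, ascii_to_utf16le is a hypothetical helper for building demo data, and the temporary files stand in for extrab.txt and its converted copy:

```r
# Hypothetical helper: encode an ASCII-only string as UTF-16LE bytes with a BOM.
ascii_to_utf16le <- function(s) {
  src <- charToRaw(s)
  out <- raw(2L * length(src))                        # zero-filled
  out[seq(1L, by = 2L, length.out = length(src))] <- src
  c(as.raw(c(0xFF, 0xFE)), out)
}

# Demo data only: a tiny tab-separated file standing in for extrab.txt.
src_path <- tempfile(fileext = ".txt")
writeBin(ascii_to_utf16le("ID\t1990\nA\t36.88\n"), src_path)

# Read the lines through a re-encoding connection (assumed source: UTF-16LE).
con_in <- file(src_path, open = "r", encoding = "UTF-16LE")
lines <- readLines(con_in)
close(con_in)

# Write them back out as UTF-8.
dst_path <- tempfile(fileext = ".txt")
con_out <- file(dst_path, open = "w", encoding = "UTF-8")
writeLines(lines, con_out)
close(con_out)

# The converted copy can then be read like any other UTF-8 file.
extra <- read.table(dst_path, header = TRUE, sep = "\t", fileEncoding = "UTF-8")
str(extra)
```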

Tommy O'Dell
  • It's still bugging me. I saved as UTF-8 and did the following: extra<- read.table("extrab.txt", header=T, sep="\t", fileEncoding="UTF-8") but this also yielded a data frame which treated all the columns as one column. – halo09876 Dec 18 '13 at 06:54
  • Just a thought... perhaps try importing to Excel via the Import Text Wizard, and then save to .csv – Tommy O'Dell Dec 18 '13 at 08:05
  • I did that and it worked! But the UTF-8 format did not work in Excel, so I had to save the text file in Latin-1 format first. – halo09876 Dec 19 '13 at 02:57

Now that you aren't getting the file encoding problem, it might just be that your separator is actually not a tab. Try:

extra <- read.table("extrab.txt", header=TRUE, fileEncoding="UCS-2LE")

This leaves sep at its default (""), so fields are separated by any whitespace: one or more spaces or tabs.
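
To illustrate the difference, here is a small sketch using inline text in place of the file. The sample rows are taken from the question, and the assumption is that the columns are padded with spaces rather than separated by tabs:

```r
# Space-padded sample (assumption: the real file uses spaces, not tabs).
txt <- "ID          1990    1991
A           36.88   45.48
B           19.11   27.97"

# With sep = "\t" there are no tabs to split on, so each line is one field.
one_col <- read.table(text = txt, header = TRUE, sep = "\t")

# With the default sep = "", any run of whitespace separates fields.
split_cols <- read.table(text = txt, header = TRUE)

ncol(one_col)
ncol(split_cols)
names(split_cols)
```

The first call reproduces the "everything in one column" symptom, while the second splits the header and rows into separate columns.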

josliber
  • Tried, and it yielded a different error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 8 elements – halo09876 Dec 18 '13 at 06:49
  • What is the output of `count.fields("extrab.txt", fileEncoding="UCS-2LE")`? – josliber Dec 18 '13 at 07:10
  • I get Warning message: In read.table("extrab.txt", header = TRUE, fileEncoding = "UCS-2LE") : incomplete final line found by readTableHeader on 'extrab.txt' – halo09876 Dec 19 '13 at 02:04
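
A note on the count.fields suggestion in the comments: as far as I can tell, count.fields has no fileEncoding argument, but it does accept a connection, so the encoding can be declared on the connection instead. A minimal sketch, assuming the file is UTF-16LE and tab-separated. Here ascii_to_utf16le is a hypothetical helper for building demo data, and the temporary file stands in for extrab.txt:

```r
# Hypothetical helper: encode an ASCII-only string as UTF-16LE bytes with a BOM.
ascii_to_utf16le <- function(s) {
  src <- charToRaw(s)
  out <- raw(2L * length(src))                        # zero-filled
  out[seq(1L, by = 2L, length.out = length(src))] <- src
  c(as.raw(c(0xFF, 0xFE)), out)
}

# Demo data only: a tiny tab-separated file standing in for extrab.txt.
demo <- tempfile(fileext = ".txt")
writeBin(ascii_to_utf16le("ID\t1990\nA\t36.88\n"), demo)

# Declare the encoding on the connection, then count the fields on each line.
con <- file(demo, open = "r", encoding = "UTF-16LE")
n_fields <- count.fields(con, sep = "\t")
close(con)

n_fields
```

If every entry of n_fields is 1, the chosen separator does not occur in the file; if the entries differ from line to line, some lines have missing or extra fields.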