1

I have a text file containing text in several languages. How can I read it in R using the read.delim function?

Encoding("file.tsv")
#[1] "unknown"

source_data = read.delim(file, header= F, fileEncoding= "windows-1252",
               sep = "\t", quote = "")
source_D[360]
#[1] "ð¿ð¾ð¸ñðº ð½ð° ññ‚ð¾ð¼ ñð°ð¹ñ‚ðµ"

But when I open the file in Notepad, source_D[360] is shown as 'поиск на этом сайте'.

Ronak Shah
Fiona_Wang

2 Answers

4

Tidyverse approach:

Use the locale argument of read_delim() from the readr package. readr functions use _ instead of . in their names and are usually faster and smarter about parsing than their base R counterparts. More details here: https://r4ds.had.co.nz/data-import.html#parsing-a-vector

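library(readr)
# read_delim() takes delim and col_names rather than sep and header.
# Set encoding to the file's actual encoding (for example "UTF-8").
source_data = read_delim(file, delim = "\t", col_names = FALSE,
                         locale = locale(encoding = "windows-1252"),
                         quote = "")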
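Before picking the encoding value, it can help to check whether the file is already valid UTF-8. The following is a minimal base R sketch, not part of the original answer: the sample file, the variable names, and the fallback to "windows-1252" are all assumptions made for illustration. A file that passes the check is very likely UTF-8 (or plain ASCII); a file that fails it needs some other encoding, which this check cannot identify.

```r
# Hypothetical sample file, written as raw UTF-8 bytes so the sketch
# does not depend on the current locale.
path = tempfile(fileext = ".tsv")
writeLines(enc2utf8("1\t\u043f\u043e\u0438\u0441\u043a"), path, useBytes = TRUE)

# Read the lines without re-encoding them, then test the bytes as UTF-8.
raw_lines = readLines(path, warn = FALSE)
all_utf8 = all(validUTF8(raw_lines))

# Assumed fallback: windows-1252 is only a guess when the check fails.
file_encoding = if (all_utf8) "UTF-8" else "windows-1252"
print(file_encoding)
```

The resulting file_encoding value could then be passed to locale(encoding = ...) in readr or to fileEncoding in read.delim.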
Viviane
0
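source_data = read.delim(file, header = F, sep = "\t", quote = "", stringsAsFactors = FALSE)
# Encoding<- only accepts character vectors, so mark each character column as UTF-8
is_chr = vapply(source_data, is.character, logical(1))
source_data[is_chr] = lapply(source_data[is_chr], `Encoding<-`, "UTF-8")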

I have tested this: if you run R on Windows, the code above works for me. If you run R on Unix, you could use the following code instead:

source_data = read.delim(file, header = F, fileEncoding="UTF-8", sep = "\t", quote = "", stringsAsFactors = FALSE)
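As a rough illustration of why the output in the question looked garbled, the sketch below reproduces the effect on a short string: UTF-8 bytes interpreted as windows-1252 turn every byte into a separate character, and converting back and marking the result as UTF-8 restores the text. The variable names are made up for this example, and it assumes the platform's iconv knows the name "windows-1252".

```r
# A two-letter Cyrillic word (U+043D U+0430), written with escapes.
original = "\u043d\u0430"

# Misread its UTF-8 bytes as windows-1252: each byte becomes its own character.
garbled = iconv(original, from = "windows-1252", to = "UTF-8")

# Undo the damage: convert back to the original bytes, then mark them as UTF-8.
repaired = iconv(garbled, from = "UTF-8", to = "windows-1252")
Encoding(repaired) = "UTF-8"

# Compare at the byte level so the check does not depend on the locale.
same_bytes = identical(charToRaw(repaired), charToRaw(original))
print(same_bytes)
```

This suggests the file in the question is stored as UTF-8, which is why reading it with fileEncoding = "windows-1252" produced the garbled text.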
Ronak Shah
Fiona_Wang