
My system is Windows 10 with R 3.5.3 and RStudio 1.1.463. The locale is shown below:

> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252" 

My classmate gave me a UTF-8 CSV file, `sample.csv`, produced on a Linux system. The file can be reproduced with the PHP script below:

<?php
$a=
array (
      'col1' => 12,
      'col2' =>  'Y' ,
      'col3' =>  '<p style="text-align: center;">
    <strong style="text-align: center;"><span style="color: rgb(105, 105, 105); font-family: verdana, arial, sans-serif; font-size: 13px;">版权</span></strong></p>
<p>
    <span style="color: rgb(105, 105, 105); font-family: verdana, arial, sans-serif; font-size: 13px;">bla</span></p>
<p>
    <span style="color: rgb(105, 105, 105); font-family: verdana, arial, sans-serif; font-size: 13px;"><img alt="" src="/functions/2.jpg" style="width: 400px; height: 500px;" /></span></p>
<p>
    <span style="color: rgb(105, 105, 105); font-family: verdana, arial, sans-serif; font-size: 13px;">bla</span></p>
' ,
      'col4' =>  '<br />
' );

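// Open the output file in binary mode (no newline translation).
$fp = fopen("sample.csv", "wb");

// Header row, followed by the single data row.
$question_list_cols = array('col1', 'col2', 'col3', 'col4');

fputcsv($fp, $question_list_cols);
if (!fputcsv($fp, array_values($a))) {
    echo "fail<br />";
}

fclose($fp);
?>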

When I read `sample.csv` in R with `df<-read.csv("sample.csv",header=TRUE)`, I get the error `invalid input found on input connection`.
I tried the solutions from similar questions on SO, but none of them worked.

The problem is caused by the Chinese characters 版权. Everything works when I remove these Chinese characters.
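
A minimal sketch of one likely reason these two characters matter, assuming the failure comes from converting UTF-8 text to the session's Windows-1252 encoding (an assumption; nothing in this thread confirms it). It inspects the UTF-8 bytes of 版权 and attempts the conversion with `iconv()`; the variable names are illustrative only.

```r
# The two Chinese characters from col3, written with \u escapes so that
# this script does not depend on the encoding of the source file.
s <- "\u7248\u6743"

# UTF-8 byte sequence of the string: e7 89 88 e6 9d 83.
bytes <- charToRaw(s)
print(bytes)

# Attempt to convert to Windows-1252, the code page named in the locale above.
# CP1252 has no code points for these characters, so the conversion cannot
# succeed (iconv() typically returns NA in that case).
converted <- iconv(s, from = "UTF-8", to = "CP1252")
print(converted)

# Plain ASCII text converts without any problem.
ascii_ok <- iconv("bla", from = "UTF-8", to = "CP1252")
print(ascii_ok)
```

If the conversion fails here, the same failure would be expected whenever R is asked to re-encode the file's contents into the native encoding while reading.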

How can I read a UTF-8 CSV file containing Chinese characters in R?

kittygirl
  • Try `fileEncoding="latin1"` as a parameter in your `df<-read.csv()`. – Tom Udding Apr 11 '19 at 09:26
  • @TomUdding, I tried that and got the error `1: In read.table(file = file, header = header, sep = sep, quote = quote, : invalid input found on input connection 'sample.csv' 2: In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'sample.csv'` – kittygirl Apr 11 '19 at 09:30
  • Try to look here (https://stackoverflow.com/questions/46996501/readrread-csv-issue-chinese-character-becomes-messy-codes) mainly the comments – DJV Apr 11 '19 at 09:39
  • @DJV, that doesn't work. – kittygirl Apr 11 '19 at 10:01
  • Did you try `df<-read.csv("sample.csv",header=TRUE, encoding = "UTF-8")` ? – Theo Apr 11 '19 at 13:22
  • @Theo, but this changes the Chinese characters to `` – kittygirl Apr 11 '19 at 13:27
  • @Theo, I opened `sample.csv` in Sublime and confirmed it is UTF-8, but `df<-read.csv("sample.csv",header=TRUE, fileEncoding = "UTF-8",encoding = "UTF-8")` gives an error too. What is the problem with `fileEncoding = "UTF-8"`? – kittygirl Apr 11 '19 at 13:33
  • Have you saved your code in UTF-8 encoding as well? – Theo Apr 11 '19 at 13:38
  • Yes, I confirmed the format is UTF-8. – kittygirl Apr 11 '19 at 14:01
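
The comments above turn on the difference between `encoding` and `fileEncoding` in `read.csv()`. According to the R documentation, `fileEncoding` declares the file's encoding so the data can be re-encoded on input, while `encoding` only marks the strings as Latin-1 or UTF-8 without re-encoding them. The sketch below illustrates the `encoding` behaviour on a small temporary file that stands in for `sample.csv`; the file contents and variable names are assumptions made for illustration, not the original data.

```r
# Write a tiny UTF-8 CSV to a temporary file (a stand-in for sample.csv).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(enc2utf8(c("col1,col2", "12,\u7248\u6743")), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings as UTF-8 and leaves the bytes alone,
# so no conversion to the native encoding is attempted.
df <- read.csv(path, header = TRUE, encoding = "UTF-8",
               stringsAsFactors = FALSE)

# Check the declared encoding and compare against the expected characters.
print(Encoding(df$col2))
print(identical(df$col2, "\u7248\u6743"))

unlink(path)
```

On Windows sessions with a non-Chinese locale, the console may still display such strings as escape codes even when the underlying data is intact, so comparing against a known value, as done here, is a more reliable check than looking at the printed output.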
