I have the same problem as explained here; the only difference is that my CSV file contains non-English strings, and I couldn't find a solution for that case. When I read the CSV file without specifying an encoding I get no error, but the data is changed to:

network=read.csv("graph1.csv",header=TRUE)

  اشپیل(60*4)

If I run read.csv with fileEncoding, I get these warnings and an empty result:

 network=read.csv("graph1.csv",fileEncoding="UTF-8",header=TRUE)
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'graph1.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'graph1.csv'

 network[1]
[1] X.
<0 rows> (or 0-length row.names)

System info:

Windows Server 2008
R 3.1.2

Sample file:

node1,node2,weight
ورق800*750*6,ورق 1350*1230*6mm,0.600000024
ورق900*1200*6,ورق 1350*1230*6mm,0.600000024
ورق76*173,ورق 1350*1230*6mm,0.600000024
ورق76*345,ورق 1350*1230*6mm,0.600000024
ورق800*200*4,ورق 1350*1230*6mm,0.600000024
academic.user
  • possible duplicate of [columns names not read properly by read.csv in R](http://stackoverflow.com/questions/28005632/columns-names-not-read-properly-by-read-csv-in-r) – Colonel Beauvel Jan 28 '15 at 21:52
  • It's not just the column names; the whole file is not being read properly. – academic.user Jan 28 '15 at 22:02
  • The second warning should be solved according to my answer; it is caused by the carriage return missing at the end of the file. – Colonel Beauvel Jan 28 '15 at 22:06
  • For what it’s worth, the example file works perfectly on Mac (assuming the system locale is set to UTF-8), and both code snippets work – it’s almost certainly a Windows-specific problem, because the Unicode support under R for Windows is quite frankly shabby. – Konrad Rudolph Jan 28 '15 at 22:11
  • You should get the result below by putting an EOD delimiter at the end of the file (what I also called a carriage return). – Colonel Beauvel Jan 28 '15 at 22:18

2 Answers

I tried this with your input:

> read.csv("graph1.csv", encoding="UTF-8")
                      X.U.FEFF.node1                                  node2 weight
1  <U+0648><U+0631><U+0642>800*750*6 <U+0648><U+0631><U+0642> 1350*1230*6mm    0.6
2 <U+0648><U+0631><U+0642>900*1200*6 <U+0648><U+0631><U+0642> 1350*1230*6mm    0.6
3     <U+0648><U+0631><U+0642>76*173 <U+0648><U+0631><U+0642> 1350*1230*6mm    0.6
4     <U+0648><U+0631><U+0642>76*345 <U+0648><U+0631><U+0642> 1350*1230*6mm    0.6
5  <U+0648><U+0631><U+0642>800*200*4 <U+0648><U+0631><U+0642> 1350*1230*6mm    0.6
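The `X.U.FEFF.` prefix on the first column name suggests the file starts with a UTF-8 byte order mark (BOM). As a minimal sketch, assuming an R session whose locale can represent the text (for example a UTF-8 locale), the `"UTF-8-BOM"` value for `fileEncoding` asks R to strip that mark while reading. The temporary file and the single data row below are made up for illustration. On a Windows session with a non-UTF-8 code page this may still fail, because `fileEncoding` re-encodes the input into the native encoding.

```r
# Build a small UTF-8 file that starts with a BOM (illustrative data only).
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- "node1,node2,weight\n\u0648\u0631\u0642800*750*6,\u0648\u0631\u0642 1350*1230*6mm,0.6\n"
writeBin(c(bom, charToRaw(enc2utf8(body))), path)

# "UTF-8-BOM" drops the byte order mark before the header is parsed,
# so the first column name is not polluted by it.
net <- read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
names(net)
```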
Colonel Beauvel

The following should work – mind you, I can’t test it since I don’t have Windows (and Windows, Unicode and R simply do not mix):

x = read.csv('graph1.csv', fileEncoding = '', stringsAsFactors = FALSE)

At this point, x is gibberish, since it was read as-is, without parsing the byte data into an encoding. We should be able to verify this:

Encoding(x[1, 1])
# [1] "unknown"

Now we tell R to treat it as UTF-8:

x = as.data.frame(lapply(x, iconv, from = 'UTF-8', to = 'UTF-8'),
                  stringsAsFactors = FALSE)

These two steps can be compressed into one by using encoding instead of fileEncoding as the argument to read.csv:

x = read.csv('graph1.csv', encoding = 'UTF-8', stringsAsFactors = FALSE)

In either case, roughly the same process takes place.
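The marking step can be seen in isolation on a single string. This is only a sketch: it builds the word from its raw UTF-8 bytes instead of reading `graph1.csv`, and it uses `Encoding<-` to declare the encoding, which is an alternative to the `iconv` call above and not what the answer itself does.

```r
# UTF-8 bytes for the three letters U+0648, U+0631, U+0642.
bytes <- as.raw(c(0xD9, 0x88, 0xD8, 0xB1, 0xD9, 0x82))

# rawToChar() returns a string with no declared encoding.
s <- rawToChar(bytes)
before <- Encoding(s)
before
# [1] "unknown"

# Declare the bytes to be UTF-8; the bytes themselves are not changed.
Encoding(s) <- "UTF-8"
Encoding(s)
# [1] "UTF-8"

# The code points are now recoverable regardless of the session locale.
utf8ToInt(s)
```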

At this point, x still appears as gibberish, since your terminal on Windows presumably does not support a Unicode code page which R understands. In fact, when running the code with a non-UTF-8 code page on Mac, I get the following output now:

x[1, 1]
# [1] "<U+0648><U+0631><U+0642>800*750*6"

However, at least the encoding is now correctly set, and the bytes are parsed:

Encoding(x[1, 1])
# [1] "UTF-8"

And if you pass the data to a device or program which speaks UTF-8, it should appear correctly. For instance, using the data as labels in a plot command should work.

plot.new()
text(0.5, seq(0, 1, along.with = x[, 1]), x[, 1])

plot output
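Writing the strings to a file as raw UTF-8 bytes is another way of handing the data to something that speaks UTF-8. The sketch below is an assumption-laden workaround, not a tested fix for Windows: it uses a made-up one-row data frame, builds the CSV lines by hand without any quoting (so it only suits fields that contain no commas or quotes), and relies on `writeLines(..., useBytes = TRUE)` to skip re-encoding into the native code page.

```r
# Illustrative data frame with UTF-8 strings (not read from graph1.csv).
df <- data.frame(node1  = "\u0648\u0631\u0642800*750*6",
                 node2  = "\u0648\u0631\u0642 1350*1230*6mm",
                 weight = 0.6,
                 stringsAsFactors = FALSE)

# Header line plus one comma-separated line per row (no quoting applied).
lines <- c(paste(names(df), collapse = ","),
           do.call(paste, c(df, sep = ",")))

# Open in binary mode and write the UTF-8 bytes untouched.
out <- tempfile(fileext = ".csv")
con <- file(out, open = "wb")
writeLines(enc2utf8(lines), con, useBytes = TRUE)
close(con)

# Read the lines back, declaring them as UTF-8.
back <- readLines(out, encoding = "UTF-8")
```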

Konrad Rudolph
  • Thanks for the explanation, but after some analysis I need to write the data back to CSV. If I just write the file after `x = read.csv('graph1.csv', encoding = 'UTF-8', stringsAsFactors = FALSE)`, I get the output as it is shown here: the escape codes, not the real data. – academic.user Jan 28 '15 at 22:47
  • @academic.user Unfortunately at this point my lack of access to Windows prevents me from trying out what might work. But have you tried just writing the file? The data hasn’t been changed at all, so this might work. – Konrad Rudolph Jan 28 '15 at 23:30
  • Thanks, yes, I tried `write.csv(network, file = "network.csv",row.names=FALSE)` after `network=read.csv("graph1.csv", encoding="UTF-8", header=TRUE)` and the output is: `"X.U.FEFF.node1","node2","weight" "800*750*6"," 1350*1230*6mm",0.600000024 "900*1200*6"," 1350*1230*6mm",0.600000024 ` – academic.user Jan 29 '15 at 00:23