
If anyone could help me with the encoding problem below, I'd really appreciate it. I cannot read my CSV file in a Jupyter notebook using the Python code below.

import pandas as pd
pd.read_csv(my_csv, index_col = 0)

Error I'm getting:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

I'm getting similar problems when I import the file to PostgreSQL and try to use SQL to select columns so maybe it's something to do with the csv file itself.

tdelaney
Tadhgo
  • Are you able to open the csv file in a text editor? If so, most modern ones (e.g. Notepad++, VS Code) will show the encoding in the status bar. Once you find out what it is, you can just add the `encoding="utf-16"` argument, or whatever the encoding actually is. – SamR Apr 17 '22 at 19:47
  • You should pass the correct encoding type to pd.read_csv(), as described at https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html – mitlabence Apr 17 '22 at 19:48
  • You should try changing it to `pd.read_csv(rf'{my_csv}', index_col=0)` – Nin17 Apr 17 '22 at 19:52
  • Please post the full traceback error message so we can see context. – tdelaney Apr 17 '22 at 19:52
  • And please note that the program you posted can't work... the variable `my_csv` isn't defined, so it cannot produce the error you've shown. Show us code that really demonstrates the problem. Ideally you should include a small sample of input data that we can test ourselves. – tdelaney Apr 17 '22 at 19:55
  • @SamR I can open it in Notepad no problem but I don't see the encoding anywhere! – Tadhgo Apr 17 '22 at 19:56
  • One way you can get this error is with something like `pd.read_csv("C:\Users\Foo\my_csv.csv", index_col = 0)`, and you get the syntax error when you try to run the module, even if it's in a function that hasn't been called. That is made clear by the traceback message, and of course by a simple program demonstrating the problem. So, please, give us the full information. – tdelaney Apr 17 '22 at 20:03
  • @tdelaney The file path is quite long so I just typed my_csv instead; my code is "pd.read_csv('filepath', index_col = 0)". – Tadhgo Apr 17 '22 at 20:06
  • @tdelaney I've replaced my actual file path with 'filepath.csv' below as it's very long. Here is the full error message: File "C:\Users\tadhg\AppData\Local\Temp/ipykernel_26296/1056660807.py", line 1 pd.read_csv('filepath.csv', index_col = 0) ^ SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape – Tadhgo Apr 17 '22 at 20:08
  • Really? But that is likely the part that is broken. Python string literals use the backslash as an escape character. If you have `\U`, then Python thinks you are about to enter a Unicode escape sequence. You could change, for instance, `"C:\Users\foo\my_csv"` to `r"C:\Users\foo\my_csv"` (notice the "r" at the front: it's a "raw" string that doesn't use backslash escapes) or use the Unix convention `"C:/Users/foo/my_csv"` (forward slashes), even on Windows. – tdelaney Apr 17 '22 at 20:10
  • @bemitlas Would you mind elaborating please? I know the basics of Python, R and SQL but know very little about encoding errors, how to fix them, and why they occur. How do I find out what the "correct encoding" is? – Tadhgo Apr 17 '22 at 20:11
  • Now wait, I'm really confused. You won't show the file path, ..., but then you show "filepath.csv"? Is that literally the file name that you are using? Why not post the file path? – tdelaney Apr 17 '22 at 20:12
  • @tdelaney Okay thanks, we might be getting somewhere. Will it be able to locate my file if I replace backslashes with forward slashes? Trying it now. – Tadhgo Apr 17 '22 at 20:12
  • @bemitlas assumes that the problem happens inside pandas' `read_csv` function, but I don't think that's the case. I think it's the literal string encoding of the filename in your source. I think you could demonstrate the same problem with a single-line Python test file: `file_name = "C:\Users\tadhg\test.csv"` (or whatever your filename is). And really, the file name is not too long to copy/paste into the question. – tdelaney Apr 17 '22 at 20:16
  • @tdelaney Sorry for the confusion, I was reluctant as it had my college name etc. I've moved it. Same error with the new file path: "pd.read_csv('C:\Users\tadhg\Documents\datasets\transfers1.csv', index_col = 0)" – Tadhgo Apr 17 '22 at 20:16
  • Okay, but any dummy test name will do. In your example, just make it a raw string with an "r" on the front: `r'C:\Users\tadhg\Documents\datasets\transfers1.csv'`. – tdelaney Apr 17 '22 at 20:17
  • @tdelaney thanks a mill, so simple in the end. I would've known that a few weeks ago but I'm out of practice. Feel free to post an answer and I will click it as correct – Tadhgo Apr 17 '22 at 20:19
  • This is a common question, so I'll find an existing answer and mark this as a duplicate. – tdelaney Apr 17 '22 at 20:21
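
For later readers, here is a minimal sketch of the diagnosis reached in the comments, assuming Python 3. It uses the path from the thread purely as sample text, so no such file needs to exist and pandas is not involved. The failing statement is kept as source text and passed to `compile()`, which should show that the error is raised while Python parses the string literal, before `read_csv` could ever run. The variable names (`bad_source`, `raw_path`, and so on) are made up for illustration.

```python
from pathlib import PureWindowsPath

# The failing statement, stored as text in a raw string so that this
# example file itself still parses.
bad_source = r'path = "C:\Users\tadhg\Documents\datasets\transfers1.csv"'

try:
    # Compiling is enough to trigger the problem: "\U" starts an
    # 8-digit Unicode escape, and "sers\tad" is not 8 hex digits.
    compile(bad_source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(error_message)

# Three ways to write the same path safely.
raw_path = r"C:\Users\tadhg\Documents\datasets\transfers1.csv"          # raw string
escaped_path = "C:\\Users\\tadhg\\Documents\\datasets\\transfers1.csv"  # doubled backslashes
slash_path = "C:/Users/tadhg/Documents/datasets/transfers1.csv"         # forward slashes

# The raw and escaped spellings are the identical string.
same_string = raw_path == escaped_path

# Windows path handling treats "/" and "\" as equivalent separators.
same_path = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)

# Even without "\U", a non-raw "\t" silently becomes a tab character.
has_tab = "\t" in "C:\tadhg"

print(same_string, same_path, has_tab)
```

Any of the three spellings can be passed to `pd.read_csv(..., index_col=0)`. The raw string is the smallest change to the original code, though a raw string literal cannot end in a single backslash, so the forward-slash form may be more convenient for directory paths.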

0 Answers