
I'm reading data from a .csv file using pandas. I'm using `sep=',\s*'` because `sep=','` does not split the columns correctly. My .csv file:

tarih_x,Alt_urun,per_geomean
2018-07-13,Antep fıstığı-Açık,81.87336164596796
2018-07-14,Antep fıstığı-Açık,81.87336164596796
2018-07-15,Antep fıstığı-Açık,81.87336164596796
2018-07-16,Antep fıstığı-Açık,81.87336164596796
2018-07-17,Antep fıstığı-Açık,81.87336164596796

I'm reading data:

import pandas as pd
path = "data//gün_result_index.csv"
# Raw string so that \s reaches the regex engine unchanged
df = pd.read_csv(path, encoding='utf-16', sep=r',\s*', engine='python')

but when I print df, I see that the method adds double quotes to the data:

|   | "tarih_x    | Alt_urun           | per_geomean"       |
|---|-------------|--------------------|--------------------|
| 0 | "2018-07-13 | Antep fıstığı-Açık | 81.87336164596796" |
| 1 | "2018-07-14 | Antep fıstığı-Açık | 81.87336164596796" |
| 2 | "2018-07-15 | Antep fıstığı-Açık | 81.87336164596796" |
| 3 | "2018-07-16 | Antep fıstığı-Açık | 81.87336164596796" |

This is not something I want. How can I read data without double quotes?

Uğur Eren
  • Maybe try `pd.read_csv(path, encoding='utf-16', sep=',\s*', quotechar='"', engine='python')` – Erfan Oct 20 '19 at 10:10
  • I think this might be a problem with the encoding, which was altered when you copied and pasted the file content here. Can you please share the original file as base64? – Ente Oct 20 '19 at 10:11
  • @Erfan Unfortunately, it didn't work. – Uğur Eren Oct 20 '19 at 10:47
  • @Ente data here: https://postamuedu-my.sharepoint.com/:x:/g/personal/birdalugureren_posta_mu_edu_tr/ERD-L_RKZbZDmV9sKdfEd1YBDG8h5D573GuNDdB1IfngXQ?rtime=h5pxhEpV10g – Uğur Eren Oct 20 '19 at 10:48
  • @UğurEren: That's an Excel spreadsheet and does not help. Sorry. Please provide the original `.csv` file, preferably base64-encoded, here on the page. – Ente Oct 20 '19 at 11:03
  • When I opened the file with Notepad, I saw quotation marks. I found a workaround. – Uğur Eren Oct 20 '19 at 11:26

2 Answers


For me, `df = pd.read_csv('file.csv')` works just fine:

      tarih_x            Alt_urun  per_geomean
0  2018-07-13  Antep fıstığı-Açık    81.873362
1  2018-07-14  Antep fıstığı-Açık    81.873362
2  2018-07-15  Antep fıstığı-Açık    81.873362
3  2018-07-16  Antep fıstığı-Açık    81.873362
4  2018-07-17  Antep fıstığı-Açık    81.873362

But encoding='utf-16' gives:
UnicodeError: UTF-16 stream does not start with BOM

I use pandas 0.25.1 on Ubuntu.
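
The difference from the asker's result can be reproduced in memory. The sketch below is only an illustration: it assumes that, in the real file, every line is wrapped in one pair of double quotes (which would match the quotation marks seen in Notepad and the quote positions in the printed frame). The names `plain` and `wrapped` are made up for this example.

```python
import io
import pandas as pd

# The sample as pasted in the question (no surrounding quotes).
plain = (
    "tarih_x,Alt_urun,per_geomean\n"
    "2018-07-13,Antep fıstığı-Açık,81.87336164596796\n"
    "2018-07-14,Antep fıstığı-Açık,81.87336164596796\n"
)

# Assumption: on disk, each line is wrapped in one pair of double quotes.
wrapped = "\n".join('"' + line + '"' for line in plain.splitlines()) + "\n"

df_plain = pd.read_csv(io.StringIO(plain))
df_wrapped = pd.read_csv(io.StringIO(wrapped))

print(df_plain.shape)    # three columns
print(df_wrapped.shape)  # one column: each quoted line is parsed as a single field
```

If that assumption holds, it would explain why `sep=','` appeared not to work: the default parser honours the quotes and treats each whole line as one field.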

Quant Christo

There's probably a shorter way. The workaround I found is:

# Strip the stray double quotes from the column names and from every cell
df.columns = df.columns.str.replace('"', '')
for col in df.columns:
    df[col] = df[col].str.replace('"', '')
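
One possibly shorter route, sketched under the assumption that each line of the file is wrapped in a single pair of double quotes: tell the parser to ignore quoting with `quoting=csv.QUOTE_NONE`, split on a plain comma, and strip the leftover quotes from the first and last columns. The `raw` string below is a hypothetical stand-in for the real file.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for the real file; assumes every line is quote-wrapped.
raw = (
    '"tarih_x,Alt_urun,per_geomean"\n'
    '"2018-07-13,Antep fıstığı-Açık,81.87336164596796"\n'
    '"2018-07-14,Antep fıstığı-Açık,81.87336164596796"\n'
)

# QUOTE_NONE makes the parser treat '"' as an ordinary character,
# so a plain comma separator splits the line into three fields.
df = pd.read_csv(io.StringIO(raw), sep=",", quoting=csv.QUOTE_NONE)

# The opening quote is left on the first column and the closing quote on the last.
df.columns = df.columns.str.strip('"')
df["tarih_x"] = df["tarih_x"].str.strip('"')
df["per_geomean"] = df["per_geomean"].str.strip('"').astype(float)

print(df)
```

For the actual file, the same call would presumably take the path and `encoding='utf-16'` in place of the `io.StringIO` object; that has not been checked against the original data.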
Uğur Eren