I have many CSV files of different sizes containing tick data for some symbols. Here is an image of one sample file.

Everything is in one column, separated by ';'. I want to read the data using the second row as the header and skipping the first row. So far I have tried everything I could find about loading a CSV file while skipping the first row and using the second row as the header. Here are some of the code snippets I tried:

df = pd.read_csv(cwd + folder + name +'.csv',delimiter=';', skip_blank_lines=True, encoding='utf-8', skiprows=[0])

Another is like this:

df = pd.read_csv(cwd + folder + name +'.csv',delimiter=';', encoding='utf-8', skiprows=[0], header=1)

Both produce a DataFrame with a single column named 'Unnamed: 0' in which every value is NaN. I have tried other solutions, such as

"Python Pandas read_csv skip rows but keep header", but none of them worked for me. If I do not skip the first row and read the file without specifying a delimiter, Python raises a Unicode error. How can I solve this problem?

After trying the solutions in the first two answers, I still get the same output from both.

Urvish
  • can you do `print(cwd+folder+name+'.csv')` and share the output. – meW Dec 27 '18 at 06:56
  • that is path to the file. If it is wrong then it throws error directly. But it is something like this `E:/pirimid/trader/LEAD.csv` – Urvish Dec 27 '18 at 06:57
  • @Urvish - the code that you have used itself gives me a correct dataframe as you expect. Very sure the error is with the file read. – Jim Todd Dec 27 '18 at 06:57
  • @Urvish are you getting some output (not NAN) if you read it as table `pd.read_table('E:/pirimid/trader/LEAD.csv')` – meW Dec 27 '18 at 06:59
  • Here is the exact output of `print(cwd+folder+name+'.csv')`: `E:\Pirimid\trader data/Tick Files/LEADAPR.csv` – Urvish Dec 27 '18 at 07:01
  • @Urvish path is correct, try reading it as table. Are you getting any NaN? – meW Dec 27 '18 at 07:03
  • Yes still it gives NaN in all rows. I am trying to upload the data file but it is too large so taking some time. Will add the link to that in question once uploaded. – Urvish Dec 27 '18 at 07:05
  • added link for data file – Urvish Dec 27 '18 at 07:10
  • Uploading 1-2 MB file size would do it, not 100 MB. – meW Dec 27 '18 at 07:11
  • Yeah, but that will change the file itself. So if there is anything wrong with the file then I won't be able to know. That is why I uploaded the whole file. – Urvish Dec 27 '18 at 07:13

2 Answers

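In skiprows you can give the number of rows to skip from the top of your CSV, for example skiprows=1. A list of row indices such as skiprows=[0] is also accepted, so the real problem here is the encoding.

The file is encoded as UTF-16, not UTF-8, so use encoding='utf-16':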

df = pd.read_csv(cwd + folder + name +'.csv',delimiter=';', encoding='utf-16', skiprows=1)
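The call above can be tried on a small stand-in file. This is only a sketch under stated assumptions: the sample contents, the text of the junk first line, and the temporary file name are made up for illustration (the column names are borrowed from the output posted in the other answer), and the real tick files may differ.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample mimicking the layout described in the question:
# one unwanted first line, then a ';'-separated header row, then data rows.
sample = (
    "unwanted first line\n"
    "Date;bid;ask;last;volume\n"
    "2017 06 05;149.6;149.7;0.0;0.0\n"
    "2017 06 05;149.6;149.7;0.0;0.0\n"
)

# Write the sample as UTF-16. Python's "utf-16" codec prepends a BOM.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write(sample)

# Skip the first line, use the next line as the header, and decode as UTF-16.
df = pd.read_csv(path, sep=";", encoding="utf-16", skiprows=1)
print(df.columns.tolist())
print(df)
```

Here sep=';' is used, which is equivalent to delimiter=';' in read_csv.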

for more info:

To check the encoding, I opened the file in LibreOffice. The import dialog that appears when you open the file lets you choose the delimiter, and it also shows the UTF encoding of the file.
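If LibreOffice is not at hand, the encoding can often be guessed from the byte order mark (BOM) at the start of the file. The helper below is a rough sketch, not a general encoding detector: the function name guess_encoding_from_bom is made up for this example, and it returns None for files without a BOM, which says nothing about their actual encoding.

```python
import codecs


def guess_encoding_from_bom(path):
    """Return an encoding name if the file starts with a known BOM, else None."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM found: the encoding cannot be determined this way.
    return None
```

The returned name can be passed directly as the encoding argument of pd.read_csv. For a file like the one in the question, the expectation (an assumption, since it depends on how the file was exported) is that this returns 'utf-16'.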

Nihal

@Urvish - I used the same code as in your post, and the output is correct. Please check your file.

import pandas as pd
df = pd.read_csv("C:\\Users\\user\\Downloads\\sof.csv" ,delimiter=';', skip_blank_lines=True, encoding='utf-8', skiprows=[0])
print(df)

Output:

            Date    bid    ask  last  volume
2017 06 05   799  149.6  149.7   0.0     0.0
2017 06 05   799  149.6  149.7   0.0     0.0
Jim Todd