0

Hey, I could really use some help here. I have spent an hour looking for a Python solution but could not find one.

I am using Python 3.7. My input is a file provided by a customer, and I cannot change it. It is structured in the following way: it starts with random text that is not in CSV format, and from line 3 on the rest of the file is in CSV format.

text line

text line

text line or nothing

Enter

[Start of csv file] "column Namee 1","column Namee 2" .. until 6

"value1","value2" ... until 6 - continuing for many lines.

I wanted to remove the first 3 lines to create a pure CSV file, but I could not find code that does this for a specific line range. It also seems like the wrong approach, since starting to read from a certain point should be possible. Then I thought split() was the solution, but it did not work for this format. The values are sometimes numbers, dates or strings. The seek() method cannot be used because the files start differently. Right now my DictReader takes the first line as the header, and consequently the rest is rendered in chaos.

import csv
import pandas as pd
from prettytable import PrettyTable

with open(r'C:\Users\Hans\Downloads\file.csv', newline='') as csvfile:
    # DictReader needs the open file object, not the path string
    csv_reader = csv.DictReader(csvfile, delimiter=',')
    for row in csv_reader:
        print(row)

If this has already been answered for Python, please link it; I was not able to find it. Thank you so much for your help. I really appreciate it.

Juan
Hans
  • Why not open the file normally with open(), call readline() 3 times, then pass the file handle to DictReader to consume the rest of the file? – Chris Doyle Dec 16 '19 at 16:44
  • Maybe you can try using pandas. You can check pd.read_csv and set skiprows=3. It will start reading your file from there. – Ivan Calderon Dec 16 '19 at 16:47
  • @IvanCalderon Thanks for your comment, but then it skips the rows, not the lines. Pandas has no argument for lines to ignore, as far as I can see in the documentation. – Hans Dec 16 '19 at 17:31
  • Could you clarify the difference between rows and lines? – Ivan Calderon Dec 16 '19 at 18:40
  • A row is a vertical alignment, like A B C in Excel, so going from top to bottom. A line is a horizontal alignment, like 1 2 3 in Excel, so going from left to right if you will. When using skiprows, it sees that the first line has only one item and then adjusts the rest of the file accordingly. I am probably overlooking something, but when I tried it, it did not solve my problem. – Hans Dec 17 '19 at 20:54
  • @Hans I think you are confused: a vertical alignment is called a column. Generally, rows and lines are the same thing. Pandas has methods to skip both of them (vertical or horizontal). – Ivan Calderon Dec 18 '19 at 14:53
  • Oh, I feel so bad. Of course you are right. I meant I want the first 3 rows gone. DictReader thought there was only one column because it takes the first line as its base. Thank you for your point. – Hans Dec 27 '19 at 07:46

2 Answers

1

I will insist on the pandas option, given that the documentation clearly states that the skiprows parameter can skip the first n lines of the file. I tried it with the example provided by @Chris Doyle (saving it to a file named line_file.csv) and it works as expected.

import pandas as pd
f = pd.read_csv('line_file.csv', skiprows=3)

Output

  name  num symbol
0 chris   4      $
1 adam    7      &
2 david   5      %
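To try this without creating a file, the same call can be run against an in-memory buffer. This is only a sketch: the sample text below is made up to resemble the quoted format in the question, and the variable names raw and df are placeholders.

```python
import io
import pandas as pd

# Hypothetical sample: three non-CSV lines followed by quoted CSV data.
raw = (
    "this is junk\n"
    "this is more junk\n"
    "last junk\n"
    '"name","num","symbol"\n'
    '"chris","4","$"\n'
    '"adam","7","&"\n'
)

# skiprows=3 drops the first three physical lines; the fourth becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(df.columns.tolist())
print(len(df))
```

If the number of leading text lines differs between files, a fixed skiprows value will not be enough and the header line has to be located first.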
Ivan Calderon
  • I agree with you @Ivan. Given that the OP has the import pandas statement, if they are planning to just load the rest of the CSV as a pandas DataFrame then this answer is much cleaner. – Chris Doyle Dec 17 '19 at 21:26
  • OK, sorry for the late reply. I tried this code and it indeed worked. Now I have different problems, but those are already discussed, although not solved, in https://stackoverflow.com/questions/38336501/error-while-reading-a-csv-file-in-python-using-pandas – Hans Dec 27 '19 at 08:32
  • @Hans I'm happy that it worked. Could you please consider voting up the question? – Ivan Calderon Dec 27 '19 at 14:03
  • "Thanks for the feedback! Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score" – Hans Dec 28 '19 at 17:17
0

If you know the number of lines you want to skip, just open the file, read that many lines, and then pass the file handle to DictReader, which will read the remaining lines.

import csv

skip_n_lines = 3
with open('test.dat', newline='') as my_file:
    # Consume the non-CSV lines so the reader starts at the header
    for _ in range(skip_n_lines):
        print("skipping line:", my_file.readline(), end='')
    print("###CSV DATA###")
    # DictReader continues from the current file position
    csv_reader = csv.DictReader(my_file)
    for row in csv_reader:
        print(row)

FILE

this is junk
this is more junk
last junk
name,num,symbol
chris,4,$
adam,7,&
david,5,%

OUTPUT

skipping line: this is junk
skipping line: this is more junk
skipping line: last junk
###CSV DATA###
OrderedDict([('name', 'chris'), ('num', '4'), ('symbol', '$')])
OrderedDict([('name', 'adam'), ('num', '7'), ('symbol', '&')])
OrderedDict([('name', 'david'), ('num', '5'), ('symbol', '%')])
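If the number of leading text lines is not known in advance (the question mentions that the files start differently), one possible variation is to scan for the first line that parses into the expected number of fields and treat it as the header. This is a sketch under assumptions, not something from the question: the helper name open_csv_after_preamble, the expected_columns parameter and the sample text are all hypothetical, and a text line that happens to contain the same number of comma-separated fields would be mistaken for the header.

```python
import csv
import io

def open_csv_after_preamble(lines, expected_columns):
    """Return a DictReader positioned just after the detected header line.

    Hypothetical helper: the header is assumed to be the first line that
    splits into exactly `expected_columns` CSV fields.
    """
    line_iter = iter(lines)
    for line in line_iter:
        # Parse this single line with the csv module so quoted commas are handled.
        fields = next(csv.reader([line]), [])
        if len(fields) == expected_columns:
            # The remaining lines of the same iterator are the data rows.
            return csv.DictReader(line_iter, fieldnames=fields)
    raise ValueError("no line with the expected number of columns was found")

# Made-up sample: two text lines, a blank line, then quoted CSV with 3 columns.
sample = (
    "text line\n"
    "text line\n"
    "\n"
    '"col 1","col 2","col 3"\n'
    '"a","1","2019-12-16"\n'
    '"b","2","2019-12-17"\n'
)
rows = list(open_csv_after_preamble(io.StringIO(sample), expected_columns=3))
for row in rows:
    print(dict(row))
```

With a real file, the object returned by open(path, newline='') could be passed in place of the StringIO buffer; for the file in the question, expected_columns would be 6.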
Chris Doyle
  • Thank you for your comment. I will try it on Thursday or Saturday and come back to your comment. Sorry for the delay. – Hans Dec 17 '19 at 20:56
  • The pandas solution is cleaner in my beginner's view, so I decided to stick with it. Thank you for your comment and time :) – Hans Dec 27 '19 at 08:33