How to Delete First Few Rows of .txt File in Python?

Question

I have a .txt file which looks like:

# Explanatory text
# Explanatory text
# ID_1 ID_2
10310   34426
104510  4582343
1032410 5424233
12410   957422

In the file, the two IDs on the same row are separated by a tab character ('\t').

I'm trying to do some analysis using the numbers in the dataset, so I want to delete the first three rows. How can this be done in Python? I.e. I'd like to produce a new dataset that looks like:

10310   34426
104510  4582343
1032410 5424233
12410   957422

I've tried the following code but it didn't work:

f = open(filename,'r')
lines = f.readlines()[3:]
f.close()

It doesn't work because I get this format (a list, with \t and \n present), not the one I indicated I want above:

['10310\t34426\n', '104510\t4582343\n', '1032410\t5424233\n' ... ]

You might want to say a little more than "it didn't work". — , Oct 18 '20 at 16:20
What didn't work? What output did you get? What did you expect? — Joooeey, Oct 18 '20 at 16:22
If you are using `pandas` for your EDA, there is a `skiprows` parameter in `pandas.read_csv`: `pandas.read_csv(filepath_or_buffer, delimiter='\t', skiprows=2)` — Shijith, Oct 18 '20 at 16:23
@Mike67 I've edited the question to include the output to show you why this doesn't work in the way I'd like. Can you tell why? — , Oct 18 '20 at 16:57
Your sample output looks like debugger output. At the end of your code, add `print(lines[:10])` to see the first 10 lines in the console. The `\t` and `\n` should be correctly displayed. — Mike67, Oct 18 '20 at 17:10
Does this answer your question? [Parsing a tab-delimited .txt into a Pandas DataFrame](https://stackoverflow.com/questions/60571932/parsing-a-tab-delimited-txt-into-a-pandas-dataframe) — Tomerikoo, Oct 20 '20 at 14:16
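
As Mike67's comment points out, the list produced by `readlines()[3:]` already contains the wanted rows. The `\t` and `\n` only show up as escape sequences because a list is displayed using the `repr` of each string. A minimal sketch of this, assuming hypothetical sample data written to a temporary file so the snippet is self-contained (the helper name `read_without_header` is illustrative, not from the thread):

```python
import os
import tempfile

# Hypothetical sample data mirroring the file in the question.
SAMPLE = (
    "# Explanatory text\n"
    "# Explanatory text\n"
    "# ID_1 ID_2\n"
    "10310\t34426\n"
    "104510\t4582343\n"
)

def read_without_header(path, n_header=3):
    """Return the lines of `path` after the first `n_header` lines."""
    with open(path) as f:
        return f.readlines()[n_header:]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write(SAMPLE)
    lines = read_without_header(path)

# Printing the list shows each string's repr, so the escapes are visible.
print(lines)
# Joining the strings and printing them renders real tabs and newlines.
print("".join(lines), end="")
```

Writing the same list to a file with `writelines` produces the layout asked for in the question.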

Mario Rojas · Answer 1 · 2020-10-20T14:12:28.930

0

You can try something like this:

with open(filename, 'r') as fh:
    for curline in fh:
        # check if the current line
        # starts with "#"
        if curline.startswith("#"):
            ...
            ...
        else:
            ...
            ...
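
One way the two branches could be filled in is to skip the lines that start with "#" and copy everything else to a new file. This is only a sketch under that assumption; the function name `strip_comment_lines`, the file names, and the sample data are hypothetical and not part of the answer:

```python
import os
import tempfile

def strip_comment_lines(src_path, dst_path, marker="#"):
    """Copy src_path to dst_path, skipping every line that starts with marker.

    Returns the number of lines written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for curline in src:
            if curline.startswith(marker):
                continue  # header/comment line: drop it
            dst.write(curline)
            kept += 1
    return kept

# Demo on hypothetical data shaped like the file in the question.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "data.txt")
    dst_path = os.path.join(tmp, "data_clean.txt")
    with open(src_path, "w") as f:
        f.write("# Explanatory text\n# ID_1 ID_2\n10310\t34426\n12410\t957422\n")
    n_kept = strip_comment_lines(src_path, dst_path)
    with open(dst_path) as f:
        cleaned = f.read()

print(n_kept)
print(cleaned, end="")
```

Because the file is processed line by line, this approach does not depend on knowing how many header lines there are, only on the header lines starting with "#".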

edited Oct 20 '20 at 14:12

answered Oct 18 '20 at 16:21

Mario Rojas


score 0 · Answer 2 · edited Oct 20 '20 at 14:17

You can use Python's pandas library to do this kind of task easily:

import pandas as pd

pd.read_csv(filename, header=None, skiprows=[0, 1, 2], sep='\t')

edited Oct 20 '20 at 14:17

Tomerikoo


answered Oct 18 '20 at 16:25

aman nagariya


This answer is misleading... he is going to think pandas is a file manager library – adir abargil Oct 18 '20 at 16:31

Peyman Majidi · Answer 3 · 2020-10-20T15:35:08.293

Ok, here is the solution:

with open('file.txt') as f:
    lines = f.readlines()

lines = lines[3:]

Remove Comments

This function removes all comment lines:

def remove_comments(lines):
    return [line for line in lines if not line.startswith("#")]

Remove the first n lines

def remove_n_lines_from_top(lines, n):
    if n <= len(lines):
        return lines[n:]
    else:
        return lines

Here is the complete source:

with open('file.txt') as f:
    lines = f.readlines()


def remove_comments(lines):
    return [line for line in lines if not line.startswith("#")]

def remove_n_lines_from_top(lines, n):
    # If n exceeds the number of lines, every line is kept (same as the function above)
    return lines[n if n <= len(lines) else 0:]

lines = remove_n_lines_from_top(lines, 3)

with open("new_file.txt", "w") as f:  # save to new_file.txt
    f.writelines(lines)
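
For large files, the same result can probably be had without loading every line into memory first. The sketch below swaps in `itertools.islice` from the standard library to skip the first n lines while streaming; the function name `skip_first_lines`, the file names, and the sample data are hypothetical:

```python
import os
import tempfile
from itertools import islice

def skip_first_lines(src_path, dst_path, n=3):
    """Copy src_path to dst_path, leaving out the first n lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice(src, n, None) yields the lines from index n onward.
        dst.writelines(islice(src, n, None))

# Demo on hypothetical data shaped like the file in the question.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "file.txt")
    out_path = os.path.join(tmp, "new_file.txt")
    with open(in_path, "w") as f:
        f.write(
            "# Explanatory text\n"
            "# Explanatory text\n"
            "# ID_1 ID_2\n"
            "10310\t34426\n"
            "104510\t4582343\n"
        )
    skip_first_lines(in_path, out_path, n=3)
    with open(out_path) as f:
        result = f.read()

print(result, end="")
```

Note one difference in behavior: if n is larger than the number of lines, this version writes an empty file, whereas the function in the answer returns all lines unchanged.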
