How to skip certain rows in many CSV files with Python (pandas and csv)?

Question

I have put numerous CSV files in a folder. For each file, I would like to first skip a certain row (e.g. the 10th row) and then keep only one row out of every five.
I can do the first step, but I have no idea how to do the second one.

Thanks.

import pandas as pd
import csv, os


# Loop through every file in the current working directory.
for csvFilename in os.listdir('path'):
    if not csvFilename.endswith('.csv'):
        continue
    # Count the lines in the file (header included)
    with open('path' + csvFilename) as f:
        total_line = sum(1 for _ in f)
    # Line indices to skip: the first data row (1) and the last row
    line_list = [1, total_line - 1]
    df = pd.read_csv('path' + csvFilename, skiprows=line_list)
    new_file_name = csvFilename

    # And output
    df.to_csv('path' + new_file_name, index=False)

Update: the following code works.

import numpy as np
import pandas as pd
import csv, os

# Loop through every file in the current working directory.
for csvFilename in os.listdir('path'):
    if not csvFilename.endswith('.csv'):
        continue
    # Count the lines in the file (header included)
    with open('path' + csvFilename) as f:
        total_line = sum(1 for _ in f)
    # Start with every line index marked as "skip"
    skip = np.arange(total_line)
    # Un-skip every fifth line (0, 5, 10, ...); line 0 is the header
    skip = np.delete(skip, np.arange(0, total_line, 5))
    # Additionally skip one specific line, e.g. line 10
    skip = np.append(skip, 10)
    df = pd.read_csv('path' + csvFilename, skiprows=skip)

    new_file_name = '2' + csvFilename
    # And output
    df.to_csv('path' + new_file_name, index=False)
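
To check the index arithmetic above without touching real files, here is a minimal sketch that runs the same `skip` construction on a small in-memory CSV. The sample data (a `value` header plus the numbers 1 to 20) is made up for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: line 0 is the header, line k holds the value k (k = 1..20).
sample = "value\n" + "\n".join(str(i) for i in range(1, 21))
total_line = len(sample.splitlines())  # 21 lines, header included

skip = np.arange(total_line)                         # mark every line as "skip"
skip = np.delete(skip, np.arange(0, total_line, 5))  # un-skip lines 0, 5, 10, 15, 20
skip = np.append(skip, 10)                           # skip line 10 again

df = pd.read_csv(io.StringIO(sample), skiprows=skip)
print(df["value"].tolist())  # lines 5, 15 and 20 remain
```

Line 0 stays in the kept set, so the header survives and the remaining rows are the ones at line indices 5, 15 and 20.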

Does this answer your question? [Select every nth row as a Pandas DataFrame without reading the entire file](https://stackoverflow.com/questions/53812094/select-every-nth-row-as-a-pandas-dataframe-without-reading-the-entire-file) — Shaido, Apr 29 '20 at 09:11
You can [edit] the question if you want to add something, or if you have an answer you can add that (it's fine to answer your own question). If the question I linked answered your question, you can accept the duplicate. :) — Shaido, Apr 29 '20 at 09:37
Thank you for your help. I have updated my code, however, there are still some problems. — Neil, Apr 29 '20 at 09:40
No problems. `skip` contains the rows you want to skip so you need to remove the lines `np.delete(skip, total_line-1, 0)` and `np.delete(skip, 1, 0)`. For the last one, you should probably start from 1: `np.delete(skip, np.arange(1, total_line, 5))`. For the last line, you need to make sure it is in the `skip` list or you can use the `skipfooter` parameter in `read_csv`. — Shaido, Apr 29 '20 at 09:47
Thanks. How about if skipping a certain row? e.g. the fifth row? — Neil, Apr 29 '20 at 09:56
For that you still have to rely on `skiprows` as in the linked question / your updated answer. — Shaido, Apr 29 '20 at 10:02
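
For reference, a minimal sketch of the `skipfooter` parameter mentioned in the comments above, on a made-up in-memory CSV. `skipfooter` is only supported by the Python parser engine, so `engine="python"` is set explicitly. How it interacts with `skiprows` is not shown here and should be checked on your own data.

```python
import io

import pandas as pd

# Hypothetical sample: a header followed by five data lines.
sample = "value\n1\n2\n3\n4\n5"

# Drop the last line of the file while parsing.
df_footer = pd.read_csv(io.StringIO(sample), skipfooter=1, engine="python")
print(df_footer["value"].tolist())
```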

Mo Huss · Answer 1 · 2020-04-29T11:01:23.087

1

You can pass a function to `skiprows`. It is called with each row index, and the row is skipped when it returns `True`.

I edited your code below:

    import pandas as pd
    import csv, os

    # Loop through every file in the current working directory.
    for csvFilename in os.listdir('path'):
        if not csvFilename.endswith('.csv'):
            continue
        # Now let's read the dataframe
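        # Count the lines in the file (header included)
        with open('path' + csvFilename) as f:
            total_line = sum(1 for _ in f)

        # Line indices 1, 6, 11, ... (excluding the last line) are skipped
        rows_to_skip = set(range(total_line)[1:-1:5])
        df = pd.read_csv('path' + csvFilename, skiprows=lambda x: x in rows_to_skip)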

        new_file_name = csvFilename
        # And output
        df.to_csv('path' + new_file_name, index=False)

edited Apr 29 '20 at 11:01

answered Apr 29 '20 at 10:01


There is something wrong. If I do so, it would skip what I really want. – Neil Apr 29 '20 at 11:59
You can change the `[1:-1:5]` part of the code to either `[1:-1:6]` or `[1:-1:4]` and you will get exactly what you want. – Mo Huss Apr 29 '20 at 14:13
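
The slice `[1:-1:5]` yields the indices 1, 6, 11, ..., so the lambda in the answer skips those rows and keeps all the others. That is the opposite of keeping one row in five. A possible alternative, sketched below, is a callable that returns `True` for every line that should be dropped. The helper name `make_skip`, its parameters and the sample data are hypothetical and only illustrate the idea.

```python
import io

import pandas as pd


def make_skip(step=5, extra_skip=(10,)):
    """Build a skiprows callable: keep line 0 and every step-th line,
    except the line indices listed in extra_skip."""
    extra = set(extra_skip)
    # True means "skip this line".
    return lambda i: i % step != 0 or i in extra


# Hypothetical sample: line 0 is the header, line k holds the value k (k = 1..20).
sample = "value\n" + "\n".join(str(i) for i in range(1, 21))

df = pd.read_csv(io.StringIO(sample), skiprows=make_skip())
print(df["value"].tolist())  # lines 5, 15 and 20 remain
```

With a callable there is no need to count the lines of each file first, because the function is evaluated per line index while the file is parsed.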
