Reading specific rows out of a pandas dataframe using a list

Question

I have a pandas dataframe that I need to pull specific rows out of and into a new dataframe. The row numbers are in a list that looks something like this: [42, 50, 52, 59, 60, 62]

I am creating the dataframe from a .csv file, but as far as I can tell there is no way to specify which row numbers to read when loading the .csv into a dataframe.

import pandas as pd

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35', index_col=False, header=None)

Here's a portion of the dataframe:

                    0
0      1 269 245 44 5
1      2 293 393 33 5
2     3 295 175 67 12
3      4 298 415 33 5
4    5 304 392 213 11
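The sample output suggests that each line of the file is whitespace-separated, so `read_csv` with its default comma separator puts the whole line into column 0. Assuming that is the case, passing `sep=r"\s+"` should split the values into separate columns. A minimal sketch that uses the five sample rows above as in-memory data (the `io.StringIO` stand-in for the real file is an assumption for illustration):

```python
import io

import pandas as pd

# The five sample rows from the question, standing in for the real file
sample = io.StringIO(
    "1 269 245 44 5\n"
    "2 293 393 33 5\n"
    "3 295 175 67 12\n"
    "4 298 415 33 5\n"
    "5 304 392 213 11\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter
df = pd.read_csv(sample, sep=r"\s+", header=None)
print(df.shape)  # (5, 5)
```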

https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function will help. You will have to read the whole file and then filter out the rows of interest. — quest, Jun 26 '19 at 08:12
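Following the approach in this comment, a minimal sketch: read everything, then select the rows by position with `DataFrame.iloc`. The generated in-memory data below is a hypothetical stand-in for the asker's file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the line at 0-based index i holds "i,i*10"
data = io.StringIO("".join(f"{i},{i * 10}\n" for i in range(70)))

keep_rows = [42, 50, 52, 59, 60, 62]  # 0-based row positions

df_all = pd.read_csv(data, header=None)  # read the whole file first
subset = df_all.iloc[keep_rows].copy()   # then keep only the wanted rows
print(subset[0].tolist())  # [42, 50, 52, 59, 60, 62]
```

`.copy()` makes `subset` an independent dataframe rather than a slice of `df_all`, which avoids chained-assignment warnings if it is modified later.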

Chris · Answer 1 · 2019-06-26T12:18:11.977

4

Use skiprows with a callable:

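import pandas as pd

keep_rows = [42, 50, 52, 59, 60, 62]

# skiprows calls the lambda with each 0-based row index and skips the row when it returns True
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 header=None,
                 skiprows=lambda x: x not in keep_rows)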
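To see what the callable receives, here is a small self-contained check on hypothetical in-memory data: `skiprows` is called with each 0-based line index, and lines for which it returns `True` are dropped. The resulting frame gets a fresh index starting at 0, so the original row numbers are not kept.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the line at 0-based index i holds "i,i*10"
data = io.StringIO("".join(f"{i},{i * 10}\n" for i in range(70)))

keep_rows = {42, 50, 52, 59, 60, 62}  # a set makes each membership test O(1)

df = pd.read_csv(data, header=None, skiprows=lambda x: x not in keep_rows)
print(df[0].tolist())     # [42, 50, 52, 59, 60, 62]
print(df.index.tolist())  # [0, 1, 2, 3, 4, 5]
```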

edited Jun 26 '19 at 12:18

answered Jun 26 '19 at 08:21

Chris


Serge Ballesta · Answer 2 · 2019-06-26T08:22:02.053

1

Unfortunately, pandas `read_csv` expects a real file (a path or file-like object), not a plain line generator, so it is not easy to select just a few lines. You can, however, do that easily at the Python level:

import io
import pandas as pd
# enumerate(..., 1) numbers the lines starting from 1
with open('/Users/uni/Desktop/corrindex+id/rt35') as f:
    lines = [line for i, line in enumerate(f, 1) if i in [42, 50, 52, 59, 60, 62]]
df = pd.read_csv(io.StringIO(''.join(lines)), index_col=False, header=None)
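The same line-filtering idea can be tried without a file. In this sketch, which uses hypothetical generated data, `enumerate(..., 1)` numbers lines from 1, so line number 42 is the 42nd line of the input (0-based index 41):

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the line at 0-based index i holds "i,i*10"
source = io.StringIO("".join(f"{i},{i * 10}\n" for i in range(70)))

wanted = {42, 50, 52, 59, 60, 62}  # 1-based line numbers
lines = [line for i, line in enumerate(source, 1) if i in wanted]

# Join the selected lines back into one string and parse it as CSV
df = pd.read_csv(io.StringIO("".join(lines)), header=None)
print(df[0].tolist())  # [41, 49, 51, 58, 59, 61]
```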

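You can also use `skiprows` to ignore every line except the ones to keep. Note that `skiprows` passes 0-based row indices to the callable, whereas `enumerate(..., 1)` above counts lines from 1, so the two snippets select rows that are off by one: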

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35', index_col=False,
                 header=None,
                 skiprows=lambda x: x not in [42, 50, 52, 59, 60, 62])
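To make the two snippets select the same lines, one option is to shift the 1-based numbers down by one before using them with `skiprows`. A sketch on hypothetical generated data that checks both approaches produce identical dataframes:

```python
import io

import pandas as pd

# Hypothetical data: the line at 0-based index i holds "i,i*10"
text = "".join(f"{i},{i * 10}\n" for i in range(70))
line_numbers = [42, 50, 52, 59, 60, 62]  # 1-based, as in the enumerate(..., 1) snippet

# Approach 1: filter lines in Python, counting from 1
lines = [ln for i, ln in enumerate(io.StringIO(text), 1) if i in line_numbers]
via_lines = pd.read_csv(io.StringIO("".join(lines)), header=None)

# Approach 2: skiprows sees 0-based indices, so shift the numbers down by one
zero_based = {n - 1 for n in line_numbers}
via_skiprows = pd.read_csv(io.StringIO(text), header=None,
                           skiprows=lambda x: x not in zero_based)

print(via_lines.equals(via_skiprows))  # True
```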

edited Jun 26 '19 at 08:22

answered Jun 26 '19 at 08:17

Serge Ballesta


Thank you for this; unfortunately I'm getting an error when I try it:
Traceback (most recent call last):
  File "/Users/uni/Desktop/corrindex+id/Pandas to createxyt.py", line 6, in <module>
    df = pd.read_csv(io.StringIO(''.join(lines)),index_col = False, header = None )
TypeError: initial_value must be unicode or None, not str
– mattp Jun 26 '19 at 08:21
@mattp: If you are using Python 2.x, you must use `io.BytesIO` instead of `io.StringIO`. – Serge Ballesta Jun 26 '19 at 08:23

score 0 · Answer 3 · answered Jun 26 '19 at 08:14

You can read only as many rows as you need with `nrows`, then select the ones you want by position with `iloc`:

import pandas as pd

my_list = [42, 50, 52, 59, 60, 62]

# Read rows 0..max(my_list) only, then pick the wanted rows by position
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 index_col=False,
                 header=None,
                 nrows=max(my_list) + 1).iloc[my_list]
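A self-contained check of this approach on hypothetical in-memory data. `nrows` limits the parsed dataframe to the first `max(my_list) + 1` rows, and `iloc` then picks rows by position. Unlike the `skiprows` answers, the result keeps the original row numbers as its index:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the line at 0-based index i holds "i,i*10"
data = io.StringIO("".join(f"{i},{i * 10}\n" for i in range(1000)))

my_list = [42, 50, 52, 59, 60, 62]

df = pd.read_csv(data, header=None, nrows=max(my_list) + 1).iloc[my_list]
print(df.index.tolist())  # [42, 50, 52, 59, 60, 62]
print(df[1].tolist())     # [420, 500, 520, 590, 600, 620]
```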
