I have a pandas DataFrame from which I need to pull specific rows into a new DataFrame. The row numbers are in a list that looks something like this: [42, 50, 52, 59, 60, 62]

I am creating the DataFrame from a .csv file, but as far as I can tell there is no way to specify which row numbers to load when reading the .csv and creating the DataFrame.

import pandas as pd

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35', index_col=False, header=None)

Here's a portion of the dataframe:

                    0
0      1 269 245 44 5
1      2 293 393 33 5
2     3 295 175 67 12
3      4 298 415 33 5
4    5 304 392 213 11
mattp
  • https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function will help. You will have to read the whole file and then filter out the rows of interest. – quest Jun 26 '19 at 08:12

3 Answers

Use skiprows with a callable:

import pandas as pd

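# Line numbers to keep (0-based, counted from the top of the file)
keep_rows = [42, 50, 52, 59, 60, 62]

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 header=None,
                 # skiprows calls this for each line number; True means "skip"
                 skiprows=lambda x: x not in keep_rows)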
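
To try the callable form of skiprows without the original file, here is a minimal sketch on in-memory data. The `sample` buffer and its `row0` ... `row99` contents are made up for illustration, and a set is used instead of a list only because membership tests are cheaper on large inputs.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 100 lines, one value per line.
sample = io.StringIO("\n".join(f"row{i}" for i in range(100)))

# 0-based line numbers to keep.
keep_rows = {42, 50, 52, 59, 60, 62}

# The callable receives each line number and returns True for lines to skip.
df = pd.read_csv(sample, header=None, skiprows=lambda x: x not in keep_rows)

kept = df[0].tolist()
print(kept)
```

Note that the resulting frame is re-indexed from 0, so the original line numbers are not preserved in the index.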
Chris

Unfortunately, pandas read_csv expects a file path or file-like object rather than a plain generator of lines, so it is not easy to hand it only a handful of lines directly. But you can do that easily at the Python level:

import io

# enumerate(..., 1) numbers the lines starting from 1
lines = [line for i, line in enumerate(open('/Users/uni/Desktop/corrindex+id/rt35'), 1)
         if i in [42, 50, 52, 59, 60, 62]]
df = pd.read_csv(io.StringIO(''.join(lines)), index_col=False, header=None)
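
One thing worth checking with this approach is the numbering: `enumerate(..., 1)` counts lines from 1, while the row labels pandas shows start at 0. The sketch below uses made-up in-memory lines (`row0` ... `row99`, not the original file) to show the one-line offset between the two conventions, assuming Python 3.

```python
import io
import pandas as pd

# Hypothetical file content: line k (0-based) holds the text "row<k>".
lines_in = [f"row{i}\n" for i in range(100)]
wanted = {42, 50, 52, 59, 60, 62}

# 1-based numbering: "42" means the 42nd line, i.e. 0-based row 41.
one_based = [line for i, line in enumerate(lines_in, 1) if i in wanted]

# 0-based numbering: "42" matches the row pandas would label 42.
zero_based = [line for i, line in enumerate(lines_in) if i in wanted]

df_one = pd.read_csv(io.StringIO("".join(one_based)), header=None)
df_zero = pd.read_csv(io.StringIO("".join(zero_based)), header=None)

print(df_one[0].tolist())
print(df_zero[0].tolist())
```

Which convention is right depends on where the row numbers came from; if they are pandas row labels, the 0-based variant is probably the one you want.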

You can also use skiprows to ignore all the lines except the ones to keep:

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35', index_col=False,
                 header=None, skiprows=lambda x: x not in [42, 50, 52, 59, 60, 62])
Serge Ballesta
  • Thank you for this, unfortunately I'm getting an error when I'm trying: Traceback (most recent call last): File "/Users/uni/Desktop/corrindex+id/Pandas to createxyt.py", line 6, in df = pd.read_csv(io.StringIO(''.join(lines)),index_col = False, header = None ) TypeError: initial_value must be unicode or None, not str – mattp Jun 26 '19 at 08:21
  • @mattp: If you are using Python 2.x, you must use `io.BytesIO` instead of `io.StringIO`. – Serge Ballesta Jun 26 '19 at 08:23

You can go about it like this:

import pandas as pd

my_list = [42, 50, 52, 59, 60, 62]

# Read only as many rows as needed, then pick the wanted ones by position
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 index_col=False,
                 header=None,
                 nrows=max(my_list) + 1).iloc[my_list]
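
A quick way to check this nrows plus iloc idea is to run it on in-memory data. This is only a sketch: the `sample` buffer and its `row0` ... `row99` values are invented for the example, and it assumes the list holds 0-based row positions.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 100 lines, one value per line.
sample = io.StringIO("\n".join(f"row{i}" for i in range(100)))

my_list = [42, 50, 52, 59, 60, 62]

# Stop reading after the last wanted row, then select rows by position.
df = pd.read_csv(sample, header=None, nrows=max(my_list) + 1).iloc[my_list]

print(df)
```

Unlike the skiprows variant, this keeps the original row numbers as the index, at the cost of parsing every row up to the largest one requested.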
zipa