How to extract specific portion of CSV file using row key with pandas

Question

I have a huge CSV file with 10000 rows and 500 columns. I want to extract the data from header to the row which contains device_boot. I want to eliminate all rows after device_boot.

Example:

Name,Time,status,..
start,05:06:2018 10:10:23,good,..
start,05:06:2018 10:11:23,good,..
failure,05:06:2018 11:10:25,critical,..
device_boot,05:06:2018 13:11:25,reboot,..
start,05:06:2018 13:13:23,good,..
start,05:06:2018 13:16:23,good,..

So, I need to maintain in CSV file up to device_boot line (row) using pandas. I am able to remove the specific row on that keyword, but not able to extract up to that portion using pd.drop(...).

Thanks for the suggestions.

Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — jezrael, Dec 12 '18 at 11:46
why dont you record the index number, that would be things easier i think — iamklaus, Dec 12 '18 at 11:52
Possible duplicate of [How can I filter lines on load in Pandas read\_csv function?](https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function) — erncyp, Dec 12 '18 at 12:03
In pandas you can use ```nrows```. Something like: ```data = pd.read_csv("filename.csv", nrows=1312)```. Just find the row count, i would have used simple bash commands like ```wc`` or a simple for loop.. — Aaron_ab, Dec 12 '18 at 12:03
Yes,i know the row number and extract the rows upto keyword, but sometime not able to know the row number due to huge file. For getting rows based on index, I used, df = pd.read_csv(/home/ubuntu1/output.csv',low_memory=False, index_col=False, dtype='unicode8', nrows=590) — sundarr, Dec 12 '18 at 13:11

U13-Forward · Accepted Answer · 2018-12-13T09:15:38.200

1

Use:

print(df.loc[:df['Name'].gt('device_boot').idxmin()+1,:])

Output will be expected output.

Update:

print(df.loc[:df.index[df['Name']=='device_boot'].tolist()[-1],:])

And that contains the 'device_boot' row, if want to remove it:

print(df.loc[:df.index[df['Name']=='device_boot'].tolist()[-1]-1,:])

edited Dec 13 '18 at 09:15

answered Dec 12 '18 at 11:55

U13-Forward

69,221
14
89
114

1

yes,your answer do the filtering of the specific row only, but what my expectation is "how to extract rows upto device_boot word in the firstrow". In some csv files, device_boot will appear more than one time also, I need to extract all rows in the csv file upto last device_boot appear in first column. – sundarr Dec 12 '18 at 13:07

score 0 · Answer 2 · answered Dec 17 '18 at 13:21

I findout the index of the keyword,like

val = df.loc[df['name']=='device_boot'].index
print val

Then, use that row index and retrive only until that variable,

rowretrive_index = val1+50  // any extra rows can be added here.
print rowretrive_index

df1 = df.iloc[1:rowretrive_index]
df1.to_csv('/out.csv',',',dtype='unicode8')

Hope it will useful. Thanks, Sundar

How to extract specific portion of CSV file using row key with pandas

2 Answers2