What I'm trying to do is delete several rows of an Excel-Files (with pandas) and then save the File without those rows to .xlsx (with pyexcelerate module).
I'm aware that I can remove rows of a data frame by dropping them (I already got that to work). But I have read in several posts that when there are many (in my case > 5000) rows that should be deleted it's much faster to just get the indexes of the "to delete" rows from the data frame and then slice the data frame (just as a SQL Except statement for example would do). Unfortunately I can't get it to work, even though I've tried several methods.
Here are my "source posts":
Slice Pandas dataframe by labels that are not in a list - Answer from User ASGM
How to drop a list of rows from Pandas dataframe? - Answer from User Dennis Golomazov
And here is a part of the function, that should delete the rows and save the created file:
for index, cell in enumerate(wb_in[header_xlsx]):
if str(cell) in delete_set:
set_to_delete.append(index)
print str(cell) + " deleted from set: " + str(len(set_to_delete))
wb_out = Workbook()
data_out = wb_in.loc[set(wb_in.index) - set(set_to_delete)]
ws_out = wb_out.new_sheet('Main', data=data_out)
wb_out.save(file_path + filename + "_2.xlsx")
Here is an example of the data frame:
sku product_group name \
0 ABCDb00610-23.0 ABA1 Anti
1 ABCDb00610-10.0 ABA1 Anti
2 ABCDb00610-1.1 ABA1 Anti
3 ABCDb00609-23.0 ABA1 Anti
4 ABCDb00609-10.0 ABA1 Anti
5 ABCDb00609-1.1 ABA1 Anti
6 ABCDb00608-23.0 ABA1 Anti
7 ABCDb00608-10.0 ABA1 Anti
8 ABCDb00608-3.3 ABA1 Anti
9 ABCDb00608-3.0 ABA1 Anti
Delete_set is a set that contains only skus (e.g.: ABCDb00608-3.3 or ABCDb00609-1.1).
Btw: I have tried many solution suggestions!
Thanks in advance!