-1

I would like to check a "worksheet" if it contains more than e.g. 250 entries if it does I would create a new excel-sheet and save it in a new file.

For example:

Leading-Zip:   Adresses that contains the Leading-Zip:
--------------------------

74                  400
73                  200   
72                   50

I used this command to get the number of entries I want to group:

worksheet['Zip-code-region'].value_counts()

Which way do I have to choose to make that? Do i have to create a list? or could I use a command with a for-loop?

Try a Update: I am importing a excelfile:

xel = pd.read_excel(r'C:test.xlsx', sheet_name = None)

than i select a sheet:

worksheet = xel[ws]

now I add a new column 'leading-zip' slicing the ZIP code:

worksheet['leading-zip']=worksheet['zip-code'].astype(str).str[:2].astype(int)

from that 'leading-zip' I want to iterate each 'leading-zip' - count the adresses contained in it and if they are more than 250 I want to create a new excel file.

elbu
  • 21
  • 6
  • 1
    please provide a [mcve](https://stackoverflow.com/help/mcve), also check [how to make good pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – anky Mar 14 '19 at 16:35
  • which version of pandas you have? `pd.__version__` ? if `0.24` check with `worksheet[worksheet.Zip-code-region.isin(worksheet.groupby('Zip-code-region')['Zip-code-region'].value_counts().loc[lambda x : x>250].droplevel(0).index)]` – anky Mar 14 '19 at 17:57
  • I installed the 0.24.1 one – elbu Mar 14 '19 at 18:09
  • 1
    hi anky, that worked - thanks. never in my life i would have come to this conclusion. now how do i write the selected dresses in a excel-file? i thought of using the leading-zip to add it in the file-name. like: i have more than 500 adresses in the leading-zip: 74 -> filename: zip-74.xlsx --- and another problem: it can be various of leading-zip's with over 500 adresses. – elbu Mar 14 '19 at 19:14
  • You already have the selection, now just export it to excel – anky Mar 15 '19 at 03:18

1 Answers1

1

You can filter the value_counts results that are above the threshold and then loop over their indexes, saving the respective subsets from the original DataFrame as separate Excel sheets:

import xlsxwriter
import numpy as np
import pandas as pd

df = pd.DataFrame({'zip': np.random.randint(10, 100, 1000)})

z = df['zip'].value_counts()

threshold = 15

writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
for i in z[z >= threshold].index:
    df[df['zip'] == i].to_excel(writer, str(i))

# save the remaining data as worksheet 'other':
df[df['zip'].isin(z[z < threshold].index)].to_excel(writer, 'other')

writer.save()
perl
  • 9,826
  • 1
  • 10
  • 22
  • No, of course you don't need to use random, I'm only generating some data here to demo how it works. Just use your own DataFrame instead – perl Mar 14 '19 at 17:21
  • Tried it - but it doesn't work. Error: NameError: name 'np' is not defined – elbu Mar 14 '19 at 17:46
  • You can fix it with `import numpy as np` (however I only use numpy here to generate randomized data) – perl Mar 14 '19 at 17:48
  • `numpy` not `lumpy`? – perl Mar 14 '19 at 18:08
  • So which error are you getting? I've added all required inputs to my code above (and tested with freshly restarted kernel), so you can try to execute the whole block – perl Mar 14 '19 at 18:16
  • Hi, now it works! the excel ist generated. :-) thanks. – elbu Mar 15 '19 at 07:19
  • now I tried it in my program - it works - but I forgot to say that I also need the rest of adresses (all that are under 250) all in one sheet. If you might help me? – elbu Mar 15 '19 at 08:13
  • Updated my example, see the line of code below "save the remaining data as worksheet 'other'" comment – perl Mar 15 '19 at 08:58