Python Groupby part of a string

Question

I'm grouping a list of transactions by UK Postcode, but I only want to group by the first part of the postcode. So, UK post codes are in two parts, outward and inward, separated by a [space]. e.g. W1 5DA.

subtotals = df.groupby('Postcode').count()

Is the way I'm doing it now, the way I've thought about doing it at the moment is adding another column to the DataFrame with just the first word of the Postcode column, and then grouping by that... but I'm wondering if there's any easier way to do it.

Thank you

Are your postcodes delimited on space e.g. `'W1 5DA`? If so you can add a new column for the first part `df['town_code'] = df['Postcode'].str.split().str[0]` — EdChum, Feb 01 '17 at 12:35
Please provide [mcve]. Hate to read between the lines and figure out what you really meant. — Mohammad Yusuf, Feb 01 '17 at 12:35
I think you need `subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()` — jezrael, Feb 01 '17 at 12:36

score 4 · Accepted Answer · edited Sep 27 '17 at 16:51

I think you need groupby by Series created by split by first space:

subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()

Sample:

df = pd.DataFrame({'Postcode' :['W1 5DA','W1 5DA','W2 5DA']})
print (df)
  Postcode
0   W1 5DA
1   W1 5DA
2   W2 5DA

print (df['Postcode'].str.split().str[0])
0    W1
1    W1
2    W2
Name: Postcode, dtype: object

subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()
print (subtotals)
          Postcode
Postcode          
W1               2
W2               1

Check also What is the difference between size and count in pandas?

Python Groupby part of a string

1 Answers1