I would like to find the frequency of the pairing of each street name + cross name that appears in the data using pandas. For instance, here is a sample of the data:

Street Name        Cross Streets
Massachusetts Ave  Rindge Ave
Massachusetts Ave  NaN
Franklin Street    Sidney Street
Massachusetts Ave  Rindge Ave

How do I count the frequency of each pairing of street name and cross name and how can I also find the count of streets with no pairings (for instance the one with NaN)?

dancemc15

1 Answer

Maybe something like:

Load library and import your data

In [1]: import pandas as pd

In [2]: df = pd.read_csv("test.csv", delimiter=",", na_values="NaN")    
In [3]: df
Out[3]:
        Street Name   Cross Street
0  Massachusetts Ave     Rindge Ave
1  Massachusetts Ave            NaN
2    Franklin Street  Sidney Street
3  Massachusetts Ave     Rindge Ave

"count the frequency of each pairing of street name and cross name"

... by grouping on both the street name and the cross street, then counting the rows in each group (rows where either key is NaN are left out by default)

In [4]: df.groupby(['Street Name', 'Cross Street']).size()
Out[4]:
Street Name        Cross Street
Franklin Street    Sidney Street    1
Massachusetts Ave  Rindge Ave       2
dtype: int64
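If the NaN pairing should show up in the same table rather than be dropped, one option is the `dropna=False` argument of `groupby`. This is only a sketch: it assumes pandas 1.1 or newer, and it rebuilds the sample data inline instead of reading `test.csv`. The name `pair_counts` is just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: same columns as test.csv)
df = pd.DataFrame({
    "Street Name": ["Massachusetts Ave", "Massachusetts Ave",
                    "Franklin Street", "Massachusetts Ave"],
    "Cross Street": ["Rindge Ave", np.nan, "Sidney Street", "Rindge Ave"],
})

# dropna=False keeps groups whose key contains NaN (pandas >= 1.1)
pair_counts = df.groupby(["Street Name", "Cross Street"], dropna=False).size()
print(pair_counts)
```

With this, the (Massachusetts Ave, NaN) pairing gets its own row, so every row of the data is counted in some group.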

"find the count of streets with no pairings (for instance the one with NaN)"

... by grouping on the street name and counting how many NaN values are in the cross street column

In [5]: df.groupby("Street Name")[["Cross Street"]].agg(lambda x: x.isnull().sum())
Out[5]:
                  Cross Street
Street Name
Franklin Street               0
Massachusetts Ave             1
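A possible alternative, sketched under the assumption that the data looks like the sample above (rebuilt inline here; `total_missing` and `missing_per_street` are illustrative names): build a boolean mask with `isna()` and sum it, either over the whole column or per street.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: same columns as test.csv)
df = pd.DataFrame({
    "Street Name": ["Massachusetts Ave", "Massachusetts Ave",
                    "Franklin Street", "Massachusetts Ave"],
    "Cross Street": ["Rindge Ave", np.nan, "Sidney Street", "Rindge Ave"],
})

# True where the cross street is missing
no_cross = df["Cross Street"].isna()

# Total number of rows with no cross street
total_missing = no_cross.sum()

# Per-street count: group the mask by street name and sum the True values
missing_per_street = no_cross.groupby(df["Street Name"]).sum()

print(total_missing)
print(missing_per_street)
```

This avoids the lambda and returns a Series indexed by street name.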
csiu
  • Thanks for this! What's the difference between size() and count() though in pandas? – dancemc15 Oct 07 '16 at 04:13
  • @dancemc15 huh -- to be honest, I'm not too sure (I was referring to http://stackoverflow.com/questions/19384532/how-to-count-number-of-rows-in-a-group-in-pandas-group-by-object when I was answering the question) – csiu Oct 07 '16 at 04:18
  • @dancemc15 More googling: `count` computes count of group, excluding missing values (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.count.html) and `size` computes group sizes (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html); .... because there is NaN in the data, using count() would not work – csiu Oct 07 '16 at 04:25
  • Okay, understood. – dancemc15 Oct 07 '16 at 04:27
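To make the `size()` vs `count()` difference from the comments concrete, here is a small sketch on the sample data (rebuilt inline; `sizes` and `counts` are illustrative names):

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: same columns as test.csv)
df = pd.DataFrame({
    "Street Name": ["Massachusetts Ave", "Massachusetts Ave",
                    "Franklin Street", "Massachusetts Ave"],
    "Cross Street": ["Rindge Ave", np.nan, "Sidney Street", "Rindge Ave"],
})

by_street = df.groupby("Street Name")

# size() counts every row in the group, NaN or not
sizes = by_street.size()

# count() counts only the non-null values in the selected column
counts = by_street["Cross Street"].count()

print(sizes)
print(counts)
```

For Massachusetts Ave the two differ by one, which is exactly the row with the missing cross street.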