In Python 3 with pandas, I have this dataframe:
autores_naodeputados.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 19 columns):
IdAutor 0 non-null object
IdDocumento 0 non-null object
NomeAutor 0 non-null object
codigo_unico 0 non-null object
nome_deputado 0 non-null object
uf 0 non-null object
nome_completo 0 non-null object
sequencial 0 non-null object
cpf 0 non-null object
nome_urna 0 non-null object
partido_eleicao 0 non-null object
situacao 0 non-null object
AnoLegislativo 0 non-null object
CodOriginalidade 0 non-null object
DtEntradaSistema 0 non-null datetime64[ns]
DtPublicacao 0 non-null datetime64[ns]
Ementa 0 non-null object
IdNatureza 0 non-null object
NroLegislativo 0 non-null object
dtypes: datetime64[ns](2), object(17)
memory usage: 0.0+ bytes
It is a dataset on the authorship of legislative projects. The column "NomeAutor" is the name of the politician.
The column "NroLegislativo" is the sequential number that the project receives within the year.
The column "CodOriginalidade" holds another sequential code, which not all project types have.
The column "IdNatureza" is the code that indicates the type of process (law, amendment, etc.).
The column "AnoLegislativo" is the year the project was submitted.
Combining these four fields (NroLegislativo, CodOriginalidade, IdNatureza, AnoLegislativo) gives a unique key that identifies each project under each politician's name.
Is there a way to count how many unique keys each politician has, that is, how many projects each person has?
A sample of the rows looks like this:
autores_projetos[['NomeAutor', 'NroLegislativo', 'CodOriginalidade', 'IdNatureza', 'AnoLegislativo']].head(5).to_dict()
{'NomeAutor': {0: 'Vaz de Lima',
1: 'Edmir Chedid',
2: 'Roberto Engler',
3: 'Campos Machado',
4: 'Célia Leão'},
'NroLegislativo': {0: '9', 1: '9', 2: '9', 3: '9', 4: '9'},
'CodOriginalidade': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'IdNatureza': {0: '5', 1: '5', 2: '5', 3: '5', 4: '5'},
'AnoLegislativo': {0: '2015', 1: '2015', 2: '2015', 3: '2015', 4: '2015'}}
I need a result like this:
NomeAutor
Gil Lancaster 386
Itamar Borges 200
Campos Machado 189
Carlos Giannazi 189
Cezinha de Madureira 165
Afonso Lobato 152
Mauro Bragato 149
...
That output comes from this groupby:
autores_deputados.groupby("NomeAutor").NroLegislativo.count().sort_values(ascending=False)
But, as I said above, in my case the unique key is made up of several fields, so counting a single column is not enough.
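One possible approach, as a minimal sketch rather than a definitive answer: drop duplicate rows over the author plus the four key columns, then count the remaining rows per author. The sample data below is hypothetical (made up to match the shape of the rows shown above), and the names `key_columns` and `projetos_por_autor` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the rows shown in the question.
autores_projetos = pd.DataFrame({
    "NomeAutor": ["Vaz de Lima", "Vaz de Lima", "Vaz de Lima",
                  "Edmir Chedid", "Edmir Chedid"],
    "NroLegislativo": ["9", "9", "10", "9", "9"],
    "CodOriginalidade": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "IdNatureza": ["5", "5", "5", "5", "5"],
    "AnoLegislativo": ["2015", "2015", "2015", "2015", "2015"],
})

# The four fields that together identify a project.
key_columns = ["NroLegislativo", "CodOriginalidade",
               "IdNatureza", "AnoLegislativo"]

projetos_por_autor = (
    autores_projetos
    # Keep one row per (author, composite key); NaN values compare as equal here.
    .drop_duplicates(subset=["NomeAutor"] + key_columns)
    # Each remaining row is one distinct project for that author.
    .groupby("NomeAutor")
    .size()
    .sort_values(ascending=False)
)

print(projetos_por_autor)
```

Deduplicating first, instead of grouping by all five columns, is a deliberate choice in this sketch: `groupby` drops groups whose key contains NaN by default, and "CodOriginalidade" is NaN in the sample rows, so grouping directly on the key columns could silently lose projects.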