I use the following Python code to read a CSV with 50K rows. Every row has a 4-digit code, for example '1234'.

import re
from collections import Counter  # needed for the Counter(...) call further down

import pandas as pd

df = pd.read_csv('Parkingtickets.csv', sep=';',encoding='ISO-8859-1')

df['Parking tickets']

I would like to count how often each code occurs and get the five most frequent codes together with their counts.

codes = df['Parking tickets']
Counter(codes).most_common(5)

This gets me roughly what I'm looking for, but it counts whole cell values rather than only the 4-digit codes, and some rows contain two codes. How can I use "re.findall(r'\d{4}')"? I know I need it, but I don't understand how to apply it here.

Sumsum
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Dec 10 '17 at 15:35
  • Please post what you have tried for sorting a DataFrame. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html – Tanu Dec 10 '17 at 15:45
  • Modified the content for clarity. Also added the python tag to get the attention. – thewaywewere Dec 10 '17 at 19:04
  • Show me `.head()` of the input and `.head()` of the desired output, and I'll help you. – Tom Wojcik Dec 10 '17 at 19:23

1 Answer

Perhaps look at pandas.Series.value_counts() (http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.Series.value_counts.html). It returns a Series holding the count of each unique value in the original Series, sorted by default from most to least frequent. Here is some trivial example code:

import pandas as pd
list1 = [1, 1, 2, 2, 2, 3]  # 2 appears three times, 1 twice, 3 once

df = pd.DataFrame(data={'number': list1})

df['number'].value_counts()

This returns

2  3
1  2
3  1

This indicates that the number 2 occurred 3 times, the number 1 occurred 2 times, and the number 3 occurred 1 time. To keep only the five most frequent values together with their counts, you could do:

top5 = df['number'].value_counts().head(5)

Or convert the result to a dictionary with top5.to_dict(), etc.
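
As for counting only the 4-digit codes, including rows that hold more than one, one option is to run re.findall over each row and feed every match into a Counter. This is only a minimal sketch: the helper name top_codes and the sample rows are made up for illustration, and the \b word boundaries are my addition to the pattern from the question so that longer digit runs such as '123456' are not split into a code.

```python
import re
from collections import Counter

def top_codes(rows, n=5):
    """Count 4-digit codes across all rows and return the n most common."""
    counts = Counter()
    for row in rows:
        # str() guards against non-string cells (e.g. NaN);
        # findall returns every 4-digit code in the row, so two codes count twice
        counts.update(re.findall(r'\b\d{4}\b', str(row)))
    return counts.most_common(n)

# Hypothetical sample rows standing in for df['Parking tickets']
sample = ['1234', '1234 and 5678', 'no code', '5678', '1234']
print(top_codes(sample))  # [('1234', 3), ('5678', 2)]
```

With the real data you would presumably call top_codes(df['Parking tickets']) instead of passing the sample list.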

mcp46