I have a dataframe which has been reduced to a single column called Filename (already sorted) containing a list of filenames which may or may not repeat.
For example
Filename
/dir1/dir2/abc.jpg
/dir1/dir2/abc.jpg
/dir1/dir2/def.jpg
/dir1/dir2/hij.jpg
/dir1/dir2/hij.jpg
/dir1/dir2/hij.jpg
/dir1/dir2/hij.jpg
/dir1/dir2/hij.jpg
/dir1/dir2/klm.jpg
/dir1/dir2/klm.jpg
Using Python 3.6 and pandas, I'm trying to obtain the number of occurrences of each filename. The output should be a dataframe; an example is shown below:
Filename Instances
/dir1/dir2/abc.jpg 2
/dir1/dir2/def.jpg 1
/dir1/dir2/hij.jpg 5
/dir1/dir2/klm.jpg 2
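For reproducibility, here is a minimal version of the data, plus a sketch of what I'd expect to produce the counted output while staying in a DataFrame (untested against my real 15-column data; the `Instances` name is just what I'd like the count column to be called):

```python
import pandas as pd

# Minimal reproducible input: one sorted column of (possibly repeated) filenames
df = pd.DataFrame({"Filename": [
    "/dir1/dir2/abc.jpg", "/dir1/dir2/abc.jpg",
    "/dir1/dir2/def.jpg",
    "/dir1/dir2/hij.jpg", "/dir1/dir2/hij.jpg", "/dir1/dir2/hij.jpg",
    "/dir1/dir2/hij.jpg", "/dir1/dir2/hij.jpg",
    "/dir1/dir2/klm.jpg", "/dir1/dir2/klm.jpg",
]})

# Sketch of the desired transformation: one row per filename with its count,
# kept as a DataFrame rather than converting to a list and back
counts = (df.groupby("Filename")
            .size()
            .reset_index(name="Instances"))
print(counts)
```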
I've worked out a way to do this by converting to a list and then counting, but I'd like to keep it as a dataframe, since it's going to be pumped back into some machine learning, and converting to a list and back again seems like a poor route to take.
I’ve tried code like
df = df.groupby('FileName')
df.groupby(['FileName']).count()
df = df.groupby('FileName').nunique()
but none appear to work. The dataframe was originally defined with 15 columns, and the extra columns were deleted with code like
df = df.drop(['Column1Name', 'Column2Name',], axis=1)
The above example only deletes two columns (for simplicity), but in real life 14 are listed. I'm wondering whether this, or the fact that I haven't created a new column called Quantity (to store the counted quantities), has anything to do with it.
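For completeness, here is a small sketch of the column-dropping step (the column names are placeholders standing in for my real ones); keeping just the one column can apparently also be written as a selection rather than listing all 14 names to drop:

```python
import pandas as pd

# Hypothetical frame with extra columns (names are placeholders)
df = pd.DataFrame({
    "Filename": ["/dir1/dir2/abc.jpg", "/dir1/dir2/abc.jpg"],
    "Column1Name": [0, 1],
    "Column2Name": ["x", "y"],
})

# Dropping the unwanted columns, as in my code above...
dropped = df.drop(["Column1Name", "Column2Name"], axis=1)

# ...leaves the same result as selecting only the column to keep
kept = df[["Filename"]]

assert dropped.equals(kept)
```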
Any help would be much appreciated