I have a dataframe of "sentences" in which I want to search for a keyword. Let's say that my keyword is just the letter 'A'. Sample data:
year | sentence | index
-----------------------
2015 | AAX | 0
2015 | BAX | 1
2015 | XXY | -1
2016 | AWY | 0
2017 | BWY | -1
That is, the "index" column shows the index of the first occurrence of 'A' in each sentence (-1 if not found). I want to group the rows by year, with a column showing the fraction of that year's records that contain 'A'. That is:
year | index
-------------
2015 | 0.667
2016 | 1.0
2017 | 0
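For reference, the sample data above can be reproduced with something like the following. This is only a sketch: it assumes the "index" column was computed with pandas' `Series.str.find`, which returns the position of the first match or -1 when the substring is absent.

```python
import pandas as pd

# Sample data from the table above
df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "sentence": ["AAX", "BAX", "XXY", "AWY", "BWY"],
})

# Position of the first 'A' in each sentence, -1 if there is none
# (assumption: this is how the "index" column was produced)
df["index"] = df["sentence"].str.find("A")
print(df)
```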
I have a feeling that this involves agg or groupby in some fashion, but I'm not clear how to string these together. I've gotten as far as:
df.groupby("index").count()
But the issue is that I need some kind of conditional count(): first count the number of rows in year 201X that contain 'A', then divide that by the total number of rows in year 201X.
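One possible way to express the "conditional count divided by group size" described above is sketched below. It assumes the data is laid out exactly as in the sample table, and the names `found`, `fraction`, and `result` are made up for illustration. The idea is that the mean of a boolean column within each group equals the number of True rows divided by the number of rows.

```python
import pandas as pd

# Sample data from the question, with the "index" column typed in directly
df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "sentence": ["AAX", "BAX", "XXY", "AWY", "BWY"],
    "index": [0, 1, -1, 0, -1],
})

# True where 'A' was found (any index other than -1)
found = df["index"] != -1

# Explicit form: rows containing 'A' per year / total rows per year
explicit = found.groupby(df["year"]).sum() / df.groupby("year").size()

# Shorter equivalent: the mean of a boolean Series is the share of True values
result = found.groupby(df["year"]).mean().reset_index(name="fraction")
print(result)
```

Grouping by "year" rather than by "index" is the key difference from the attempt above, since the goal is one row per year.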