
Okay, simple problem. I used the pandas count function to count the number of samples, grouped by year and cruise, as shown below:

eDNADFdateCruise = eDNADF.groupby(['year', 'cruise_id'])['sample_name'].count()

However, now what I want to do is count the number of null values in a given column (such as extraction_no) and group those in the same way.

I tried this:

eDNADFBlanksSummary = eDNADF.groupby(['year', 'cruise_id'])df['extraction_no'].isna().sum()

I thought this would count the number of null values in the extraction_no column and group them by cruise and year. However, it simply raised a syntax error, so I am a bit lost.

  • Does this help: [pandas-count-null-values-in-a-groupby-function](https://stackoverflow.com/questions/43321455/pandas-count-null-values-in-a-groupby-function) BTW, you seem to have an extraneous `df` in your attempt (which explains the syntax error) – topsail Jun 26 '23 at 19:09
  • @topsail So I removed the extraneous df and it just says that seriesgroupby has no attribute isnull. The solution given in the link you posted doesn't really seem to be addressing this problem. Using that syntax I would be creating a dataframe that only has the index and the true values for the one column (extraction_no) but that is not what I want. I want it to group all rows of the dataframe that have a null value for one column, group them per year and cruise, and then count how many are in each. – Brandon Feole Jun 26 '23 at 20:03
  • You could filter first for all rows that have a null value in the column(s), then group and count by group. That was what I got from the linked answer (admittedly, I didn't look long and hard, but wouldn't that work?) – topsail Jun 26 '23 at 20:56
  • Check this out: (https://stackoverflow.com/q/45752601) – Zero Jun 27 '23 at 14:08
  • Please provide a minimal reproducible example. – Timus Jun 27 '23 at 14:15
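For reference, here is a minimal sketch of the two approaches the comments point toward. It is an illustration under assumptions, not a confirmed answer. The small `eDNADF` frame is a hypothetical stand-in with invented values, since the question does not include sample data. The first approach flags the nulls before grouping, which sidesteps the reported error about the grouped object having no `isnull`. The second follows topsail's suggestion to filter first and then count per group.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for eDNADF; the values are invented for illustration.
eDNADF = pd.DataFrame({
    "year": [2021, 2021, 2021, 2022, 2022],
    "cruise_id": ["A", "A", "B", "C", "C"],
    "sample_name": ["s1", "s2", "s3", "s4", "s5"],
    "extraction_no": [1.0, np.nan, np.nan, 2.0, 3.0],
})

# Approach 1: build a boolean "is null" Series first, then group it by the
# year and cruise_id columns and sum the True values per group.
null_counts = (
    eDNADF["extraction_no"].isna()
    .groupby([eDNADF["year"], eDNADF["cruise_id"]])
    .sum()
)

# Approach 2: keep only the rows where extraction_no is null, then count
# the remaining rows per (year, cruise_id) group.
filtered_counts = (
    eDNADF[eDNADF["extraction_no"].isna()]
    .groupby(["year", "cruise_id"])
    .size()
)

print(null_counts)
print(filtered_counts)
```

The two results differ in one respect: the first keeps every (year, cruise_id) group and reports 0 where nothing is null, while the second only lists groups that contain at least one null row.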
