I'm trying to produce the output shown below from this DataFrame, which was built from a Django query:
messages = Message.objects.all()
df = pd.DataFrame.from_records(messages.values())
+---+-----------------+------------+---------------------+
| | date_time | error_desc | text |
+---+-----------------+------------+---------------------+
| 0 | 3/31/2019 12:35 | Error msg | Hello there |
| 1 | 3/31/2019 12:35 | | Nothing really here |
| 2 | 4/1/2019 12:35 | Error msg | What if I told you |
| 3 | 4/1/2019 12:35 | | Yes |
| 4 | 4/1/2019 12:35 | Error Msg | Maybe |
| 5 | 4/2/2019 12:35 | | Sure I could |
| 6 | 4/2/2019 12:35 | | Hello again |
+---+-----------------+------------+---------------------+
Output:
+-----------+-------------+--------+-----------------------------+--------------+
| date | Total count | Errors | Greeting (start with hello) | errors/total |
+-----------+-------------+--------+-----------------------------+--------------+
| 3/31/2019 | 2 | 1 | 1 | 50% |
| 4/1/2019 | 3 | 2 | 0 | 66.67% |
| 4/2/2019 | 2 | 0 | 1 | 0% |
+-----------+-------------+--------+-----------------------------+--------------+
I can get partway there with the following code, but it seems roundabout: I mark each row 'Yes' or 'No' depending on whether it meets a condition, then run a group-by.
df['date'] = df['date_time'].dt.date
df['greeting'] = np.where(df["text"].str.lower().str.startswith('hello'), "Yes", "No")
df['error'] = np.where(df["error_desc"].notnull(), "Yes", "No")
# Parentheses are needed so the chained calls can span several lines
(
    df.set_index("date")
    .groupby(level="date")
    .apply(lambda g: g.apply(pd.value_counts))
    .unstack(level=1)
    .fillna(0)
)
This produces the counts, but in multiple yes/no columns.
I could do some manipulation after this point, but is there a more efficient way of coming up with the output I'm after?
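One direction that might be more direct, as a minimal sketch only: build boolean columns instead of 'Yes'/'No' strings and sum them per date with named aggregation. The sample frame below is rebuilt by hand from the table above, and it assumes the blank error_desc cells are None rather than empty strings. The output column names (total, errors, greetings, errors_pct) are placeholders I made up for illustration.

```python
import pandas as pd

# Hand-built copy of the sample data; blank error_desc cells are assumed to be None.
df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["3/31/2019 12:35", "3/31/2019 12:35", "4/1/2019 12:35", "4/1/2019 12:35",
         "4/1/2019 12:35", "4/2/2019 12:35", "4/2/2019 12:35"],
        format="%m/%d/%Y %H:%M",
    ),
    "error_desc": ["Error msg", None, "Error msg", None, "Error Msg", None, None],
    "text": ["Hello there", "Nothing really here", "What if I told you", "Yes",
             "Maybe", "Sure I could", "Hello again"],
})

summary = (
    df.assign(
        date=df["date_time"].dt.date,
        # Boolean flags: True counts as 1 when summed
        error=df["error_desc"].notna(),
        greeting=df["text"].str.lower().str.startswith("hello"),
    )
    .groupby("date")
    .agg(
        total=("text", "size"),         # rows per date
        errors=("error", "sum"),        # rows with an error description
        greetings=("greeting", "sum"),  # rows whose text starts with "hello"
    )
)

# Error share as a percentage, rounded to two decimals
summary["errors_pct"] = (summary["errors"] / summary["total"] * 100).round(2)
print(summary)
```

If the real data stores missing error descriptions as empty strings, the `notna()` check would count them as errors, so the flag would need to compare against `""` as well.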