I have two pandas DataFrames of entries with dates. For each day, I want to plot the number of entries made on that day, with both DataFrames on one graph in different colors:

def plotMultiple(dfList):
    for df in dfList:
        times = pd.DatetimeIndex(df.Date)
        grouped = df.groupby([times.year, times.month]).size()
        ax = grouped.plot(kind='line', x='Date', figsize=(50, 5))

df1 = pd.DataFrame({"Date":[
"21.11.2018 14:44",
"21.12.2018 14:43",
"22.12.2018 14:42",
"25.12.2018 14:51"]}
)
df1.head()

df2 = pd.DataFrame({"Date":[
"20.12.2018 14:44",
"21.12.2018 14:44",
"21.12.2018 14:43",
"22.12.2018 14:42",
"21.12.2018 14:43",
"22.12.2018 14:42",
"21.12.2018 14:43",
"22.12.2018 14:42",
"23.12.2018 14:51"]}
)

plotMultiple([df2,df1])

This works perfectly if I pass a list containing a single DataFrame, but as soon as there are several, problems appear. The graph doesn't start at the first entry, but at a seemingly random point:

Example

As can be seen, the graph neither starts at the first entry nor ends at the last one. How can I make it run from the 21st of November to the 25th of December? I am fine with using either pyplot or seaborn.

  • That's reasonable given your different dataframes have different time ranges. – Quang Hoang Apr 15 '19 at 18:54
  • How are the dataframes different? Can't you concat them? – Tim Apr 15 '19 at 18:55
  • @Tim I can merge them, but how do I then create separate lines again? – Astronguem Apr 15 '19 at 20:11
  • @QuangHoang Now, is there a possibility to only display the lines where they exist, and where they don't have any data the line stops? – Astronguem Apr 15 '19 at 20:12
  • isn't it what you have now? – Quang Hoang Apr 15 '19 at 20:13
  • @QuangHoang Well no, they aren't in the right time, some time is cut off, and the labeling doesn't make sense to me. – Astronguem Apr 15 '19 at 20:17
  • Generally, it would act as you expected. It's hard to see what went wrong when the actual data are not accessible. – Quang Hoang Apr 15 '19 at 20:21
  • You might want to read [mcve] and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – ImportanceOfBeingErnest Apr 15 '19 at 20:40
  • @Astronguem you can add a ‘label’ column to each DataFrame to label them before concatenating, say column of 1’s for first DataFrame, column of 2’s, etc. then you can use seaborn and graph separate lines based on that ‘label’ column – Tim Apr 15 '19 at 22:37
  • @Tim that label column would disappear when I do GroupBy. Or should I do this before concat? – Astronguem Apr 16 '19 at 18:50
  • @ImportanceOfBeingErnest thank you, done it! – Astronguem Apr 16 '19 at 21:21
  • @Astronguem maybe it’s better to do groupby first, then add label and concat. This would avoid having to do fancy agg() stuff after groupby. – Tim Apr 16 '19 at 23:32
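
The last suggestion in the comments (count per day first, then add a label and concatenate) could be sketched roughly as follows. This is only an illustration: the helper name `labelled_daily_counts`, the column names `day`, `entries` and `label`, and the format string `%d.%m.%Y %H:%M` are assumptions made here, the last one inferred from the sample data in the question. The sketch draws with matplotlib; the same long-form frame could presumably also be passed to seaborn's `lineplot` with `hue="label"`.

```python
import pandas as pd
import matplotlib.pyplot as plt

def labelled_daily_counts(df, label):
    """Count entries per calendar day, then tag the result with a label.

    Hypothetical helper; assumes a "Date" column of day-first strings.
    """
    days = pd.to_datetime(df["Date"], format="%d.%m.%Y %H:%M").dt.normalize()
    counts = days.groupby(days).size()  # one row per day that has entries
    return pd.DataFrame(
        {"day": counts.index, "entries": counts.to_numpy(), "label": label}
    )

# Sample data from the question
df1 = pd.DataFrame({"Date": [
    "21.11.2018 14:44", "21.12.2018 14:43",
    "22.12.2018 14:42", "25.12.2018 14:51"]})
df2 = pd.DataFrame({"Date": [
    "20.12.2018 14:44", "21.12.2018 14:44", "21.12.2018 14:43",
    "22.12.2018 14:42", "21.12.2018 14:43", "22.12.2018 14:42",
    "21.12.2018 14:43", "22.12.2018 14:42", "23.12.2018 14:51"]})

# Group first, label second, concatenate last: the label survives because
# it is added after the groupby
long_df = pd.concat(
    [labelled_daily_counts(df1, "df1"), labelled_daily_counts(df2, "df2")],
    ignore_index=True,
)

# One line per label, drawn on the same axes against real datetime values
fig, ax = plt.subplots(figsize=(12, 4))
for label, part in long_df.groupby("label"):
    ax.plot(part["day"], part["entries"], marker="o", label=label)
ax.set_ylabel("Entries per day")
ax.legend()
fig.autofmt_xdate()
```

Because the x values are real datetimes rather than positions, each line starts and stops at its own first and last day with data. Call `plt.show()` afterwards to display the figure.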

1 Answer

Use Counter from collections to count all occurrences of each date.

import pandas as pd
import matplotlib.pyplot as plt

from collections import Counter

df = pd.DataFrame({"Date":[
"20.12.2018 14:44",
"21.12.2018 14:44",
"21.12.2018 14:43",
"22.12.2018 14:42",
"21.12.2018 14:43",
"22.12.2018 14:42",
"21.12.2018 14:43",
"22.12.2018 14:42",
"23.12.2018 14:51"]}
)

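# Count how often each distinct value of the Date column occurs
date_count = Counter(df['Date'])

# Pass plain lists rather than dict views to matplotlib
plt.plot(list(date_count.keys()), list(date_count.values()))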
plt.xticks(rotation=45)

# Repeat `plt.plot()` for each additional dataset and call `plt.show()` once at the end

# plt.plot(date_count.keys(), date_count.values())
# plt.xticks(rotation=45)

plt.show()
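
Note that the `Date` strings above include the time of day, so `Counter` counts each distinct timestamp rather than each day, and the points are plotted in order of first appearance rather than in date order. A pandas-based variant that counts per calendar day and puts every DataFrame on one shared date axis might look like the sketch below. The function names `daily_counts` and `plot_daily_counts` are made up for this example, and the format string `%d.%m.%Y %H:%M` is an assumption based on the sample data.

```python
import pandas as pd
import matplotlib.pyplot as plt

def daily_counts(df):
    """Return the number of entries per calendar day, sorted by date."""
    # The sample dates are day-first, so the format is given explicitly
    days = pd.to_datetime(df["Date"], format="%d.%m.%Y %H:%M").dt.normalize()
    return days.value_counts().sort_index()

def plot_daily_counts(dfs, labels):
    """Plot one line per DataFrame on a shared daily date axis."""
    counts = [daily_counts(df) for df in dfs]
    # One continuous daily index from the earliest to the latest entry
    start = min(c.index.min() for c in counts)
    end = max(c.index.max() for c in counts)
    all_days = pd.date_range(start, end, freq="D")
    fig, ax = plt.subplots(figsize=(12, 4))
    for c, label in zip(counts, labels):
        # Days without entries become 0, so every line spans the full range
        c.reindex(all_days, fill_value=0).plot(ax=ax, label=label)
    ax.set_xlabel("Date")
    ax.set_ylabel("Entries per day")
    ax.legend()
    return ax

# Sample data from the question
df1 = pd.DataFrame({"Date": [
    "21.11.2018 14:44", "21.12.2018 14:43",
    "22.12.2018 14:42", "25.12.2018 14:51"]})
df2 = pd.DataFrame({"Date": [
    "20.12.2018 14:44", "21.12.2018 14:44", "21.12.2018 14:43",
    "22.12.2018 14:42", "21.12.2018 14:43", "22.12.2018 14:42",
    "21.12.2018 14:43", "22.12.2018 14:42", "23.12.2018 14:51"]})

ax = plot_daily_counts([df2, df1], ["df2", "df1"])
```

With the sample data the shared axis runs from 21 November to 25 December 2018. Dropping `fill_value=0` leaves missing days as NaN, which should make each line break where its DataFrame has no entries instead of dropping to zero. Call `plt.show()` afterwards to display the figure.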
