I have a problem grouping data and plotting it over time to show incremental change. The incoming data has the structure below and is added to a pandas DataFrame:

"DateTime","Classification", "Confidence"

What I want to do is show the unique values of classification and count how many times they occur every 5 minutes. I then want to plot this in a graph that will update every 5 minutes showing the incremental values over time.

I have tried different approaches but I just can't get my head around it. The dataframe I can get is:

Index class count
0 Car 2
1 Truck 1
2 Boat 3

The columns are 'Index', 'Class' and 'Count'. I can update this every 5 minutes, or I can append it to a list of 'TimeStamp', 'Dataframe' pairs, where each dataframe looks like the one above.

The output I would like is a chart with one line per class, each in a different color, showing how many of each class are in the dataframe every 5 minutes.

How can I do this with pandas and matplotlib in Python? I attach my junk code below just to show what I have been using as a starting point...

Support is much appreciated.

import pandas as pd
from datetime import datetime

def CreateStats():
    print("Reading from file")
    fo = open("/home/User/Temp/test_data.txt", "r")
    df = pd.DataFrame(columns=['time', 'class', 'conf'])
    ndf = pd.DataFrame(columns=['class', 'class count'])
    pos = 0
    nPos = 0
    for t in range(1):
        fo.seek(0, 0)
        for line in fo:
            #print(str(datetime.now())+" - " + line)
            #time.sleep(1)
            splitted = line.split(";")
            # right() is a helper that is not shown in this post
            df.loc[pos] = [datetime.now().strftime("%Y-%m-%d %H:%M:%S"), splitted[0], right(splitted[1], 1)]
            pos = pos + 1
        #time.sleep(1)
        df['time'] = pd.to_datetime(df['time'])
        # Count the rows per class; this loses the time dimension
        ndf = df.groupby('class').agg({'class': ['count']}).reset_index()
        #ndf = df.groupby('class').count().reset_index()
        #ndf = df.groupby('class').agg('count').reset_index()

        #print(df.head())
        #newDf = [datetime.now(),ndf]
        print(ndf)
        #ndf.plot.scatter(x='class', y='time count')
        #plt.show()

    fo.close()
Magnus_G
  • Have you tried using `matplotlib.pyplot.plot`? Also, using a simple `dict` mapping classes to counts might be easier than a dataframe. – Stef Sep 07 '21 at 15:05
  • I use matplotlib (I think), or perhaps it is in fact pandas plot? I will try to see if the plot is easier with matplotlib. As for whether to use a dataframe or not: I don't know how to create the data structure to get the incremental values and plot each class. – Magnus_G Sep 07 '21 at 15:18
  • This is an example of what I mean [link](https://1drv.ms/u/s!AuUFDpYEp5kAvW54JOsvwqLIXFwC?e=FvtUoV) – Magnus_G Sep 07 '21 at 15:28
  • It's very similar to this: https://stackoverflow.com/questions/66934662/converting-nested-dictionary-to-pandas-dataframe-and-plotting/69130171?noredirect=1#comment122182866_69130171 But I don't get it – Magnus_G Sep 10 '21 at 11:00
  • Now I get this as the aggregated dataframe, where the columns 0, 1, 2, 3, 4, 5 etc. are time stamps: { 0 1 2 3 4, Bubbles 10 10 10 10 10, Undefined 10 10 10 10 10, Melt Defects 5 5 5 5 5} – Magnus_G Sep 10 '21 at 11:23

1 Answer

I found a way, though perhaps not the most Pythonic one:

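from collections import defaultdict

import matplotlib.pyplot as plt
import pandas as pd

def CreateStats():
    print("Reading from file")
    aggDict = {}
    fo = open("/home/user/Temp/test_data.txt", "r")
    # Each pass over the file stands in for one 5-minute interval
    for t in range(20):
        fo.seek(0, 0)
        aggDict[t] = defaultdict(int)
        for line in fo:
            #print(str(datetime.now())+" - " + line)
            defect = line.split(";")
            aggDict[t][defect[0]] += 1
        # Add the previous running totals once per interval, after all
        # lines are counted (adding them per line inflates the totals)
        if t > 0:
            for key, total in aggDict[t-1].items():
                aggDict[t][key] += total
    fo.close()
    print(aggDict)
    # One row per interval, one column per class
    df = pd.DataFrame(aggDict)
    df2 = df.transpose()
    lines = df2.plot.line()
    plt.show()

For a file with two Bubbles lines, two Rabbits lines and one Cup line, the first three entries of the printed dictionary are:

{
0: defaultdict(<class 'int'>, {'Bubbles': 2, 'Rabbits': 2, 'Cup': 1}),
1: defaultdict(<class 'int'>, {'Bubbles': 4, 'Rabbits': 4, 'Cup': 2}),
2: defaultdict(<class 'int'>, {'Bubbles': 6, 'Rabbits': 6, 'Cup': 3})
}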

The input file is a two-column, semicolon-separated list of a type and a value. The value is not used in this code.
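For comparison, here is a minimal sketch of how the same running totals could be computed directly in pandas when the rows carry real timestamps, as in the "DateTime", "Classification", "Confidence" layout from the question. The function name `cumulative_counts` and the sample rows are made up for illustration, and the approach assumes the "DateTime" column has already been converted with `pd.to_datetime`.

```python
import pandas as pd


def cumulative_counts(df, freq="5min"):
    """Return running totals per class, one row per time bin.

    Assumes df has a datetime column "DateTime" and a string column
    "Classification" (the column names used in the question).
    """
    per_bin = (
        # Bin the timestamps into fixed-width intervals and count rows per class
        df.groupby([pd.Grouper(key="DateTime", freq=freq), "Classification"])
        .size()
        # Pivot classes into columns; a class missing from a bin counts as 0
        .unstack(fill_value=0)
    )
    # Accumulate down the rows so each bin shows the total so far
    return per_bin.cumsum()


# Made-up sample rows, only to show the shape of the result
sample = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2021-09-07 15:00:10", "2021-09-07 15:01:00", "2021-09-07 15:04:59",
        "2021-09-07 15:06:00", "2021-09-07 15:07:30", "2021-09-07 15:12:00",
    ]),
    "Classification": ["Car", "Truck", "Car", "Boat", "Car", "Truck"],
    "Confidence": [0.9, 0.8, 0.7, 0.95, 0.6, 0.85],
})

totals = cumulative_counts(sample)
print(totals)
```

Calling `totals.plot.line()` followed by `plt.show()` (with `matplotlib.pyplot` imported as `plt`) should then draw one line per class, the same way `df2.plot.line()` does above. To refresh the chart every 5 minutes, one option is to rerun this on the growing dataframe and redraw.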

Magnus_G