
Currently I'm trying to code a program for predictive analytics, but I'm having problems figuring out how to compute and print the daily median of the 'up-time' column of a DataFrame.

My code looks like this:

import glob
import os
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt

path = r'C:\Users\eneki\OneDrive\001. HHS\.LVNL\Documentatie\MNT-COM\testdata'
filenames = glob.glob(os.path.join(path, '*.csv'))

li = []
for filename in filenames:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)


def daily_mean(df, date, col):
    return df[date][col].mean()

data = np.random.rand(10)
columns = Data['up-time']
times = pd.date_range('14/10/2021', freq='1D', periods=10)
Data = pd.DataFrame(data=data, index=times, columns=columns)

dates = df.index.strftime('%d%m%Y').unique()
means = df.groupby(pd.Grouper(freq='1D')).mean()
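The loop above fills `li` but never combines the pieces, so after it finishes `df` holds only the last file that was read. A minimal sketch of the usual pattern, assuming every CSV has the same columns, is `pd.concat`. The temporary directory and the hypothetical files `a.csv` and `b.csv` only stand in for the real folder; their rows are copied from the table shown below.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as path:
    # Hypothetical stand-ins for the real CSV files (rows taken from the question).
    pd.DataFrame({"Date": ["14/10/2021", "14/10/2021"],
                  "Time": ["00:05:18", "00:10:19"],
                  "up-time": [2900565, 2893095]}).to_csv(
        os.path.join(path, "a.csv"), index=False)
    pd.DataFrame({"Date": ["26/10/2021"],
                  "Time": ["11:10:18"],
                  "up-time": [182031]}).to_csv(
        os.path.join(path, "b.csv"), index=False)

    filenames = sorted(glob.glob(os.path.join(path, "*.csv")))
    # Read every file, then stack them into one DataFrame with a fresh 0..n-1 index.
    li = [pd.read_csv(filename, header=0) for filename in filenames]
    Data = pd.concat(li, ignore_index=True)

print(Data.shape)  # (3, 3)
```

`ignore_index=True` avoids repeated row labels, since each file would otherwise start its own index at 0.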

This is what my dataframe looks like:

In [152]: Data.iloc[:, : 8]
Out[152]: 
            Date      Time  ...         OS Version  up-time
0     14/10/2021  00:05:18  ...  7.0.2 17867351 U2  2900565
1     14/10/2021  00:10:19  ...  7.0.2 17867351 U2  2893095
2     14/10/2021  00:20:21  ...  7.0.2 17867351 U2  2901468
3     14/10/2021  00:25:19  ...  7.0.2 17867351 U2  2893995
4     14/10/2021  00:35:18  ...  7.0.2 17867351 U2  2902365
         ...       ...  ...                ...      ...
2414  26/10/2021  11:10:18  ...  7.0.2 17867351 U2   182031
2415  26/10/2021  11:20:17  ...  7.0.2 17867351 U2   182631
2416  26/10/2021  11:25:18  ...  7.0.2 17867351 U2   182931
2417  26/10/2021  11:35:20  ...  7.0.2 17867351 U2   183534
2418  26/10/2021  11:40:20  ...  7.0.2 17867351 U2   183833

[2419 rows x 8 columns]
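The output shows why date-based operations fail here: `Date` and `Time` are plain text columns, and the index is the default 0, 1, 2, ... row counter. A minimal sketch, assuming the two columns are strings in exactly the `dd/mm/yyyy` and `HH:MM:SS` layout shown, that turns them into a real `DatetimeIndex` (the `timestamp` column name is just an illustrative choice):

```python
import pandas as pd

# A few rows copied from the output above (other columns omitted).
Data = pd.DataFrame({
    "Date": ["14/10/2021", "14/10/2021", "26/10/2021"],
    "Time": ["00:05:18", "00:10:19", "11:10:18"],
    "up-time": [2900565, 2893095, 182031],
})

# Join the two text columns and parse them with an explicit day-first format,
# so "14/10/2021" is read as 14 October instead of being guessed.
Data["timestamp"] = pd.to_datetime(Data["Date"] + " " + Data["Time"],
                                   format="%d/%m/%Y %H:%M:%S")
Data = Data.set_index("timestamp")

print(type(Data.index).__name__)  # DatetimeIndex
print(Data.index[0])              # 2021-10-14 00:05:18
```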

And the error I get is:

ValueError: Shape of passed values is (10, 1), indices imply (10, 2419)

When I apply the exact numbers, I get another error:

AttributeError: 'RangeIndex' object has no attribute 'strftime'
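This second error comes from `df.index` being a `RangeIndex` (plain row numbers); `strftime` only exists on a `DatetimeIndex`. The sketch below reproduces that and then shows the question's last two lines working once the index holds timestamps. The timestamps are hypothetical, and `.median()` replaces `.mean()` because the question asks for the daily median.

```python
import pandas as pd

# Default index: a RangeIndex (0, 1, 2, ...), which has no strftime method.
df = pd.DataFrame({"up-time": [2900565, 2893095, 182031]})
print(type(df.index).__name__)        # RangeIndex
print(hasattr(df.index, "strftime"))  # False

# Hypothetical timestamps; in practice they would come from the Date/Time columns.
df.index = pd.to_datetime(["2021-10-14 00:05:18",
                           "2021-10-14 00:10:19",
                           "2021-10-26 11:10:18"])

# With a DatetimeIndex, the question's last two lines work (median instead of mean).
dates = df.index.strftime("%d%m%Y").unique()
medians = (df.groupby(pd.Grouper(freq="1D"))["up-time"]
             .median()
             .dropna())  # drop days with no readings (they would show as NaN)
print(list(dates))  # ['14102021', '26102021']
print(medians)
```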

Is it because of the standard date/time format? Either way, I don't really know how to fix it. Is anyone familiar with this problem?

  • Please, avoid [posting images of text](https://unix.meta.stackexchange.com/questions/4086/psa-please-dont-post-images-of-text). It is a better practice to transcribe them instead. – accdias Nov 12 '21 at 13:01
  • Thank you, I have changed it into a transcription. – Vincent Leung Nov 12 '21 at 13:08
  • Please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) [pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), otherwise people investing their free time can't help you... (`Data['up-time'], df` are unknown) – Albo Nov 12 '21 at 13:55
  • Thank you Albo, I have updated the post following the examples you gave me! – Vincent Leung Nov 12 '21 at 14:24

1 Answer


The following should work.

means = df.groupby("Date")["up-time"].median()
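A quick check of that line on a few rows from the question, plus one hypothetical November row. Because `Date` is text, the groups come back in alphabetical order, so `01/11/2021` sorts before `14/10/2021`; grouping on the parsed dates instead is one way to keep calendar order.

```python
import pandas as pd

# Rows from the question's output plus one hypothetical November row.
df = pd.DataFrame({
    "Date": ["14/10/2021", "14/10/2021", "26/10/2021", "26/10/2021", "01/11/2021"],
    "up-time": [2900565, 2893095, 182031, 182631, 500],
})

# The answer's line: one median per distinct "Date" string (sorted as text).
means = df.groupby("Date")["up-time"].median()

# Grouping on parsed dates instead keeps the days in calendar order.
by_day = df.groupby(pd.to_datetime(df["Date"], format="%d/%m/%Y"))["up-time"].median()
print(means)
print(by_day)
```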

The first error in your code comes from:

columns = Data['up-time']

Here `Data['up-time']` holds 2419 values, one per row of your DataFrame. `pd.DataFrame` then tries to use each of those values as a column name, but the data you pass has only one column, which is exactly what the error message says:

ValueError: Shape of passed values is (10, 1), indices imply (10, 2419)
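The mismatch can be reproduced on a small scale. In this sketch a three-element Series stands in for `Data['up-time']` (an assumption made only to keep it short), and an ISO date string replaces `'14/10/2021'` to avoid any day/month ambiguity. Passing a single column name fixes the shape.

```python
import numpy as np
import pandas as pd

data = np.random.rand(10)  # 10 values -> 10 rows x 1 column
times = pd.date_range("2021-10-14", freq="1D", periods=10)

# Hypothetical stand-in for Data['up-time']: every value becomes a column label.
too_many_labels = pd.Series(range(3))
message = ""
try:
    pd.DataFrame(data=data, index=times, columns=too_many_labels)
except ValueError as err:
    message = str(err)
print(message)  # a "Shape of passed values ..." error, as in the question

# A single column name matches the single column of data.
fixed = pd.DataFrame(data=data, index=times, columns=["up-time"])
print(fixed.shape)  # (10, 1)
```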

I hope this makes sense. :)

Mislav-Ro
  • Unfortunately, even after your comments, still no luck. After changing the means variable I still get the same error. I have updated the post a bit; maybe it will help, as I have added some more information. – Vincent Leung Nov 12 '21 at 14:33
  • @VincentLeung Edited the answer, take a look now. – Mislav-Ro Nov 12 '21 at 20:47
  • I have tried the code, but now I get another error: AttributeError: 'RangeIndex' object has no attribute 'strftime'. As I am still quite a beginner with pandas DataFrames, I don't know where I have gone wrong. – Vincent Leung Nov 15 '21 at 10:31