I want to perform an operation on a DataFrame in pandas 0.23.0, but I can't find the best way to do it.

I read an id, a time and a value from a CSV file, and I want to calculate the running mean (mean()) of the values for each id, row by row.

Example:
id time value
1 22:10:01 10
2 22:10:02 20
3 22:10:03 30
2 22:10:04 40
1 22:10:05 50

It would be something like this:
id time value mean
1 22:10:01 10 10
2 22:10:02 20 20
3 22:10:03 30 30
2 22:10:04 40 30 ((40 + 20) / 2)
1 22:10:05 50 30 ((50 + 10) / 2)

Note that the first mean for each id is just the value itself.

I have arrived at a solution using an auxiliary dictionary:

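import pandas as pd

dat = pd.read_csv('file.csv')
dicc = {}

for row in dat.itertuples():
    # row[0] is the index; the columns follow in file order: id, time, value
    ids = row[1]
    timestamps = row[2]
    values = row[3]

    if ids in dicc:
        dicc[ids]['id'].append(ids)
        dicc[ids]['time'].append(timestamps)
        dicc[ids]['value'].append(values)
        # Averages the previous mean with the new value; this equals the
        # true mean only for the first two values of an id.
        dicc[ids]['mean'].append((dicc[ids]['mean'][-1] + values) / 2)
    else:
        dicc[ids] = {
            'id': [ids],
            'time': [timestamps],
            'value': [values],
            'mean': [values],
        }

df2 = pd.DataFrame.from_dict(data=dicc)
df2.to_csv('file2.csv')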

Basically, I fill in the dictionary depending on whether the id has already appeared or not.

I also tried to create a new df with a mean column in order to test the timing:

last = len(dat.columns)
df = pd.DataFrame(data=dat, columns=dat.keys())
df.insert(loc=last, column='mean', value=None)

but I can't find a way to do that process directly in a DataFrame.
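For reference, here is a minimal sketch of how the per-id running mean could be computed directly in a DataFrame. It assumes the columns are named id, time and value as in the example, and it inlines the sample rows instead of reading file.csv. Note that it computes the true cumulative mean per id, which matches the dictionary approach only while an id has appeared at most twice.

```python
import io

import pandas as pd

# Sample rows from the example, inlined so the sketch is self-contained
# (assumption: the real CSV has these three column names).
csv_text = """id,time,value
1,22:10:01,10
2,22:10:02,20
3,22:10:03,30
2,22:10:04,40
1,22:10:05,50
"""
df = pd.read_csv(io.StringIO(csv_text))

# For each id, take the expanding (cumulative) mean of its values.
# transform() returns a result aligned with the original row order.
df['mean'] = df.groupby('id')['value'].transform(lambda s: s.expanding().mean())

print(df)
```

The same result could also be obtained by dividing a grouped cumsum() by cumcount() + 1, which avoids the lambda.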

Als

1 Answer

You can use the predefined mean function:

df.mean(axis=1)
Narendra Prasath
As far as I can see, that can't give me the mean by id. Axis 0 or 1 computes the mean over [columns or rows](https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean), and something like `df.groupby(['id'])['value'].mean()` didn't work either – Als May 28 '18 at 15:56
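To illustrate the comment above, this small sketch shows what the two calls mentioned in this thread return on the sample data from the question. The DataFrame is built by hand here as an assumption about what the CSV contains. Neither call gives a running mean per row: the row-wise mean mixes id and value, and the grouped mean gives a single overall value per id.

```python
import pandas as pd

# Sample data from the question, built by hand for illustration.
df = pd.DataFrame({
    'id': [1, 2, 3, 2, 1],
    'time': ['22:10:01', '22:10:02', '22:10:03', '22:10:04', '22:10:05'],
    'value': [10, 20, 30, 40, 50],
})

# axis=1 averages across the columns of each row, so id and value get mixed.
# Only the numeric columns are selected, since time holds strings.
row_means = df[['id', 'value']].mean(axis=1)

# A grouped mean returns one overall mean per id (3 entries for 5 rows).
per_id = df.groupby('id')['value'].mean()

# transform('mean') repeats that overall mean on every row of the group,
# which is still not the running mean asked for in the question.
broadcast = df.groupby('id')['value'].transform('mean')

print(row_means.tolist())
print(per_id.to_dict())
print(broadcast.tolist())
```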