On this subject, I have looked at various examples here on Stack Overflow, but none has worked for me.

My case involves two dataframes of student marks. I need to compute the average of the two and return the result. It works when I remove the columns holding names and other student details, and crashes when they are included.

This is part of what I have.

elif self.exam_combo.currentText() == "2":
    # getOpenFileName returns a (path, selected_filter) tuple
    df2 = QFileDialog.getOpenFileName(MainWindow, 'Upload marks', os.getenv('HOME'), 'CSV(*.csv)')
    path = df2[0]
    df3 = pd.read_csv(path)
    QMessageBox.information(MainWindow, "Successful", "Choose the last set of marks to upload.")
    df4 = QFileDialog.getOpenFileName(MainWindow, 'Upload marks', os.getenv('HOME'), 'CSV(*.csv)')
    path = df4[0]
    df5 = pd.read_csv(path)

    dfs = [df3, df5]
    # This line raises the TypeError below when the frames contain string columns
    df = pd.DataFrame(np.array([x.to_numpy() for x in dfs]).mean(axis=0), index=df3.index, columns=df3.columns)

It gives an error.

Traceback (most recent call last):
  File "D:\Python\PyQt5\Proper_1.py", line 1557, in upload_marks
    df = pd.DataFrame(np.array([x.to_numpy() for x in dfs]).mean(axis=0), index=df3.index, columns=df3.columns)
  File "C:\Users\Links Net\AppData\Local\Programs\Python\Python38-32\lib\site-packages\numpy\core\_methods.py", line 153, in _mean
    ret = um.true_divide(
TypeError: unsupported operand type(s) for /: 'str' and 'int'

I think this happens because the columns mix strings and integers, so the mean cannot be computed. Can anybody help? I've also tried

df_concat.groupby(level=0).mean()

and https://stackoverflow.com/a/43878488/13399550
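
A minimal sketch that reproduces the error on toy data may make the cause easier to see. The values below are made up, and only two columns (`NAME`, `ENG`) are used for brevity; the assumption is that the real CSV files mix string and integer columns in the same way.

```python
import numpy as np
import pandas as pd

# Made-up marks; the real files are assumed to mix strings and integers like this.
df3 = pd.DataFrame({"NAME": ["Ann", "Ben"], "ENG": [60, 70]})
df5 = pd.DataFrame({"NAME": ["Ann", "Ben"], "ENG": [80, 50]})

# A frame with mixed column types converts to a NumPy array of dtype object.
stacked = np.array([x.to_numpy() for x in (df3, df5)])
print(stacked.dtype)

try:
    stacked.mean(axis=0)
    raised = False
except TypeError as exc:
    # The strings get summed (concatenated), and dividing that result by the
    # number of frames is what fails.
    raised = True
    print(exc)
```

The integer column on its own would average without trouble; it is the string column sharing the same array that triggers the failure.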

eyllanesc
Ptar
  • What is `print (df3.dtypes)` and `print (df5.dtypes)` ? It seems some column is not numeric – jezrael May 08 '20 at 10:53
  • STREAM object ADM int64 NAME object KCPE int64 ENG int64 KIS int64 dtype: object STREAM object ADM int64 NAME object KCPE int64 ENG int64 KIS int64 dtype: object – Ptar May 08 '20 at 11:27
  • So the `STREAM` and `NAME` columns are object (string) columns, so how could a mean of them be computed? Do you need to remove these columns, or process them differently? – jezrael May 08 '20 at 11:30
  • Yeah, some columns are not numeric. I only need an average of the columns with students' scores; the rest are names, roll numbers, streams, etc., which are string values – Ptar May 08 '20 at 11:31
  • How do I remove? How do I process differently? – Ptar May 08 '20 at 11:31
  • e.g. by drop. [link](https://stackoverflow.com/questions/13411544/delete-column-from-pandas-dataframe) – jezrael May 08 '20 at 11:32
  • Not working. Maybe I'm doing something unnecessary – Ptar May 08 '20 at 11:54

2 Answers

Use:

dfs = [df3, df5]
# select only numeric columns
dfs = [x.select_dtypes(np.number) for x in dfs]
# concatenate, then average the rows that share the same index label
df = pd.concat(dfs).groupby(level=0).mean()
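
As a rough sketch of how averaging only the score columns might look end to end, here is a toy example. The column names come from the dtypes listing in the comments, the values are made up, and `score_cols` is a hypothetical list chosen for illustration. It assumes both files list the same students in the same row order, since rows are matched by index label.

```python
import pandas as pd

# Made-up data; both frames are assumed to list students in the same order.
df3 = pd.DataFrame({"STREAM": ["A", "B"], "ADM": [1, 2], "NAME": ["Ann", "Ben"],
                    "ENG": [60, 70], "KIS": [50, 90]})
df5 = pd.DataFrame({"STREAM": ["A", "B"], "ADM": [1, 2], "NAME": ["Ann", "Ben"],
                    "ENG": [80, 50], "KIS": [70, 70]})

score_cols = ["ENG", "KIS"]  # hypothetical: the columns that hold marks

# Stack the score columns, then average the rows that share an index label.
means = pd.concat([df3[score_cols], df5[score_cols]]).groupby(level=0).mean()

# Put the student details from the first file back in front of the averages.
result = df3.drop(columns=score_cols).join(means)
print(result)
```

Listing the score columns explicitly keeps an integer identifier such as `ADM` out of the average, which selecting every numeric column would include.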
jezrael

I got it working with

dfs = pd.concat([df3, df5]).groupby(["STREAM", "ADM", "NAME", "KCPE"]).mean()
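
For illustration, a small sketch of this approach on made-up data. It assumes the four grouping columns hold identical values for a given student in both files, so each student forms one group and the remaining columns (`ENG`, `KIS`) are averaged. The `reset_index()` call is an optional addition that turns the group keys back into ordinary columns.

```python
import pandas as pd

# Made-up data; the second frame is deliberately in a different row order.
df3 = pd.DataFrame({"STREAM": ["A", "B"], "ADM": [1, 2], "NAME": ["Ann", "Ben"],
                    "KCPE": [350, 320], "ENG": [60, 70], "KIS": [50, 90]})
df5 = pd.DataFrame({"STREAM": ["B", "A"], "ADM": [2, 1], "NAME": ["Ben", "Ann"],
                    "KCPE": [320, 350], "ENG": [50, 80], "KIS": [70, 70]})

keys = ["STREAM", "ADM", "NAME", "KCPE"]

# Rows with the same key values end up in one group, whatever their position.
averaged = pd.concat([df3, df5]).groupby(keys).mean().reset_index()
print(averaged)
```

Because rows are matched on the key columns rather than on position, this version does not depend on the two files being sorted the same way.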
Ptar