I have a pandas dataframe grouped by certain columns. Now I want to insert the mean of the numeric values of four adjacent columns into a new column. This is what I did:
df = pd.read_csv(filename)
# in this line I extract a unique ID from the filename
id = re.search('(\w\w\w)', filename).group(1)
Files look like this:
col1 | col2 | col3
-----------------------
str1a | str1b | float1
My idea was now the following:
# get the numeric values
df2 = pd.DataFrame(df.groupby(['col1', 'col2']).mean()['col3'].T
# insert the id into a new column
df2.insert(0, 'ID', id)
Now loop over all
for j in range(len(df2.values)):
for k in df['col1'].unique():
df2.insert(j+5, (k, 'mean'), df2.values[j])
df2.to_excel('text.xlsx')
But I get the following error, referring to the line with df.insert:
TypeError: not all arguments converted during string formatting
and
if not allow_duplicates and item in self.items:
# Should this be a different kind of error??
raise ValueError('cannot insert %s, already exists' % item)
I am not sure what string formatting refers to here, since I have only numerical values being passed around.
The final output should have all values from col3 in a single row (indexed by id) and every fifth column should be the inserted mean value of the four preceding values.