Insert data into grouped DataFrame (pandas)

Question

I have a pandas dataframe grouped by certain columns. Now I want to insert the mean of the numeric values of four adjacent columns into a new column. This is what I did:

df = pd.read_csv(filename)
# in this line I extract a unique ID from the filename
id = re.search('(\w\w\w)', filename).group(1)

Files look like this:

col1   | col2  | col3
-----------------------
str1a  | str1b | float1

My idea was now the following:

# get the numeric values
df2 = pd.DataFrame(df.groupby(['col1', 'col2']).mean()['col3'].T
# insert the id into a new column
df2.insert(0, 'ID', id)

Now loop over all

for j in range(len(df2.values)):
    for k in df['col1'].unique():
        df2.insert(j+5, (k, 'mean'), df2.values[j])

df2.to_excel('text.xlsx')

But I get the following error, referring to the line with df.insert:

TypeError: not all arguments converted during string formatting

and

if not allow_duplicates and item in self.items:
    # Should this be a different kind of error??
    raise ValueError('cannot insert %s, already exists' % item)

I am not sure what string formatting refers to here, since I have only numerical values being passed around.

The final output should have all values from col3 in a single row (indexed by id) and every fifth column should be the inserted mean value of the four preceding values.

Sorry, no. Can you add 5 - 6 rows of data and desired output? The best is if simultae youe error. — jezrael, May 19 '17 at 10:33
Is your question regarding writing to the .xlsx file or doing the transformation? — oLas, May 19 '17 at 10:47
I'll try to input the question you requestet @jezrael but it will take me some time. — Fabian Moss, May 19 '17 at 10:58
@FabianMoss - Thank you. Maybe help [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — jezrael, May 19 '17 at 10:59

score 0 · Answer 1 · answered May 19 '17 at 10:47

0

If I had to work with files like yours I code a function to convert to csv... something like that:

data = []
for lineInFile in file.read().splitlines():
    lineInFile_splited = lineInFile.split('|')
    if len(lineInFile_splited)>1: ## get only data and not '-------'
        data.append(lineInFile_splited)
df = pandas.DataFrame(data, columns = ['A','B'])

Hope it helps!

answered May 19 '17 at 10:47

Ika8

391
1
12

I think its actually the other way around. I have many files from which I want to extract one column and put these columns altogether into one dataframe. Then I want to insert mean values at specific points. – Fabian Moss May 19 '17 at 10:59

Insert data into grouped DataFrame (pandas)

1 Answers1