-1

I have a pandas dataframe grouped by certain columns. Now I want to insert the mean of the numeric values of four adjacent columns into a new column. This is what I did:

df = pd.read_csv(filename)
# in this line I extract a unique ID from the filename
id = re.search('(\w\w\w)', filename).group(1)

Files look like this:

col1   | col2  | col3
-----------------------
str1a  | str1b | float1

My idea was now the following:

# get the numeric values
df2 = pd.DataFrame(df.groupby(['col1', 'col2']).mean()['col3'].T
# insert the id into a new column
df2.insert(0, 'ID', id)

Now loop over all

for j in range(len(df2.values)):
    for k in df['col1'].unique():
        df2.insert(j+5, (k, 'mean'), df2.values[j])

df2.to_excel('text.xlsx')

But I get the following error, referring to the line with df.insert:

TypeError: not all arguments converted during string formatting

and

if not allow_duplicates and item in self.items:
    # Should this be a different kind of error??
    raise ValueError('cannot insert %s, already exists' % item)

I am not sure what string formatting refers to here, since I have only numerical values being passed around.

The final output should have all values from col3 in a single row (indexed by id) and every fifth column should be the inserted mean value of the four preceding values.

1 Answers1

0

If I had to work with files like yours I code a function to convert to csv... something like that:

data = []
for lineInFile in file.read().splitlines():
    lineInFile_splited = lineInFile.split('|')
    if len(lineInFile_splited)>1: ## get only data and not '-------'
        data.append(lineInFile_splited)
df = pandas.DataFrame(data, columns = ['A','B'])

Hope it helps!

Ika8
  • 391
  • 1
  • 12
  • I think its actually the other way around. I have many files from which I want to extract one column and put these columns altogether into one dataframe. Then I want to insert mean values at specific points. – Fabian Moss May 19 '17 at 10:59