2

I am having data frame as follows:

Date        Description Amount
12/01/2019  ABC         51,026
13/01/2019  XYZ         12,111.56

I want to add a single name field to the whole data frame having 3 columns "Date Description and Amount" and want to get the output as follows:

 Name: Shivam
 Date        Description Amount
 12/01/2019  ABC         51,026
 13/01/2019  XYZ         12,111.56

I have tried adding a dictionary of name to one data frame and "Date Description and Amount" to another Data Frame and merged both. But it is creating another column of name along with "Date Description and Amount". But I want name to be only one time and not the whole column.

Please suggest some ways to do this.

Shivam
  • 71
  • 3
  • Does this answer your question? [Adding meta-information/metadata to pandas DataFrame](https://stackoverflow.com/questions/14688306/adding-meta-information-metadata-to-pandas-dataframe) – Tobi208 Apr 16 '22 at 17:50
  • It is similar to my question but the suggested methods are not working in my case. – Shivam Apr 16 '22 at 17:59

2 Answers2

1

You can use the metadata to add raw attributes.

import pandas as pd

class metadatadf(pd.DataFrame):

    _metadata = ['metadata']

    @property
    def _constructor(self):
        return metadatadf

data = {"Date": ['12/01/2019', " 13/01/2019"], "Description": ['ABC', 'XYZ'] ,"Amount":['51,026','12,111.56']}
df = metadatadf(data)
df.metadata = " Name:Shivam"
print(df.head().metadata)
print(df)

The output:

 Name:Shivam
          Data Description     Amount
0   12/01/2019         ABC     51,026
1   13/01/2019         XYZ  12,111.56

Also, you can use the MultiIndex, but if you need to save it to a file, it will show duplicated columns

col=pd.MultiIndex.from_product([["Shivam"], ['Date', 'Description', 'Amount']], names=['Name:',''])
df = pd.DataFrame(data=[['12/01/2019', 'ABC', "51,026"], ['13/01/2019', 'XYZ', "12,111.56"]],columns=col)
print(df)

The output:

 Name:Shivam
          Data Description     Amount
0   12/01/2019         ABC     51,026
1   13/01/2019         XYZ  12,111.56

So, according to this answer, you need to use the HDF5 to save the metadata

import numpy as np
import pandas as pd

def h5store(filename, df, **kwargs):
    store = pd.HDFStore(filename)
    store.put('mydata', df)
    store.get_storer('mydata').attrs.metadata = kwargs
    store.close()

def h5load(store):
    data = store['mydata']
    metadata = store.get_storer('mydata').attrs.metadata
    return data, metadata

a = pd.DataFrame(
    data=[['12/01/2019', 'ABC', "51,026"], ['13/01/2019', 'XYZ', "12,111.56"]], columns=['Date', 'Description', 'Amount'])

filename = 'fs.h5'
metadata = dict(Name='Shivam')
h5store(filename, a, **metadata)
with pd.HDFStore(filename) as store:
    data, metadata = h5load(store)
print(data)
print(metadata)
I_Al-thamary
  • 3,385
  • 2
  • 24
  • 37
  • The name is not visible while looking at the CSV file. I want metadata as well to be visible in one column. – Shivam Apr 17 '22 at 03:59
  • See how you can [write the metadat and retrieve it](https://www.geeksforgeeks.org/how-to-add-metadata-to-a-dataframe-or-series-with-pandas-in-python/)). – I_Al-thamary Apr 17 '22 at 07:46
0

I believe this is one way to do what your question describes:

import pandas as pd
mi = pd.MultiIndex.from_product([['Name: Shivam'], 'Date,Description,Amount'.split(',')])
print(mi)
df = pd.DataFrame([['12/01/2019', 'ABC', 51026], ['13/01/2019', 'XYZ', 12111.56]], columns=mi)
print(df)

Output:

MultiIndex([('Name: Shivam',        'Date'),
            ('Name: Shivam', 'Description'),
            ('Name: Shivam',      'Amount')],
           )
  Name: Shivam
          Date Description    Amount
0   12/01/2019         ABC  51026.00
1   13/01/2019         XYZ  12111.56

UPDATE:

You can also use a regular index for the columns (not multiindex) with a name stored in the index. It should display the name one time only:

import pandas as pd
idx = pd.Index('Date,Description,Amount'.split(','), name='Name: Shivam')
df = pd.DataFrame([['12/01/2019', 'ABC', 51026], ['13/01/2019', 'XYZ', 12111.56]], columns=idx)
print(df)

Output:

Name: Shivam        Date Description    Amount
0             12/01/2019         ABC  51026.00
1             13/01/2019         XYZ  12111.56
constantstranger
  • 9,176
  • 2
  • 5
  • 19
  • It is working but it is showing names in all 3 columns. I want in only one column. – Shivam Apr 17 '22 at 03:57
  • I have updated the answer to also show how to use a single-level index for columns with a `name` (in this case "Name: Shivam"). Maybe this will work better for you. – constantstranger Apr 18 '22 at 03:35