0

I'm dealing with messy data and upon import the header of the DF has values I actually need, Can I convert these to rows and perhaps remove the column name or rename them? My end goal here is to convert this to a json from pandas.

df

    Url                     https://www.intuit.com/oidam/intuit/ic/en_us/content/intuit-cr-report-2020-icom.pdf
0   company_id              53
1   current_value scope 1   1912
2   current_value scope 2   5136
3   current_value scope 3   30476 

You can see it's all key values except the header for the first column is Url so when I later do, df.to_json() I get:

{"Url":{"0":"company_id","1":"current_value scope 1","2":"current_value scope 2","3":"current_value scope 3"},"https:\/\/www.intuit.com\/oidam\/intuit\/ic\/en_us\/content\/intuit-cr-report-2020-icom.pdf":{"0":53,"1":1912,"2":5136,"3":30476}}
Lostsoul
  • 25,013
  • 48
  • 144
  • 239
  • If you don't want to read the first line as column names, you can use `header=None` in read function. – Psidom Feb 24 '22 at 22:18
  • @Psidom It's actually part of a larger df that I'm parsing. I cannot modify this on import. Is there a way to do this on a existing df? – Lostsoul Feb 24 '22 at 22:23
  • You can insert / append the columns as a new row into the dataframe: https://stackoverflow.com/questions/24284342/insert-a-row-to-pandas-dataframe. And then use `df.columns = [...new names]` to rename columns. – Psidom Feb 24 '22 at 22:27

1 Answers1

1

You could transpose and reset_index twice:

out = df.T.reset_index().T.reset_index(drop=True)

Output:

                          0                                                  1
0                       Url  https://www.intuit.com/oidam/intuit/ic/en_us/c...
1                company_id                                                 53
2     current_value scope 1                                               1912
3     current_value scope 2                                               5136
4     current_value scope 3                                              30476

Then again, if we look at the json, it seems the below line makes more sense:

out = df.T.reset_index().T.set_index(0)

which produces the json string below:

'{"1":{"Url":"https:\\/\\/www.intuit.com\\/oidam\\/intuit\\/ic\\/en_us\\/content\\/intuit-cr-report-2020-icom.pdf","   company_id":53,"   current_value scope 1":1912,"   current_value scope 2":5136,"   current_value scope 3":30476}}'