Parsing large string values in Pandas

Question

I have a .csv which I've generated a dataframe from. This csv has raw data outputs from a system that follows this format:

{"DataType1":"Value","DataType2":"Value","DataType3":"Value",.....}

Each row in the dataframe has just this in 1 column. I'm trying to break this out so that the data types become column headers and the values populate the rows. One other aspect is that not all rows have the same data types; some have additional data types that might not be present in other rows. For example, row 1 may have DataType1, DataType2, and DataType3 and row 2 may have DataType2, DataType4, and DataType5. Ideally I'd like the column headers to incorporate all data types, whether a given row has a value for them or not. So the final dataframe would have this structure:

-------------------------------------------------------------
| DataType1 | DataType2 | DataType3 | DataType4 | DataType5 |
-------------------------------------------------------------
| Value     | Value     | Value     | NaN       | NaN       |
-------------------------------------------------------------
| NaN       | Value     | NaN       | Value     | Value     |
-------------------------------------------------------------

Hi, welcome to Stack Overflow. Please look around SO for similar problems, e.g https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns, https://stackoverflow.com/questions/29370211/split-strings-in-tuples-into-columns-in-pandas, https://stackoverflow.com/questions/39553392/text-to-columns-with-comma-delimiter-using-python etc. — Evan, Nov 16 '18 at 06:02
Possible duplicate of [Split strings in tuples into columns, in Pandas](https://stackoverflow.com/questions/29370211/split-strings-in-tuples-into-columns-in-pandas) — Evan, Nov 16 '18 at 06:02
If you know, is the data JSON, or a Python dictionary? What have you tried so far? — Evan, Nov 16 '18 at 06:03
The data is in a csv table as listed above. Each row just has 1 column with 1 string. It follows that dictionary format — Danny, Nov 17 '18 at 20:26

score 0 · Answer 1 · answered Nov 16 '18 at 06:12

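Dataframes follow this format when converted from a dictionary:

data = {'column 1':[1,2], 'column 2':[3,4], ...}

Note that the values under every key must have the same length, or

pd.DataFrame(data)

will raise a ValueError.

To get around the error, you can iterate over the dict and wrap each value in a pd.Series, so that shorter columns are padded with NaN:

pd.DataFrame(dict([(k, pd.Series(v)) for k, v in data.items()]))

*Assuming 'data' is your dictionary name. Don't name it 'dict': that shadows the built-in dict() used in the line above, and the call would fail.

This gives a DataFrame whose columns are the dictionary keys, with NaN wherever a column has fewer values than the longest one.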
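
For the layout described in the question (one string per row, each string shaped like a dictionary), a minimal sketch could parse every row and let pandas take the union of the keys. This assumes the strings are valid JSON, which the question does not confirm; the column name 'raw' and the sample values are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical stand-in for the single-column dataframe read from the CSV.
# The column name "raw" and the values are assumptions, not from the question.
df = pd.DataFrame({
    "raw": [
        '{"DataType1":"a","DataType2":"b","DataType3":"c"}',
        '{"DataType2":"d","DataType4":"e","DataType5":"f"}',
    ]
})

# Parse each string into a Python dict (assumes every row is valid JSON).
records = df["raw"].apply(json.loads).tolist()

# Building a DataFrame from a list of dicts uses the union of all keys as
# columns and fills keys missing from a given row with NaN.
wide = pd.DataFrame(records, index=df.index)

print(wide)
```

If the strings turn out to be Python dict literals rather than JSON (single quotes, for instance), ast.literal_eval from the standard library may be a closer fit than json.loads.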
