2

I have a CSV file containing some JSON strings and I want to parse it out and store into dataframe. The file looks like:

file1,"{\"A1\": {\"a\": \"123\"}, \"B1\": {\"b1\": \"456\", \"b2\": \"789\", \"b3\": \"000\"}}",
file2,"{\"A2\": {\"a\": \"321\"}, \"B2\": {\"b1\": \"654\", \"b2\": \"987\"}}"

After get the key in the dictionary. The dateframe I want will be:

 1              2                                    3           
file1   {"A1":{"a":"123"}}    {"B1":{"b1":"456","b2":"789","b3":"000"}}  
file2   {"A2":{"a2":"321"}}   {"B2":{"b1":"654","b2":"987"}}

The value in column 2 and columns 3 will be dictionary. I have tried:

pd.read_csv(file, quotechar='"', header=None) 

but it still separates my JSON in the wrong way...
Any suggestions? Many thanks!

Martin Evans
  • 45,791
  • 17
  • 81
  • 97
jimmy15923
  • 291
  • 1
  • 5
  • 19
  • The formatting of your question isn't really clear, but how far did you get with this already? Parsing csv is easy – OneCricketeer Mar 11 '17 at 15:15
  • 1
    If I used pd.read_csv(file), the third comma will separate the dictionary of "B1" into three parts. And the last part will be like "b3":"000"}} , which is not a json form. – jimmy15923 Mar 11 '17 at 15:21
  • Your json is quoted. You need to set the quote char, otherwise all commas are split, yes. – OneCricketeer Mar 11 '17 at 15:33
  • I have tried pd.read_csv(file,quotechar='"',header=None) It still separate my json in wrong way... – jimmy15923 Mar 11 '17 at 16:05
  • I don't have column in this file so I don't know how to use converter in pd.read_csv() ... – jimmy15923 Mar 11 '17 at 16:08

1 Answers1

6

The data you have is using \" to escape a double quote within each cell. This behaviour can be specified by setting both doublequote=True and escapechar='\\' as parameters as follows:

df = pd.read_csv('input.json', doublequote=True, escapechar='\\')
print df

Giving you something like:

       0                                                  1     2
0  file1  {"A1": {"a": "123"}, "B1": {"b1": "456", "b2":...      
1  file2  {"A2": {"a": "321"}, "B2": {"b1": "654", "b2":...  None
   file1 {"A1": {"a": "123"}, "B1": {"b1": "456", "b2": "789", "b3": "000"}}  \
0  file2  {"A2": {"a": "321"}, "B2": {"b1": "654", "b2":...                    

   Unnamed: 2  
0         NaN
Martin Evans
  • 45,791
  • 17
  • 81
  • 97