I want to extract the values of cx, cy and r, which all reside in column 3

Question

Below is my sample data in a CSV file:

filename, file_size, region_shape_attributes
1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}
2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}
3.jpg, 2551045, {"name":"circle","cx":619,"cy":2865,"r":83}
4.jpg, 2551045, {"name":"circle","cx":769,"cy":2793,"r":82}
5.jpg, 2551045, {"name":"circle","cx":885,"cy":2669,"r":87}

I want the output as follows:

name   cx  cy   r
circle 371 2921 73
circle 371 2921 73
circle 371 2921 73

Great, what have you tried? Also, why is your output just repeating? - gold_cy, Feb 14 '19 at 12:09

score 0 · Answer 1 · answered Feb 14 '19 at 12:22

import ast

import pandas as pd

# read your data from the clipboard (the default separator is whitespace)
d = pd.read_clipboard()

# transform each string into a dictionary
d["region_shape_attributes"] = d["region_shape_attributes"].apply(ast.literal_eval)

# convert the column of dictionaries to a DataFrame
pd.DataFrame(list(d["region_shape_attributes"]))

This gives the following result:

    cx  cy      name    r
0   371 2921    circle  73
1   505 2951    circle  62
2   619 2865    circle  83
3   769 2793    circle  82
4   885 2669    circle  87
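pd.read_clipboard() only works when the data has been copied to the clipboard. A minimal standard-library sketch for parsing the same rows from text (inlined here; in practice the result of open(path).read()), assuming every line has the form "filename, file_size, {json}". It swaps ast.literal_eval for json.loads because the third column is valid JSON, and split(", ", 2) stops after the second separator, so the commas inside the JSON are never split. The names DATA and parse_shapes are hypothetical.

```python
import json

# Inline copy of the sample CSV; in practice, read it from the file instead.
DATA = """filename, file_size, region_shape_attributes
1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}
2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}
3.jpg, 2551045, {"name":"circle","cx":619,"cy":2865,"r":83}
4.jpg, 2551045, {"name":"circle","cx":769,"cy":2793,"r":82}
5.jpg, 2551045, {"name":"circle","cx":885,"cy":2669,"r":87}
"""


def parse_shapes(text):
    """Return one dict per data row, with keys name, cx, cy and r."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue  # ignore blank lines
        # maxsplit=2: only the first two ", " separators are used,
        # so the JSON in the third field stays in one piece
        _filename, _size, attrs = line.split(", ", 2)
        rows.append(json.loads(attrs))
    return rows


shapes = parse_shapes(DATA)
for s in shapes:
    print(s["name"], s["cx"], s["cy"], s["r"])
```

If pandas is available, pd.DataFrame(shapes) turns the resulting list of dicts into a table like the one above.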

score 0 · Answer 2 · answered Feb 14 '19 at 12:30

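Build the DataFrame (constructed directly here, with the third column already holding dicts rather than strings read from the CSV file):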

import pandas as pd

df = pd.DataFrame({
    'img': ['1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg'],
    'id': [2551045, 2551045, 2551045, 2551045, 2551045],
    'dict': [{"name": "circle", "cx": 371, "cy": 2921, "r": 73},
             {"name": "circle", "cx": 505, "cy": 2951, "r": 62},
             {"name": "circle", "cx": 619, "cy": 2865, "r": 83},
             {"name": "circle", "cx": 769, "cy": 2793, "r": 82},
             {"name": "circle", "cx": 885, "cy": 2669, "r": 87}],
})

Then use .apply(pd.Series) to expand each dict into its own columns:

df['dict'].apply(pd.Series)

Output:

   cx   cy      name    r
0   371 2921    circle  73
1   505 2951    circle  62
2   619 2865    circle  83
3   769 2793    circle  82
4   885 2669    circle  87
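This answer relies on the third column holding dict objects. When the data is read from the CSV file, that column holds JSON strings instead, and .apply(pd.Series) on a plain string yields a single column rather than cx, cy, name and r; the strings need parsing first (for example df['dict'].apply(json.loads).apply(pd.Series)). A minimal standard-library sketch of both steps, parse and then expand into one list per key; the names raw, records and columns are hypothetical.

```python
import json

# The third CSV column as it arrives from a file: JSON strings, not dicts.
raw = [
    '{"name":"circle","cx":371,"cy":2921,"r":73}',
    '{"name":"circle","cx":505,"cy":2951,"r":62}',
]

# Step 1: string -> dict
records = [json.loads(s) for s in raw]

# Step 2: column-oriented view, similar to what .apply(pd.Series) produces:
# one list per key, in the key order of the first record.
columns = {key: [rec.get(key) for rec in records] for key in records[0]}
print(columns)
```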

score 0 · Answer 3 · answered Feb 14 '19 at 12:37

The old-school way (without any packages or modules):

list.txt:

filename, file_size, region_shape_attributes
1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}
2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}
3.jpg, 2551045, {"name":"circle","cx":619,"cy":2865,"r":83}
4.jpg, 2551045, {"name":"circle","cx":769,"cy":2793,"r":82}
5.jpg, 2551045, {"name":"circle","cx":885,"cy":2669,"r":87}

And then:

log_file = "list.txt"

with open(log_file) as f:
    content = f.readlines()

# strip whitespace and drop empty lines
content = [line.strip() for line in content if line.strip()]

# keep only the attribute text after the first "{", without the closing "}"
dict_list = []
for line in content[1:]:  # skip the header row
    attrs = line.split("{", 1)[1].strip("}")
    dict_list.append(attrs)

# header row
print("name \t", end="")
print("cx \t\t", end="")
print("cy \t\t", end="")
print("r \t")
for elem in dict_list:
    # fields in order name, cx, cy, r; each one looks like '"key":value'
    x = elem.split(",")
    print(x[0].split(":", 2)[1].replace('"', " "), end="")
    print(x[1].split(":", 2)[1].replace('"', " "), "\t", end="")
    print(x[2].split(":", 2)[1].replace('"', " "), "\t", end="")
    print(x[3].split(":", 2)[1].replace('"', " "), "\t")

OUTPUT:

name    cx      cy      r   
 circle 371     2921    73  
 circle 505     2951    62  
 circle 619     2865    83  
 circle 769     2793    82  
 circle 885     2669    87
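The loop above picks fields by position (x[0] to x[3]), so it breaks if the key order in the JSON ever changes. A minimal sketch in the same no-imports spirit that parses each '"key":value' pair into a dict and looks columns up by name, printing with f-string field widths instead of tabs. It assumes, as in the sample, that values contain no commas or colons and that numbers are non-negative integers; parse_attrs and parsed are hypothetical names.

```python
def parse_attrs(attrs):
    """Turn '{"name":"circle","cx":371,...}' into a dict, without imports.

    Assumes values contain no commas or colons, as in the sample data.
    """
    result = {}
    for pair in attrs.strip("{}").split(","):
        key, value = pair.split(":", 1)
        key = key.strip('"')
        value = value.strip('"')
        # digit-only strings become int; anything else stays a string
        result[key] = int(value) if value.isdigit() else value
    return result


# Two lines from list.txt, inlined so the sketch runs on its own.
lines = [
    '1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}',
    '2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}',
]

# Columns are looked up by key, so their order in the file no longer matters.
parsed = [parse_attrs(line.split(", ", 2)[2]) for line in lines]

print(f"{'name':<8}{'cx':>5}{'cy':>6}{'r':>4}")
for d in parsed:
    print(f"{d['name']:<8}{d['cx']:>5}{d['cy']:>6}{d['r']:>4}")
```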

score 0 · Answer 4 · answered Feb 14 '19 at 13:05

Use the code below:

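import pandas as pd

# replace <file path> with the path to your CSV file; splitting on single
# spaces keeps the JSON (which contains no spaces) together in one field
csv_data = pd.read_csv(<file path>, sep=' ')
csv_data.columns = ['Field1', 'Field2', 'Field3']
name = []
cx = []
cy = []
r = []
for i in csv_data['Field3']:
    # '{"name":"circle","cx":371,"cy":2921,"r":73}' -> four '"key":value' parts
    list_i = i.split(',')
    name.append(list_i[0].split(':')[1])
    cx.append(list_i[1].split(':')[1])
    cy.append(list_i[2].split(':')[1])
    r.append(list_i[3].split(':')[1].replace('}', ''))
df_result = pd.DataFrame({'name': name, 'cx': cx, 'cy': cy, 'r': r})
print(df_result)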

Output based on the input given above:

    cx    cy      name   r
0  371  2921  "circle"  73
1  505  2951  "circle"  62
2  619  2865  "circle"  83
3  769  2793  "circle"  82
4  885  2669  "circle"  87
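The values collected this way are all strings, and name keeps its JSON double quotes, as the output shows. A small sketch of cleaning the lists before building the DataFrame, assuming every cx, cy and r value is an integer literal as in the sample; only the first two rows are inlined here for illustration.

```python
# Lists as the loop fills them (first two rows only): every value is a
# string, and name still carries its JSON double quotes.
name = ['"circle"', '"circle"']
cx = ['371', '505']
cy = ['2921', '2951']
r = ['73', '62']

# Strip the quotes from the names and convert the numeric fields to int.
name = [n.strip('"') for n in name]
cx = [int(v) for v in cx]
cy = [int(v) for v in cy]
r = [int(v) for v in r]
```

In pandas, the same cleanup could instead be done on the finished frame with df_result['name'].str.strip('"') and df_result[['cx', 'cy', 'r']].astype(int).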

