-1

I have a csv file with columns like this

enter image description here

I need to separate column (B) values into separate columns and multiple rows like this enter image description here

This is what I tried (the data here in below code is same as csv data above) and did not work

data = [{"latlong":'{lat: 15.85173248  ,  lng: 78.6216129},{lat: 15.85161765  ,  lng: 78.61982138},{lat: 15.85246304  ,  lng: 78.62031075},{lat: 15.85250474  ,  lng: 78.62034441},{lat: 15.85221891  ,  lng: 78.62174507},', "Id": 1},
        {"latlong": '{lat: 15.8523723  ,  lng: 78.62177758},{lat: 15.85236637  ,  lng: 78.62179098},{lat: 15.85231281  ,  lng: 78.62238316},{lat: 15.8501259  ,  lng: 78.62201676},', "Id":2}]
df = pd.DataFrame(data)
df
df.latlong.apply(pd.Series)

This works in this case

data1 = [{'latlong':[15.85173248, 78.6216129, 1]},{'latlong': [15.85161765, 78.61982138, 1]},{'latlong': [15.85246304, 78.62031075, 1]},
    {'latlong': [15.85250474, 78.62034441, 1]}, {'latlong': [15.85221891, 78.62174507, 1]},{'latlong': [15.8523723, 78.62177758, 2]},
    {'latlong': [15.85236637, 78.62179098, 2]}, {'latlong': [15.85231281, 78.62238316, 2]},{'latlong': [15.8501259,78.62201676, 2]}]
df1 = pd.DataFrame(data1)
df1

df1 = df1['latlong'].apply(pd.Series)
df1.columns = ['lat', 'long', 'Id']
df1

How can I achieve this with Python ?

New to python. I tried following links... Could not understand how to apply it to my case. Splitting dictionary/list inside a Pandas Column into Separate Columns

python split data frame columns into multiple rows

Community
  • 1
  • 1
pavan
  • 19
  • 6
  • the first link you have provided has the answer to your question, can you be more specific about what you don't understand? – maxymoo May 16 '17 at 05:33
  • @maxymoo: Here in my example case, each single cell (csv/ excel) has multiple pairs of lat, lngs and each pair of lat, lng is enclosed in {}. No clue, how do I separate each pair of lat, lng into to separate columns named lat, long. Thanks – pavan May 19 '17 at 12:25
  • `df.latlong.apply(pd.Series)` – maxymoo May 20 '17 at 23:17
  • @ maxymoo: Thanks for your response. I have tried this before posting my problem. This didn’t work in my case. Also, edited my problem with the code what I tried .... where it worked & it didn't. Thanks – pavan May 23 '17 at 11:10

1 Answers1

0

Your data is in a very strange format ... the entries of latlong aren't actually valid JSON (there is a trailing comma at the end, and there are no quotes around the field names), so I would probably actually use a regular expression to split out the columns, and a list comprehension to split out the rows:

In [39]: pd.DataFrame(
             [{'Id':r['Id'], 'lat':lat, 'long':long}  
                for r in data 
                for lat,long in re.findall("lat: ([\d.]+).*?lng: ([\d.]+)",
                                           r['latlong'])])
Out[39]:
   Id          lat         long
0   1  15.85173248   78.6216129
1   1  15.85161765  78.61982138
2   1  15.85246304  78.62031075
3   1  15.85250474  78.62034441
4   1  15.85221891  78.62174507
5   2   15.8523723  78.62177758
6   2  15.85236637  78.62179098
7   2  15.85231281  78.62238316
8   2   15.8501259  78.62201676
maxymoo
  • 35,286
  • 11
  • 92
  • 119
  • @ maxymoo: Thanks. It gives what I want. Being a beginner the solution looks complex for me. I need to learn how the code is working. – pavan May 26 '17 at 06:45
  • read about list comprehension and regular expressions, you'll be able to work it out – maxymoo May 28 '17 at 23:33