1

I have a dataframe like this

id            Merchant ID  Date                 App Details
601179aa      1           27/01/21 20:03  
[{:appName "FAU-G", :packageName "com.ncoregames.faug"} 
 {:appName "Truecaller", :packageName "com.truecaller"}]

I want output like this

id           Merchant ID  Date                 App Name  Package Name
601179aa          1          27/01/21 20:03        FAU-G com.ncoreagames.faug
601179aa          1          27/01/21 20:03.   True Caller com.truecaller

I tried

df['App Details'] = df['App Details'].str.replace(r"\[","")
df['App Details'] = df['App Details'].str.replace(r"\]","")
foo = lambda x: pd.Series([i for i in (x.split(' '))])
app_df = df['App Details'].apply(foo)

the number of dictionaries in list keeps on varying.

3 Answers3

2

You can do it using regular expressions:

df=pd.DataFrame({'id': {0: '601179aa'},
 'Merchant': {0: 1},
 'ID': {0: '27/01/21'},
 'Date': {0: '20:03'},
 'App Details': {0: '[{:appName"FAU-G".:packageName"com.ncoreagames.faug"}{:appName"Truecaller",:packageName"com.truecaller"}]'}})

import re

df['App Details']=df['App Details'].str.split("}{")

    
df=df.explode('App Details')

df['App Name']=df['App Details'].apply(lambda x:re.findall('appName"(.+?)"',x)).explode()
df['Package Name']=df['App Details'].apply(lambda x:re.findall('packageName"(.+?)"',x)).explode()


     App Name          Package Name
0       FAU-G  com.ncoreagames.faug
0  Truecaller        com.truecaller
yudhiesh
  • 6,383
  • 3
  • 16
  • 49
Billy Bonaros
  • 1,671
  • 11
  • 18
1

You can start by:

  1. Expanding you cells into multiple rows:
df1 = pd.concat([pd.Series(row['id'], row['App'].split('}{'))        
                  for _, row in df.iterrows()]).reset_index().rename(columns = {'index':'temp1', 0:'id'})
df1['temp1'] = df1['temp1'].str.replace(r'\[{|}\]', '')
df1 

To get this (Notice the change in column names):

temp1   id
0   :appName"FAU-G",:packageName"com.ncoreagames.f...   601179aa
1   :appName"Truecaller",:packageName"com.truecaller"   601179aa
  1. Split the new rows string into multiple columns: Here The code is also getting the names for the columns and renaming them appropriately (assumption is that all your rows start with :appName or :packageName - Other wise adjust the regex inside the findall step)
df2 = df1["temp1"].str.split(",", n= 1 ,expand = True) 
names = []
for col in df2.columns:
    name = re.findall('(?<=\:)(.*?)(?=\")',df2[col][0])[0]
    df2.loc[:,col] = df2[col].str.replace(name, '')
    df2.loc[:,col] = df2[col].str.replace('\"|\:', '')
    names.append(name)
df2.columns = names
df2

You get this:

    appName packageName
0   FAU-G   com.ncoreagames.faug
1   Truecaller  com.truecaller
  1. Putting it all together using the new column names:
finaldf = df[['id', 'Merchant','ID',    'Date', 'Details']].merge(df1[['id']].merge(df2, left_index = True, right_index = True, how = 'inner'), on = 'id')
finaldf


id  Merchant    ID  Date    Details appName packageName
0   601179aa    1   27/01/21    20:03   NaN FAU-G   com.ncoreagames.faug
1   601179aa    1   27/01/21    20:03   NaN Truecaller  com.truecaller
Jorge
  • 2,181
  • 1
  • 19
  • 30
1

Your string looks very similar to json. One approach might be to convert the string to valid json & read into a dict. then explode, and use json_normalize to extract the information, finally setting the index to the correct value.

The one benefit of this approach when compared to the accepted answer is that it is more general. The way this approach might fail is if your structure is not easily convertible to json.

In the example, I've assumed that the keys are all words and there is no , between records, but there may be several spaces (actually non-alphabetical characters).

import json
import pandas as pd

df = pd.DataFrame({
        'id': ['601179aa'],
        'Merchant ID': [1],
        'Date': ['27/01/21 20:03'],
        'App Details': [
            '[{:appName "FAU-G", :packageName "com.ncoreagames.faug"}{:appName "Truecaller", :packageName "com.truecaller"}]'
            ]
})

idx_cols = ['id', 'Merchant ID', 'Date']

df2 = df.set_index(idx_cols)['App Details'] \
  .str.replace(':(\w+)', '"\\1":') \
  .str.replace('}\W*{', '},{') \
  .apply(json.loads).explode()

df3 = pd.json_normalize(df2).set_index(df2.index).reset_index()

df3 outputs:

         id  Merchant ID            Date     appName           packageName
0  601179aa            1  27/01/21 20:03       FAU-G  com.ncoreagames.faug
1  601179aa            1  27/01/21 20:03  Truecaller        com.truecaller
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85