Other suggestion to string replace method? Python

Question

First, take a look at my code below.

import string

DNA=["Alpha", "Bravo", "Charlie", "Delta", "Echo", "CharlieChoo", "DeltaAir", "Alpha bet", "ChooChoo", "Airline"]

body = "{\"startDate\":\"2016-01-01\"\
,\"endDate\":\"2017-10-30\"\
,\"timeUnit\":\"date\"\
,\"keywordGroups\":[{\"groupName\":\"Alpha\",\"keywords\":[\"Alpha\"]}\
,{\"groupName\":\"Bravo\",\"keywords\":[\"Bravo\"]}\
,{\"groupName\":\"Charlie\",\"keywords\":[\"Charlie\"]}\
,{\"groupName\":\"Delta\",\"keywords\":[\"Delta\"]}\
,{\"groupName\":\"Echo\",\"keywords\":[\"Echo\"]}]\
,\"device\":\"\",\"ages\":[\"1\",\"11\"],\"gender\":\"\"}"

body = body.replace(DNA[0],DNA[5],2)
body = body.replace(DNA[1],DNA[6],2)
body = body.replace(DNA[2],DNA[7],2)
body = body.replace(DNA[3],DNA[8],2)
body = body.replace(DNA[4],DNA[9],2)

body

and the output is below

'{"startDate":"2016-01-01","endDate":"2017-10-30","timeUnit":"date","keywordGroups":
[{"groupName":"Alpha betChoo","keywords":["Alpha betChoo"]},
{"groupName":"ChooChooAir","keywords":["ChooChooAir"]},
{"groupName":"Charlie","keywords":["Charlie"]}, 
{"groupName":"Delta","keywords":["Delta"]},
{"groupName":"Airline","keywords":["Airline"]}],"device":"","ages":
["1","11"],"gender":""}'

My intended output is below

#body = "{\"startDate\":\"2016-01-01\"\
#,\"endDate\":\"2017-10-30\"\
#,\"timeUnit\":\"date\"\
#,\"keywordGroups\":[{\"groupName\":\"CharlieChoo\",\"keywords\":[\"CharlieChoo\"]}\
#,{\"groupName\":\"DeltaAir\",\"keywords\":[\"DeltaAir\"]}\
#,{\"groupName\":\"Alpha bet\",\"keywords\":[\"Alpha bet\"]}\
#,{\"groupName\":\"ChooChoo\",\"keywords\":[\"ChooChoo\"]}\
#,{\"groupName\":\"Airline\",\"keywords\":[\"Airline\"]}]\
#,\"device\":\"\",\"ages\":[\"1\",\"11\"],\"gender\":\"\"}"

So basically I am trying to replace the groupName and keywords values using the DNA list. In this example I only have 10 items in the DNA list, but my real project contains a couple of thousand.

My personal thought is that replacing strings is not appropriate because the strings are likely to overlap. Is there another way to do my task? One thing to consider is that my output needs to have the same format as the first body string (only the words are changed). Thanks in advance.

--------------------------------------EDIT---------------------------------------------------------------

A new error occurred regarding @AJAX1234's answer.

import pandas as pd
import json
#reading xlsx file
ex = pd.ExcelFile('mat_hierarchy.xlsx').parse('Sheet1')
DNA = ex.loc[:,'4Level']
DNA

Above is how I load my DNA data, and below is the output

0          Fruit
1          MixFruit
2          SuperFruit
3          PassionFruit
4          Orange
5          Lemon
6          Mango
................. it goes on forever :(

Using this information, I ran your code, and a "name 'a' is not defined" error keeps showing up. I am only a beginner, but my best guess is that my "DNA" is defined by indexes (DNA.index[0], etc.). I replaced the "a" in your code with numbers, and it still won't work.

Any suggestions regarding this problem? Thanks for the input!

------------------------EDIT 2-------------------------------

body_intro = "{\"startDate\":\"2016-01-01\",\"endDate\":\"2017-10-30\",\"timeUnit\":\"date\",\"keywordGroups\":[{\"groupName\":\""
body_keywords = "\",\"keywords\":[\""
body_groupName = "\"]},{\"groupName\":\""
body_last = "\"]}],\"device\":\"\",\"ages\":[\"1\",\"2\",\"3\",\"4\",\"5\",\"6\",\"7\",\"8\",\"9\",\"10\",\"11\"],\"gender\":\"f\"}"


for i in range(0,len(DNA),5):
    if((len(DNA)%5==0) or (i < (len(DNA)-(len(DNA)%5)))):
        body = body_intro + DNA[i] + body_keywords + DNA[i] + body_groupName + DNA[i+1] + body_keywords + DNA[i+1] + body_groupName + DNA[i+2] + body_keywords + DNA[i+2] + body_groupName + DNA[i+3] + body_keywords + DNA[i+3] + body_groupName + DNA[i+4] + body_keywords + DNA[i+4] + body_last
    elif(len(DNA)%5==4):
        body = body_intro + DNA[i] + body_keywords + DNA[i] + body_groupName + DNA[i+1] + body_keywords + DNA[i+1] + body_groupName + DNA[i+2] + body_keywords + DNA[i+2] + body_groupName + DNA[i+3] + body_keywords + DNA[i+3] + body_last
    elif(len(DNA)%5==3):
        body = body_intro + DNA[i] + body_keywords + DNA[i] + body_groupName + DNA[i+1] + body_keywords + DNA[i+1] + body_groupName + DNA[i+2] + body_keywords + DNA[i+2] + body_last
    elif(len(DNA)%5==2):
        body = body_intro + DNA[i] + body_keywords + DNA[i] + body_groupName + DNA[i+1] + body_keywords + DNA[i+1] + body_last
    else:
        body = body_intro + DNA[i] + body_keywords + DNA[i] + body_last
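
The same per-chunk body could also be built from a dict and serialized with json.dumps, which avoids one branch per remainder. This is only a minimal sketch, assuming groups of five as in the loop above; `build_bodies` and `group_size` are illustrative names, and the sample keywords are taken from the spreadsheet output shown earlier.

```python
import json

def build_bodies(keywords, group_size=5):
    """Yield one JSON request body per chunk of `group_size` keywords."""
    keywords = list(keywords)  # list() also accepts a pandas Series
    for start in range(0, len(keywords), group_size):
        chunk = keywords[start:start + group_size]  # the last chunk may be shorter
        payload = {
            "startDate": "2016-01-01",
            "endDate": "2017-10-30",
            "timeUnit": "date",
            "keywordGroups": [{"groupName": k, "keywords": [k]} for k in chunk],
            "device": "",
            "ages": [str(n) for n in range(1, 12)],
            "gender": "f",
        }
        # Compact separators reproduce the whitespace-free layout of the
        # hand-built string; ensure_ascii=False keeps non-ASCII keywords readable.
        yield json.dumps(payload, separators=(",", ":"), ensure_ascii=False)

sample = ["Fruit", "MixFruit", "SuperFruit", "PassionFruit", "Orange", "Lemon", "Mango"]
bodies = list(build_bodies(sample))  # two bodies: five keywords, then two
```

Because the slice simply stops at the end of the list, the final short chunk needs no special case. The key order of the output relies on dicts preserving insertion order (Python 3.7+).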

Personally, I'd do something with regular expressions in order to get simultaneous replacement, like [this answer](https://stackoverflow.com/a/6117124/2364363). — uber5001, Nov 28 '17 at 04:30
Is your DNA list containing keywords with this form ['t1', 't2', 't3', 't4', 't5', 't6', 't7', 's1', 's2', 's3', 's4', 's5', 's6', 's7'] ? The number of keyword t is same as number of s. If so, please try my answer below. — chx3, Nov 29 '17 at 02:08

score 2 · Accepted Answer · answered Nov 28 '17 at 03:30

You can try this:

import json
new_body = json.loads(body)
DNA=["Alpha", "Bravo", "Charlie", "Delta", "Echo", "CharlieChoo", "DeltaAir", "Alpha bet", "ChooChoo", "Airline"]
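# for a list value (keywords) map every element; for a plain string value (groupName) map the value d itself
new_body['keywordGroups'] = [{c:[DNA[DNA.index(a)+5] for a in d] if isinstance(d, list) else DNA[DNA.index(d)+5] for c, d in i.items()} for i in new_body['keywordGroups']]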
final_data = json.dumps(new_body)

Output:

'{"startDate": "2016-01-01", "endDate": "2017-10-30", "gender": "", 
 "ages": ["1", "11"], "keywordGroups": 
  [{"keywords": ["CharlieChoo"], "groupName": "CharlieChoo"}, 
   {"keywords": ["DeltaAir"], "groupName":"DeltaAir"}, 
   {"keywords": ["Alpha bet"], "groupName": "Alpha bet"}, 
 {"keywords": ["ChooChoo"], "groupName": "ChooChoo"}, {"keywords":["Airline"], "groupName": "Airline"}], "device": "", "timeUnit": "date"}'
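
With a couple of thousand names, calling DNA.index for every value gets slow, and a pandas Series (as in the question's edit) does not offer the list-style `.index(value)` lookup. A possible variation is to build an old-to-new dict once. The sketch below assumes the first half of the names maps onto the second half, which matches the `+5` offset used above for ten names; `swap_keywords` is a made-up helper name.

```python
import json

def swap_keywords(body, names):
    """Replace each name from the first half of `names` with its partner in the second half."""
    names = list(names)  # plain list, even if a pandas Series was passed in
    half = len(names) // 2
    mapping = dict(zip(names[:half], names[half:]))  # old name -> new name

    data = json.loads(body)
    for group in data["keywordGroups"]:
        # .get(x, x) leaves names that are not in the mapping unchanged
        group["groupName"] = mapping.get(group["groupName"], group["groupName"])
        group["keywords"] = [mapping.get(k, k) for k in group["keywords"]]
    # compact separators keep the output free of extra whitespace
    return json.dumps(data, separators=(",", ":"))

DNA = ["Alpha", "Bravo", "Charlie", "Delta", "Echo",
       "CharlieChoo", "DeltaAir", "Alpha bet", "ChooChoo", "Airline"]
body = ('{"startDate":"2016-01-01","keywordGroups":['
        '{"groupName":"Alpha","keywords":["Alpha"]},'
        '{"groupName":"Charlie","keywords":["Charlie"]}],"gender":""}')
result = swap_keywords(body, DNA)
```

Each value is looked up exactly once, so a replacement such as "CharlieChoo" is never matched again by the "Charlie" rule.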

Can you see my edit on your answer? I have some errors and hoping you to take a look. Thanks again for the response! — EJ Kang, Nov 28 '17 at 07:53
@kang `DNA` is a pandas object, so it probably does not support `.index`. Has `body` remained the same in your recent edit? — Ajax1234, Nov 28 '17 at 15:02

score 0 · Answer 2 · answered Nov 28 '17 at 04:02

Simply use regex. I'm assuming the first half of your DNA list holds the target names (the ones to be replaced) and the second half holds the matching source names (the replacements).

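import re
half = len(DNA) // 2  # integer division, so the slice index is an int in Python 3
for i, t in enumerate(DNA[:half]):
    s = DNA[half + i]
    # match the quoted target literally and keep the quotes around the replacement
    body = re.sub('"' + re.escape(t) + '"', '"' + s + '"', body, count=2)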

Hope that helps.

score 0 · Answer 3 · answered Nov 28 '17 at 05:16

To be able to perform a "batch" replacement (and assuming you need to keep the count of elements replaced) I would do the following:

import re

lookup = {"Alpha": "CharlieChoo",
          "Bravo": "DeltaAir",
          "Charlie": "Alpha bet",
          "Delta": "ChooChoo",
          "Echo": "Airline"}

lookup_count = {"Alpha": 2,
                "Bravo": 2,
                "Charlie": 2,
                "Delta": 2,
                "Echo": 2}

def replace_using_lookups(match):
    word = match.group(1)
    if word in lookup and lookup_count[word] > 0:
        lookup_count[word] -= 1
        return '"{}"'.format(lookup[word])
    return '"{}"'.format(word)


# re.sub returns a new string, so assign the result; the raw string keeps \w intact
body = re.sub(r'"(\w+)"', replace_using_lookups, body)

If the lookup_count dict isn't necessary you could perform the replacement using a simpler lambda.
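
A rough sketch of that simpler variant, assuming every occurrence should be replaced; the shortened `body` here is only for illustration. The pattern keeps `\w+` from the code above, so quoted names that contain spaces would not be matched as targets.

```python
import re

lookup = {"Alpha": "CharlieChoo",
          "Bravo": "DeltaAir",
          "Charlie": "Alpha bet",
          "Delta": "ChooChoo",
          "Echo": "Airline"}

body = ('{"keywordGroups":['
        '{"groupName":"Alpha","keywords":["Alpha"]},'
        '{"groupName":"Charlie","keywords":["Charlie"]}]}')

# One pass over the string: each quoted word is looked up once, and the
# replacement text is never rescanned, so "CharlieChoo" cannot be hit
# again by the "Charlie" rule.
result = re.sub(r'"(\w+)"',
                lambda m: '"{}"'.format(lookup.get(m.group(1), m.group(1))),
                body)
```

Quoted words that are not in `lookup`, such as the JSON keys, fall through `lookup.get` unchanged.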
