Sequentially replace a word using a list of words from another text file (data_file)

Question

What I have in mind:

Read "original_file", change "ENTRY1" on line 3 to the FIRST word in data_file, and write the result out as newfile1. Read "original_file" again, change "ENTRY1" on line 3 to the SECOND word in data_file, and write the result out as newfile2.

Repeat for every word in data_file.

excerpt/example:

original_file:

    line1      {
    line2        "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c",
    line3        "name": "ENTRY1",
    line4        "auto": true,
    line5        "contexts": [],
    line6        "responses": [
    line7      {
    ------------

    data_file:(simply a word/number List)
    line1   AAA11
    line2   BBB12
    line3   CCC13
    ..100lines/Words..
    -------------

    *the First output/finished file would look like:
    newfile1:
    line1      {
    line2        "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c",
    line3        "name": "AAA11",
    line4        "auto": true,
    line5        "contexts": [],
    line6        "responses": [
    line7      {
    ------------
    and the Second:
    newfile2:
    line1      {
    line2        "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c",
    line3        "name": "BBB12",
    line4        "auto": true,
    line5        "contexts": [],
    line6        "responses": [
    line7      {
    ------------

..and so on.

I have been trying with awk and sed, something like

awk 'FNR==$n1{if((getline line < "data_file") > 0) fprint '/"id:"/' '/""/' line ; next}$n2' < newfile

and.. as a start of a shell script..

#!/bin/bash
n1=3
n2=2
sed '$n1;$n2 data_file' original_file > newfile

Any help would be appreciated. I've been trying to glue together various techniques found on SO, one thing at a time: learning how to replace, then how to replace from a second file, but it's beyond my knowledge. Thanks again. I have approximately 31,000 lines in my data_file, so this needs to be automated. It's a one-time thing, but it may be useful for others too.

Is this JSON content you are trying to manipulate? Don't use `awk` or regex tools; use a syntax-aware parser such as `jq`. – Inian Jan 31 '19 at 05:07
Yes, this is a JSON file. Thanks. I've never heard of jq; I just found it on GitHub. :) I'll keep trying. – Jan 31 '19 at 05:14
FYI: on key order in JSON -> https://stackoverflow.com/questions/16870416/does-the-sequence-of-the-values-matter-in-a-json-object and on whitespace in JSON -> https://stackoverflow.com/questions/4150621/are-whitespace-characters-insignificant-in-json – Allan Jan 31 '19 at 06:37

score 1 · Answer 1 · answered Jan 31 '19 at 05:49

Assuming we are trying to change 'name' in some JSON data and that the new values will be purely alphanumeric (so that double-quoting works properly):

#!/bin/bash

n=1
# read data_file line by line; -r keeps backslashes literal
while IFS= read -r value; do
    # set .name to the current value and write newfile1, newfile2, ...
    jq ".name = \"$value\"" <original_file >"newfile$n"
    ((n++))
done <data_file
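
The alphanumeric assumption above matters because the value is pasted into the text of the jq program: a value containing a double quote or a backslash would change the program itself. The same pitfall exists whenever JSON text is built by string formatting. A short Python sketch (Python 3 assumed; the sample value is made up for illustration) showing how a serializer sidesteps it:

```python
import json

value = 'say "hi"'  # hypothetical value containing double quotes

# Pasting the value into JSON text by hand yields invalid JSON here.
pasted = '{"name": "%s"}' % value
try:
    json.loads(pasted)
    pasted_is_valid = True
except ValueError:
    pasted_is_valid = False

# Letting the serializer do the quoting keeps the text valid.
serialized = json.dumps({"name": value})
round_trip = json.loads(serialized)["name"]

print(pasted_is_valid, round_trip == value)
```

For a word list that really is purely alphanumeric, as in the question, the shell loop above does not run into this.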

jhnc


Nice answer using `jq`! +1 – Allan Jan 31 '19 at 06:23
Thanks jhnc, interesting to learn of this jq! I will also try your approach. Cheers. – Jan 31 '19 at 06:30
This works very well too! I am curious how it functions, if I may. For example, it works perfectly with .name, but if I want to change a different line that reads just like the first but is called "speech": (further down my original_file), it doesn't change that line; instead it adds it at the bottom. I don't see where either of the answers here actually reads the line number of the original_file, but neither answer works if I change the "name". Just a curiosity at this moment. :) Thanks for your efficient and fast script! Cheers – Jan 31 '19 at 14:29
Are you sure speech isn't nested inside a structure? For example, somewhere inside responses? – jhnc Jan 31 '19 at 15:37
Ah, you're correct. I didn't catch that: it's in responses [ .. figures. lol – Jan 31 '19 at 17:06
OK, I tried and failed. Is there an easy way to get it to also change another line, "speech":, that is within a responses []? These are within the same file. I tried to patch some code together, but it's not working at all. :) Thanks if you can/choose to help. Cheers. – Feb 01 '19 at 09:13
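
The `speech` question above comes down to where the key lives: `name` is a top-level key, while `speech` sits inside an element of `responses`, so assigning to a top-level `speech` adds a new key at the bottom instead of changing the nested one. A minimal Python sketch, assuming `responses` is a list of objects and that some of them carry a `speech` key (the real file's layout is only partly shown in the thread, so this shape, the sample template, and the helper name `set_name_and_speech` are hypothetical):

```python
import copy
import json


def set_name_and_speech(template, name, speech):
    """Return a copy of template with the top-level name and nested speech replaced."""
    result = copy.deepcopy(template)  # leave the template itself unchanged
    result["name"] = name
    # Assumed layout: "responses" is a list of objects; only entries that
    # already have a "speech" key are updated.
    for response in result.get("responses", []):
        if isinstance(response, dict) and "speech" in response:
            response["speech"] = speech
    return result


# Hypothetical template standing in for original_file.
template = json.loads('{"name": "ENTRY1", "responses": [{"speech": "ENTRY1"}]}')
updated = set_name_and_speech(template, "AAA11", "AAA11")
print(json.dumps(updated, indent=4))
```

If `speech` is nested more deeply in the real file, the loop would need to follow that structure instead.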

Allan · Accepted Answer · 2019-01-31T06:33:58.397

In Python 2.x:

INPUT:

more original_file.json data_file 
::::::::::::::
original_file.json
::::::::::::::
{
  "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c",
  "name": "ENTRY1",
  "auto": true,
  "contexts": [],
  "responses": []
}
::::::::::::::
data_file
::::::::::::::
AAA11
BBB12
CCC13

python script:

import json

# load the original JSON file into a dict
with open('original_file.json') as handle:
    dictdump = json.load(handle)

# read every line of the data file
with open('data_file') as f:
    lines = f.read().splitlines()

# write one output file per line, numbering the files from 1
for i, line in enumerate(lines, 1):
    # replace the value of the JSON "name" element with the current line
    dictdump['name'] = line
    # dump the modified JSON into newfileX
    with open('newfile' + str(i), 'w') as o:
        json.dump(dictdump, o)

output:

more newfile*
::::::::::::::
newfile1
::::::::::::::
{"contexts": [], "auto": true, "responses": [], "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", "name": "AAA11"}
::::::::::::::
newfile2
::::::::::::::
{"contexts": [], "auto": true, "responses": [], "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", "name": "BBB12"}
::::::::::::::
newfile3
::::::::::::::
{"contexts": [], "auto": true, "responses": [], "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", "name": "CCC13"}

If you need the JSON dump laid out vertically, with one key per line, change the line `json.dump(dictdump, o)` to `json.dump(dictdump, o, indent=4)`. This will produce:

more newfile*
::::::::::::::
newfile1
::::::::::::::
{
    "contexts": [], 
    "auto": true, 
    "responses": [], 
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", 
    "name": "AAA11"
}
::::::::::::::
newfile2
::::::::::::::
{
    "contexts": [], 
    "auto": true, 
    "responses": [], 
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", 
    "name": "BBB12"
}
::::::::::::::
newfile3
::::::::::::::
{
    "contexts": [], 
    "auto": true, 
    "responses": [], 
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", 
    "name": "CCC13"
}

DOC: https://docs.python.org/2/library/json.html

Newer version to keep the same order as the input:

import json
from collections import OrderedDict

# load the original JSON file, keeping the key order of the input
with open('original_file.json') as handle:
    dictdump = json.load(handle, object_pairs_hook=OrderedDict)

# read every line of the data file
with open('data_file') as f:
    lines = f.read().splitlines()

# write one output file per line, numbering the files from 1
for i, line in enumerate(lines, 1):
    # replace the value of the JSON "name" element with the current line
    dictdump['name'] = line
    # dump the modified JSON into newfileX, indented
    with open('newfile' + str(i), 'w') as o:
        json.dump(dictdump, o, indent=4)

Output:

more newfile*
::::::::::::::
newfile1
::::::::::::::
{
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", 
    "name": "AAA11", 
    "auto": true, 
    "contexts": [], 
    "responses": []
}
::::::::::::::
newfile2
::::::::::::::
{
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", 
    "name": "BBB12", 
    "auto": true, 
    "contexts": [], 
    "responses": []
}
::::::::::::::
newfile3
::::::::::::::
{
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c", 
    "name": "CCC13", 
    "auto": true, 
    "contexts": [], 
    "responses": []
}
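
On Python 3.7 and later, plain dicts preserve insertion order, so the `json` module already keeps the key order of the input and `OrderedDict` is not needed. A sketch of the same loop as a reusable function, assuming Python 3; the function name `write_variants`, its parameters, and the `out_dir` argument are illustrative and not part of the answer above:

```python
import json
from pathlib import Path


def write_variants(template_path, data_path, out_dir="."):
    """Write newfile1, newfile2, ... with "name" set to each line of data_path."""
    template = json.loads(Path(template_path).read_text())
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    lines = Path(data_path).read_text().splitlines()
    for i, line in enumerate(lines, 1):
        # Replace the top-level "name" value with the current line.
        template["name"] = line
        target = out_dir / f"newfile{i}"
        target.write_text(json.dumps(template, indent=4) + "\n")
        written.append(target)
    return written
```

A call such as `write_variants("original_file.json", "data_file", "out")` would place the generated files in an `out` directory, which may be easier to manage than the current directory when the data file has tens of thousands of lines.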

Awesome! I'll give it a shot right now and let you know, thank you. – Jan 31 '19 at 05:48
I think that will work! One thing: how can I get the newfile output to be vertical, not spread out horizontally? That may matter in the end, and before I run this awesome script of yours on 31,000 files I need to be sure. ;) Thanks!! – Jan 31 '19 at 06:02
@GarrettKrosschell: I have edited my answer! Just use `json.dump(dictdump, o, indent=4)` – Allan Jan 31 '19 at 06:07
@GarrettKrosschell This should have no effect at all, as it is JSON, but I can edit my answer if you want to keep the order – Allan Jan 31 '19 at 06:30
@GarrettKrosschell: answer edited. I have replaced the dictionary with an OrderedDict, which keeps the same order as the input. – Allan Jan 31 '19 at 06:34
Awesome, Allan. Thank you very much. I'll accept your answer as I know it is working, and I have some editing/scripting to learn now. :) ty. Cheers. – Jan 31 '19 at 06:37
@GarrettKrosschell: Thank you. Check also the 2 links I have added. As you can see, it would be best to dump the JSON files on a single line, as that saves some space on the machine and has no impact on how the data is interpreted. Also, do not hesitate to upvote jhnc's answer if it has helped you. ;-) You can vote on answers now that you have reached 15 reputation. Good luck, cheers – Allan Jan 31 '19 at 06:40
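
The point made in the comments above, that layout does not change what a JSON parser reads, can be checked directly. A small sketch, assuming Python 3 and using the sample object from the question:

```python
import json

record = {
    "id": "b5902627-0ba0-40b6-8127-834a3ddd6c2c",
    "name": "AAA11",
    "auto": True,
    "contexts": [],
    "responses": [],
}

compact = json.dumps(record, separators=(",", ":"))  # no optional whitespace
pretty = json.dumps(record, indent=4)                # one key per line

# Both texts parse back to the same data.
same_data = json.loads(compact) == json.loads(pretty) == record

# The compact text is shorter, which adds up over many files.
print(same_data, len(compact), len(pretty))
```

Whether the indented form is worth the extra bytes depends on whether people will read the files or only programs will.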
