2

I have a string (a log line containing sensitive information (info)) and I want to replace a substring within it, based on the index of that substring within the string. The substring may contain multiple words, but per the requirement it must be treated as a single substring.

Details:

Here is my string:

[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"

Here we need to replace the substring "Tough times" with some other string, say "Human race". The string must be processed as follows:

[2016-04-25 03:48:34] -> index 0

123737 -> index 1

error -> index 2 (... and so on)

"Tough times" -> index 8

The Python program I am working on won't know anything about the substring itself (i.e., "Tough times"). It will simply be supplied with a number such as 8 (the index of the substring, as shown above) and must replace whichever substring is at index 8 with the replacement string. Similarly, if the program is supplied with the number 7, it must replace whichever substring is at index 7.

I have tried regex, sed, awk, etc., but couldn't find a suitable answer. The nearest solution I found is this regex.

But it did not meet my requirements.

I am starting to doubt whether my requirement is even reasonable.

niladri chakrabarty
  • So each "substring" of your string is enclosed by round or square brackets, but may consist of multiple words? – Byte Commander May 03 '16 at 07:16
  • What is the format of the (info)? Is there a pattern we can exploit to index things? A delimiter between every field? – Richard May 03 '16 at 07:16
  • So to clarify, by index, you don't actually mean a Python string index. If so, then you're asking how we can group, split and count in the string in the way you've outlined. This is going to be difficult to answer without knowing what `(info)` looks like. Your best bet is to come up with a regular expression for what delimits the separate "substrings", and use re.split. – Thtu May 03 '16 at 07:20
  • You can use **[(?<=\]|\)|\") (?=\(|\[|\")](https://regex101.com/r/yB0oX3/1)** to split, but it will not check for balanced groups. Alternatively, use simple capturing with three alternatives. – rock321987 May 03 '16 at 07:34
  • @ByteCommander : No, the substrings are not enclosed by round or square brackets, except for the datetime part [2016-04-25 03:48:34]. I used the pattern "(info)" to help people understand the structure of the log line. Anyway, I am editing my question to make it clearer. – niladri chakrabarty May 03 '16 at 07:40
  • @Richard : edited the question with the log line. – niladri chakrabarty May 03 '16 at 07:44
  • @niladrichakrabarty You first need to convert it to a list, then make the required changes, and then convert it back to a string. Check my answer. – Hassan Mehmood May 03 '16 at 07:44
  • @ThomasTu : Exactly, by "index" I didn't mean any kind of Python index (list, string, etc.). It is the requirement for this particular problem. – niladri chakrabarty May 03 '16 at 07:55

4 Answers

5

Answer for revised question

Let's start with the string:

>>> orig = '[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"'

Next, let's divide the string into substrings:

>>> import re
>>> s = re.findall(r'(\[[^]]*\]|\w+|"[^"]*")', orig)
>>> s
['[2016-04-25 03:48:34]', '123737', 'error', '150531221446', '2000', 'Master', 'dmart', '843212', '"Tough times"']

Now, let's change the ninth substring and reassemble the string:

>>> s[8] = '"Human race"'
>>> ' '.join(s)
'[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Human race"'
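
If the index is only known at run time, or the original spacing between fields should be preserved, the same pattern can be combined with re.sub and a counting callback instead of findall plus join. This is only a minimal sketch: the function name replace_field and its zero-based index parameter are my own assumptions, not something given in the question.

```python
import re

# Same three alternatives as above: [bracketed], bare word, or "double-quoted".
FIELD = re.compile(r'\[[^]]*\]|\w+|"[^"]*"')

def replace_field(line, index, replacement):
    """Replace the field at zero-based `index`, leaving everything else untouched.

    Hypothetical helper for illustration; not part of the original answer.
    """
    position = -1

    def swap(match):
        nonlocal position
        position += 1
        # Only the match whose running position equals `index` is replaced.
        return replacement if position == index else match.group(0)

    result = FIELD.sub(swap, line)
    if not 0 <= index <= position:
        raise IndexError(f'line has {position + 1} fields, no field {index}')
    return result

orig = '[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"'
print(replace_field(orig, 8, '"Human race"'))
```

Because the callback returns every other match unchanged, whatever separates the fields (single spaces, tabs, runs of spaces) survives as it was.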

More on the regex

The regular expression allows the substring to match any one of the following three patterns:

  1. \[[^]]*\]: A substring that starts with [ and ends with ] and has any character in between except for ].

  2. \w+: Any series of "word" characters.

  3. "[^"]*": A double-quoted string.
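
The three alternatives can also be written out with re.VERBOSE, which allows comments inside the pattern. The sketch below checks each alternative against one sample token; the name FIELD and the sample tokens are only illustrative.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows trailing comments.
FIELD = re.compile(r'''
    \[ [^]]* \]     # 1. square brackets with anything except ] in between
  | \w+             # 2. a run of word characters (letters, digits, underscore)
  | " [^"]* "       # 3. double quotes with anything except " in between
''', re.VERBOSE)

# Each sample token is matched in full by exactly one alternative.
for sample in ['[2016-04-25 03:48:34]', 'dmart', '"Tough times"']:
    print(sample, '->', FIELD.fullmatch(sample) is not None)

# Without quotes, a multi-word value falls to alternative 2 and is split.
print(FIELD.findall('Tough times'))
```

One consequence of alternative 2 worth keeping in mind: \w does not match characters such as "." or "-", so an unquoted field like 1.5 would be counted as two substrings.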

Answer for original question

This approach looks for matching delimiters in the string. The delimiters can be (a) [ and ], or (b) ( and ), or (c) " and ". The delimiters may come in any order. Once the matching delimiters are found the string is divided up into substrings which we can then change and reassemble.

To demonstrate, let's start with this string:

>>> orig = '[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Tough times"'

Next, let's split it up into groups with matching delimiters:

>>> import re
>>> s = re.findall(r'(\[[^]]*\]|\([^)]*\)|"[^"]*")', orig)
>>> s
['[2016-04-25 03:48:34]', '(info)', '(info)', '(info)', '(info)', '(info)', '(info)', '(info)', '"Tough times"']

Now, let's change the ninth string and reassemble:

>>> s[8]='"Human Race"'
>>> ' '.join(s)
'[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Human Race"'
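
Note that findall keeps only the delimited groups, so this pattern assumes that every field is wrapped in one of the three delimiter pairs. A quick check, using the log line from the revised question as sample input, suggests what happens otherwise:

```python
import re

# Delimiter-only pattern from this section: [ ... ], ( ... ) or " ... ".
DELIMITED = re.compile(r'\[[^]]*\]|\([^)]*\)|"[^"]*"')

revised = '[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"'
groups = DELIMITED.findall(revised)

# Only the timestamp and the quoted text are found; joining `groups`
# would silently drop the seven undelimited fields in between.
print(groups)
print(len(groups))
```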
John1024
0

It looks like you have a list.

Simply addressing the list by index should do:

>>> l = ["2016-04-25 03:48:34", "info", "info", "info", "info", "info", "info", "info", "Tough times"]
>>> l[8]
'Tough times'

Lists are numbered from 0, so the first element is l[0] and the ninth element is l[8].
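
Assignment goes through the same index, and the list can be joined back into one line afterwards. A short sketch, assuming the fields are already available as a list (splitting the raw log line into that list is a separate step, and the name fields is only illustrative):

```python
fields = ["2016-04-25 03:48:34", "info", "info", "info", "info",
          "info", "info", "info", "Tough times"]

fields[8] = "Human race"   # replace the ninth element in place
print(fields[-1])          # negative indices count from the end of the list
print(" ".join(fields))    # reassemble the fields into a single string
```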

Louis
0

This is the regular expression you can use to find all substrings delimited by round brackets, square brackets, single quotes or double quotes:

(?:([\"\'])|(\()|(\[)).+?(?(1)\1|(?(2)\)|\]))

Check this regex out at regex101.com

Here's a usage example:

import re
regex = re.compile(r'(?:([\"\'])|(\()|(\[)).+?(?(1)\1|(?(2)\)|\]))')

line = '[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Tough times"'
index = 9  # 1-based index
replacement = '"Human race"'  # note the double quotes that will appear in the result

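matches = list(regex.finditer(line))
match = matches[index - 1]  # convert the 1-based index to a 0-based list index

# Splice by position so that identical substrings elsewhere in the line stay untouched
result = line[:match.start()] + replacement + line[match.end():]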
print(result)

Output:

[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Human race"
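
A detail worth checking when the same text occurs in more than one field: str.replace substitutes every occurrence, whereas slicing on the match position touches only the chosen field. The sketch below uses a simplified pattern (parentheses only) and made-up sample data to show the difference.

```python
import re

line = '(info) (info) (info)'
matches = list(re.finditer(r'\([^)]*\)', line))
target = matches[1]   # second field (zero-based index 1)

# str.replace changes all three identical fields.
replaced_all = line.replace(target.group(0), '(warn)')
print(replaced_all)

# Slicing on the match span changes only the second field.
replaced_one = line[:target.start()] + '(warn)' + line[target.end():]
print(replaced_one)
```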
Byte Commander
0

You can use a simple string split operation for this:

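string = '[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Tough times"'
# Everything before the first '] ' is the timestamp; put the closing bracket back.
a = string.split('] ')[0] + ']'
# Split the rest on spaces at most 7 times, so the quoted text stays in one piece.
words = string.split('] ')[1].split(' ', 7)
words.insert(0, a)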

Now you can access the words by their index values. To reassemble the string, you can use:

words[8] = 'changed string'
new_string = ' '.join(words)

The output will be:

'[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) changed string'
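
The same steps can be wrapped in a small function. This is a sketch under the assumption that the line always starts with a bracketed timestamp followed by a fixed number of space-free fields; the name split_log_line and the plain_fields parameter are hypothetical.

```python
def split_log_line(line, plain_fields=7):
    """Split '[timestamp] f1 ... fN trailing text' into N + 2 items.

    Hypothetical helper: assumes each of the N middle fields contains no spaces.
    """
    timestamp, _, rest = line.partition('] ')
    # maxsplit keeps everything after the N plain fields as one trailing item.
    return [timestamp + ']'] + rest.split(' ', plain_fields)

line = '[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Tough times"'
words = split_log_line(line)
words[8] = '"Human race"'
print(' '.join(words))
```

If the number of fields before the quoted text varies from line to line, a fixed maxsplit will not work and a pattern-based split is probably the safer choice.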
Arun Sooraj