
I have a file (MYFILE.txt) that contains the output of a list of lists with dictionaries inside, for example:

[{'entity_group': 'literal', 'score': 0.99999213, 'word': 'DNA', 'start': 0, 'end': 3}, {'entity_group': 'metaphoric', 'score': 0.9768174, 'word': 'loop', 'start': 4, 'end': 8}, {'entity_group': 'literal', 'score': 0.9039155, 'word': 'ing,', 'start': 8, 'end': 12}, {'entity_group': 'metaphoric', 'score': 0.99962616, 'word': 'in', 'start': 13, 'end': 15}, {'entity_group': 'literal', 'score': 0.9949911, 'word': 'which a protein or protein complex interacts simultaneously', 'start': 16, 'end': 75}, {'entity_group': 'metaphoric', 'score': 0.59057885, 'word': 'with', 'start': 76, 'end': 80}, {'entity_group': 'literal', 'score': 0.9983214, 'word': 'two separated sites on a DNA molecule, is a recurring theme', 'start': 81, 'end': 140}, {'entity_group': 'metaphoric', 'score': 0.9998679, 'word': 'in', 'start': 141, 'end': 143}, {'entity_group': 'literal', 'score': 0.9997542, 'word': 'transcription', 'start': 144, 'end': 157}, {'entity_group': 'metaphoric', 'score': 0.7964442, 'word': 'regula', 'start': 158, 'end': 164}, {'entity_group': 'literal', 'score': 0.99982435, 'word': 'tion [', 'start': 164, 'end': 170}]

I want to group the "literal" entries to get their text only, and leave the "metaphoric" entries as they are. I tried the code below, but it says string indices must be integers. I also think that I could make it HTML and color it to visualize better the result, but I'm sure there's a quicker solution.

with open(r'MYFILE.txt', 'r') as res:
    texty = res.read()
    for group in texty[::-1]:
        ent = group["entity_group"]
        if ent != 'literal':
            text2 = replace_at(ent, group['end'], group['end'], text)
print(text2)
  • What are the contents of 'MYFILE.txt'? – atinjanki Jan 23 '23 at 11:15
  • What do you expect to get in the end? – Guy Jan 23 '23 at 11:18
  • The content is what I posted above, the JSON. – Idkwhatywantmed Jan 23 '23 at 12:18
  • I expect to get something like "hello how are you {entity group metaphoric etc.} I'm fine", so the literal ones become plain text and the rest stays the same. – Idkwhatywantmed Jan 23 '23 at 12:18
  • @Idkwhatywantmed there is no JSON in your q. the first snippet is a python dictionary, but JSON standards require double quotes https://stackoverflow.com/questions/36038454/parsing-string-as-json-with-single-quotes ...also, `.read()` (generally) returns a string and if it *was* json, you'd need [`json.loads`](https://www.geeksforgeeks.org/python-difference-between-json-load-and-json-loads/) to parse it into a python object – Driftr95 Jan 24 '23 at 05:56

1 Answer


but it says string indices must be integers

texty is a string [not a list of dictionaries, as you seem to expect] because that's what .read() returns when you open a file with mode='r'. So when you iterate through it with for group in texty..., each group is a single-character string [not a dictionary], and indexing a string with another string is what raises the error [I assume at ent = group["entity_group"]].
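To see this concretely, here is a small sketch. It uses io.StringIO as a stand-in for the real file (an assumption made only so the example is self-contained); open(..., 'r') behaves the same way and also hands back a str. The exact wording of the TypeError differs slightly between Python versions.

```python
import io

# io.StringIO stands in for open(r'MYFILE.txt', 'r'); both return text from .read().
res = io.StringIO("[{'entity_group': 'literal', 'word': 'DNA', 'start': 0, 'end': 3}]\n")
texty = res.read()
print(type(texty))  # <class 'str'>

# Iterating over (or indexing) a str yields one-character strings, not dictionaries.
first = texty[0]
print(repr(first))  # '['

# Indexing a one-character string with a str key is what raises the TypeError.
try:
    first["entity_group"]
except TypeError as err:
    error_message = str(err)
    print(error_message)
```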


Reluctant suggestion: Try adding a line with exec(f'texty = {texty.strip()}') before for group....
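If you do go down the "evaluate the file contents" route, ast.literal_eval from the standard library is a safer swap for exec (my substitution, not what the line above uses): it only parses Python literals such as lists, dicts, strings and numbers, so it won't run arbitrary code found in the file. A minimal sketch, assuming the file holds a single list of plain literals like the sample in the question (if it holds one list per line, call it on each line instead); io.StringIO again stands in for the real file.

```python
import ast
import io

# Stand-in for open(r'MYFILE.txt', 'r'); the content is a printed Python list.
res = io.StringIO(
    "[{'entity_group': 'literal', 'score': 0.99999213, 'word': 'DNA', 'start': 0, 'end': 3}, "
    "{'entity_group': 'metaphoric', 'score': 0.9768174, 'word': 'loop', 'start': 4, 'end': 8}]\n"
)

# literal_eval turns the text back into a list of dicts without executing code.
texty = ast.literal_eval(res.read().strip())

for group in texty:
    print(group["entity_group"], group["word"])
```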

But this is NOT a good way to save data. Please look into json and pickle. [I prefer json as it's not specific to Python.]
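A minimal json sketch, assuming you control the code that writes the file (the MYFILE.json name and the temporary directory are only for illustration). The default=float argument is also an assumption: if your scores come out as NumPy float32 values, json can't serialize them natively, and this converts them to plain Python floats.

```python
import json
import os
import tempfile

entities = [
    {'entity_group': 'literal', 'score': 0.99999213, 'word': 'DNA', 'start': 0, 'end': 3},
    {'entity_group': 'metaphoric', 'score': 0.9768174, 'word': 'loop', 'start': 4, 'end': 8},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "MYFILE.json")

    # Writing: json.dump produces valid JSON (double quotes), unlike print()/str().
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entities, f, default=float)

    # Reading: json.load gives back a list of dicts, not a string.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == entities)  # True
```

With the data loaded this way, for group in loaded: ... iterates over dictionaries, so group["entity_group"] works as you intended.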



As for

I also think that I could make it HTML and color it to visualize better the result, but I'm sure there's a quicker solution

I'm afraid there isn't enough code or context to fully understand what you mean here. Including the definition of replace_at, as well as a sample text and the [desired] text2 value, might help.
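For what it's worth, here is one possible reading of the goal, based on the expected output described in the comments (literal pieces merged into plain text, metaphoric entries kept as dictionaries), plus a simple HTML rendering. It is only a sketch: group_literals and to_html are hypothetical helper names, it does not use replace_at (whose definition isn't shown), it assumes the entities are sorted by start and that any gap between one entity's end and the next one's start was whitespace, and the colors are arbitrary.

```python
import html
from typing import Any, Dict, List, Optional, Union

Entity = Dict[str, Any]
Segment = Union[str, Entity]


def group_literals(entities: List[Entity]) -> List[Segment]:
    """Merge consecutive 'literal' entities into strings; keep other entities as dicts."""
    segments: List[Segment] = []
    buffer = ""
    prev_end: Optional[int] = None
    for ent in entities:
        # Rebuild the whitespace between entities from their character offsets.
        if prev_end is not None:
            buffer += " " * max(ent["start"] - prev_end, 0)
        if ent["entity_group"] == "literal":
            buffer += ent["word"]
        else:
            # Flush the accumulated literal text, then keep this entity untouched.
            if buffer:
                segments.append(buffer)
                buffer = ""
            segments.append(ent)
        prev_end = ent["end"]
    if buffer:
        segments.append(buffer)
    return segments


def to_html(entities: List[Entity]) -> str:
    """Render each entity as a colored <span>, escaping the text for HTML."""
    colors = {"literal": "black", "metaphoric": "red"}
    parts: List[str] = []
    prev_end: Optional[int] = None
    for ent in entities:
        if prev_end is not None:
            parts.append(" " * max(ent["start"] - prev_end, 0))
        color = colors.get(ent["entity_group"], "gray")
        parts.append(f'<span style="color:{color}">{html.escape(ent["word"])}</span>')
        prev_end = ent["end"]
    return "".join(parts)


sample = [
    {'entity_group': 'literal', 'score': 0.99999213, 'word': 'DNA', 'start': 0, 'end': 3},
    {'entity_group': 'metaphoric', 'score': 0.9768174, 'word': 'loop', 'start': 4, 'end': 8},
    {'entity_group': 'literal', 'score': 0.9039155, 'word': 'ing,', 'start': 8, 'end': 12},
]

for seg in group_literals(sample):
    print(seg)
print(to_html(sample))
```

Writing the to_html output to a .html file and opening it in a browser would give the colored view mentioned in the question.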

Driftr95