I have a text file with over 50K lines in this format:

M:org.apache.mahout.common.RandomUtilsTest:testHashDouble():['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat():['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction():['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2():['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']

How do I read and parse this data into a dictionary so that the string before the [ (the test method) is the key and each method inside the [] is an individual value? And how do I strip the quotes ('') before storing the values in the dictionary? #python

Here is the code used to populate the text file. Now I am trying to take that txt file data and read/parse it back into another dictionary.

    d = {}
    with open("filtered.txt") as infile:
        for line in infile:
            # each line holds a key and a value separated by one space
            key, val = line.strip().split(" ")
            # group every value under its key
            d.setdefault(key, []).append(val)

    with open('mahout-coverage.txt', 'w') as outfile:
        # write one "key:[values]" line per key, in sorted key order
        for key in sorted(d):
            outfile.write('{}:{}\n'.format(key, d[key]))

AMC
kit
  • could you provide a sample of your expected dict output format using the input you just described? just to avoid confusion. – kareem_emad May 01 '20 at 06:25
  • the dictionary would output the same thing as the above; the sample lines listed above are from a text file whose data came from a dictionary created by the additional code I've added above – kit May 01 '20 at 06:39
  • Now I am trying to reverse parse that data from the text file into a dictionary for more use – kit May 01 '20 at 06:39
  • The best solution here would be to improve the format of the output! Just use an existing format like JSON. – AMC May 01 '20 at 09:40
  • Does this answer your question? [Convert a String representation of a Dictionary to a dictionary?](https://stackoverflow.com/questions/988228/convert-a-string-representation-of-a-dictionary-to-a-dictionary) – AMC May 01 '20 at 09:40

2 Answers

The json module can store a Python dictionary in a file and later load that file back as the same data type it had before it was written.

    import ast
    import json

    d = {}
    with open('filtered.txt') as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # split once on the "():" that ends the test-method name
            key, value = line.split("():", 1)
            key = "{}()".format(key)
            # parse the "[...]" text into a real list of strings (this drops the quotes)
            d[key] = ast.literal_eval(value)

    print(d)

    # It would be simpler to write the data to the file with the json module
    with open('data.txt', 'w') as json_file:
        json.dump(d, json_file)

    # Later you can read the file back with the json module itself
    with open('data.txt') as f:
        # data is a dictionary again, which is easy to work with
        data = json.load(f)

Refer: json.dump() and json.load()
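
If the file that produces the data can be changed, writing JSON from the start removes the need for any custom parsing. The following is a minimal sketch of that round trip, not the asker's actual pipeline: the dictionary `calls`, its contents, and the file name `coverage.json` are hypothetical stand-ins, and a temporary directory is used only to keep the example self-contained.

```python
import json
import os
import tempfile

# Hypothetical sample data shaped like the dictionary in the question
calls = {
    "M:org.apache.mahout.common.RandomUtilsTest:testHashFloat()": [
        "(M)java.util.Random:nextLong()",
        "(M)java.util.Random:nextLong()",
    ],
}

# Hypothetical output path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "coverage.json")

# Write the whole dictionary as one JSON document, keys in sorted order
with open(path, "w") as outfile:
    json.dump(calls, outfile, indent=2, sort_keys=True)

# Read it back: keys are strings and values are lists of strings again
with open(path) as infile:
    restored = json.load(infile)

print(restored == calls)
```

Because JSON stores the lists as real arrays, the values come back as plain strings with no surrounding quote characters to strip.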

Achuth Varghese

With ast.literal_eval you can convert the string representation of a list into a real list.

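    import ast
    from collections import defaultdict

    d = defaultdict(list)
    with open('tst.txt') as fp:
        for line in fp:
            if ':[' not in line:
                continue  # skip blank or malformed lines that have no list part
            split_at = line.index(':[')
            # key is the text before ":[", value is the "[...]" text parsed into a list
            k, v = line[:split_at], ast.literal_eval(line[split_at + 1:])
            d[k] += v
    print(dict(d))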

Output:

{
M:org.apache.mahout.common.RandomUtilsTest:testHashDouble() :  ['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat() :  ['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction() :  ['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2() :  ['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
}
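
In a file of 50K lines, some lines may be blank or lack the `[...]` part, and `str.index` raises `ValueError: substring not found` on those. As a sketch only, the same `ast.literal_eval` idea can be wrapped in a function that records such lines instead of stopping. The function name `parse_lines` and the sample lines are hypothetical, and the sketch assumes `:[` first appears right after the key.

```python
import ast
from collections import defaultdict


def parse_lines(lines):
    """Build {test method: [called methods]} from 'key:[...]' lines."""
    result = defaultdict(list)
    skipped = []  # 1-based numbers of lines that could not be parsed
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # partition splits at the first ":[" only; sep is empty if it is missing
        key, sep, rest = line.partition(":[")
        if not sep:
            skipped.append(number)
            continue
        try:
            # put back the "[" consumed by partition, then parse the list literal
            values = ast.literal_eval("[" + rest)
        except (ValueError, SyntaxError):
            skipped.append(number)
            continue
        result[key].extend(values)
    return dict(result), skipped


# Hypothetical sample lines: two valid, one blank, one without a list
sample = [
    "M:pkg.FooTest:testA():['(M)pkg.Foo:a()', '(S)pkg.Foo:b(int)']",
    "",
    "M:pkg.FooTest:testB():no list here",
    "M:pkg.FooTest:testA():['(M)pkg.Foo:c()']",
]
parsed, skipped = parse_lines(sample)
print(parsed)
print(skipped)
```

Passing an open file object as `lines` works the same way, and the returned `skipped` list shows which lines of the file need fixing.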
deadshot
  • k, v = line[: line.index('():') - 1], ast.literal_eval(line[line.index(':[') + 1:]) ValueError: substring not found – kit May 01 '20 at 07:22
  • can I contact you directly? I sent you a message on twitter – kit May 01 '20 at 07:23
  • @kit some of the lines in your file are in the wrong format and don't contain the `[` symbol – deadshot May 01 '20 at 07:34
  • https://stackoverflow.com/questions/988228/convert-a-string-representation-of-a-dictionary-to-a-dictionary – AMC May 01 '20 at 09:40