I tried to parse a sample file (sample.txt) with the following Python code, but the result is not what I expected.

sample:

# Summary Report #######################

System time | 2020-02-27 15:35:32 UTC (local TZ: UTC +0000)
# Instances ##################################################
  Port  Data Directory             Nice OOM Socket
  ===== ========================== ==== === ======
                                   0    0
# Configuration File #########################################
              Config File | /etc/srv.cnf
[mysqld]
server_id            = 1
port                                = 3016
tmpdir                              = /tmp
performance_schema_instrument       = '%=on'
innodb_monitor_enable               = 'module_adaptive_hash'
innodb_monitor_enable               = 'module_buffer'

[client]
port                                = 3016

# management library ##################################
jemalloc is not enabled in mysql config for process with id 2425
# The End ####################################################

code.py

import json
import re

all_lines = open('sample.txt', 'r').readlines()

final_dict = {}
regex = r"^([a-zA-Z]+)(.)+="

config = 0 # not yet found config
for line in all_lines:
    if '[mysqld]' in line:
        final_dict['mysqld'] = {}
        config = 1
        continue
    if '[client]' in line:
        final_dict['client'] = {}
        config = 2
        continue

    if config == 1 and re.search(regex, line):
        try:
            clean_line = line.strip() # get rid of empty space
            k = clean_line.split('=')[0].rstrip() # get the key
            v = clean_line.split('=')[1].lstrip()
            final_dict['mysqld'][k] = v
        except Exception as e:
            print(clean_line, e)

    if config == 2 and re.search(regex, line):
        try:
            clean_line = line.strip() # get rid of empty space
            k = clean_line.split('=')[0].rstrip() # get the key
            v = clean_line.split('=')[1].lstrip()
            final_dict['client'][k] = v
        except Exception as e:
            print(clean_line, e)

print(final_dict)
print(json.dumps(final_dict, indent=4))

with open('my.json', 'w') as f:
    json.dump(final_dict, f, sort_keys=True)

The unexpected result:

{ "client": { "port": "3016" }, "mysqld": { "performance_schema_instrument": "'%", "server_id": "1", "innodb_monitor_enable": "'module_buffer'", "port": "3016", "tmpdir": "/tmp" } }

The expected result:

{
    "client": {
        "port": "3016"
    }, 
    "mysqld": {
        "performance_schema_instrument": "'%=on'", 
        "server_id": "1", 
        "innodb_monitor_enable": "'module_buffer','module_adaptive_hash'", 
        "port": "3016", 
        "tmpdir": "/tmp"
    }
}

Is it possible to achieve the above result?

kanpai
  • What did you miss? The only difference I'm spotting is in `performance_schema_instrument` . Is that the problem? – Juan C May 11 '20 at 16:12
  • It looks like you're just looking to indent the json file in a more human-readable format. You almost have it - you include the `indent=4` in your `json.dumps` command to show yourself, just also include it in the `json.dump` command to write out to the file ([link](https://stackoverflow.com/a/40242210/12568761)) – tiberius May 11 '20 at 16:16
  • That looks like a configuration file. Have you looked at the configparser library? This library parses configuration files for you. – Bobby Ocean May 11 '20 at 16:17
  • After parsing, the value of performance_schema_instrument should be "'%=on'", not "'%". Thanks. – kanpai May 12 '20 at 02:30
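For reference, the truncated value comes from `clean_line.split('=')[1]` in the code above: splitting on every `=` also cuts `'%=on'` at its inner `=`. A minimal sketch of the difference, assuming lines of the form `key = value` as in the sample (the helper name `parse_option` is hypothetical, not part of the original code):

```python
def parse_option(line):
    """Split 'key = value' at the first '=' only, so '=' inside the value survives."""
    key, sep, value = line.strip().partition('=')
    if not sep:
        raise ValueError("no '=' in line: %r" % line)
    return key.rstrip(), value.lstrip()


line = "performance_schema_instrument       = '%=on'\n"

# Splitting on every '=' (as in the question) drops everything after the second '='.
truncated = line.strip().split('=')[1].lstrip()

# Splitting once keeps the whole value.
key, value = parse_option(line)

print(truncated)
print(key, value)
```

`clean_line.split('=', 1)` would have the same effect as `partition` here; either one could replace the two `split('=')` calls in the loop.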

1 Answer

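The configparser module in the Python standard library is designed for parsing configuration files like this one. Extract the configuration block with a regular expression first, then let configparser parse it: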

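import configparser, re, json

# Capture everything from the first section header after "# Configuration File #"
# up to the "# management library #" banner. A raw string avoids the invalid '\[' escape.
regex_string = r'# Configuration File #.*?\n(\[.*?)# management library #'
with open('temp') as f:  # 'temp' is the input file (sample.txt in the question)
    configuration_string = re.findall(regex_string, f.read(), re.DOTALL)[0]

# strict=False tolerates duplicate options (the last one wins);
# RawConfigParser does no '%' interpolation, so '%=on' is kept as-is.
c = configparser.RawConfigParser(strict=False)
c.read_string(configuration_string)

settings = {k: dict(v) for k, v in c.items() if k != 'DEFAULT'}
with open('temp.json', 'w') as f:
    json.dump(settings, f, sort_keys=True, indent=4)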
Bobby Ocean
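With `strict=False`, configparser keeps only the last of the duplicate `innodb_monitor_enable` lines. If both values are needed and the file cannot be changed, one option is to collect repeated keys by hand instead. The following is only a sketch, assuming sections look like `[name]`, options look like `key = value`, and the configuration block has already been extracted as in the code above; `parse_sections` is a hypothetical helper, not part of any library.

```python
import json
import re


def parse_sections(text):
    """Parse INI-like text into {section: {key: value}}.

    A key that occurs more than once in a section is collected into a list.
    """
    result = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and '#' banners/comments
        header = re.match(r'^\[(.+)\]$', line)
        if header:
            section = result.setdefault(header.group(1), {})
            continue
        if section is None or '=' not in line:
            continue  # ignore anything outside a section or without '='
        key, _, value = line.partition('=')  # split at the first '=' only
        key, value = key.rstrip(), value.lstrip()
        if key not in section:
            section[key] = value
        elif isinstance(section[key], list):
            section[key].append(value)
        else:
            section[key] = [section[key], value]
    return result


sample = """[mysqld]
server_id            = 1
performance_schema_instrument       = '%=on'
innodb_monitor_enable               = 'module_adaptive_hash'
innodb_monitor_enable               = 'module_buffer'

[client]
port                                = 3016
"""

settings = parse_sections(sample)
print(json.dumps(settings, indent=4, sort_keys=True))
```

Joining the collected list with `','.join(...)` would give a single comma-separated string like the one in the question's expected result, in file order.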
  • Is the configparser module supported only by Python 3, and not by Python 2.7.5? I would like to parse multiple values into the key innodb_monitor_enable. Is that possible? – kanpai May 12 '20 at 03:40
  • I tested this on 3.7. – Bobby Ocean May 12 '20 at 05:43
  • is it possible to export the following output? "innodb_monitor_enable": ['module_adaptive_hash', 'module_buffer'] – kanpai May 12 '20 at 06:52
  • That really isn't the intention of configparser. Duplicate options are normally frowned upon, since logically they make no sense. The proper way to override configuration values is to have multiple config files, such as a user config file, a global config file, and a system config file. Configparser will read all of these together, following rules you set. Hence, you can overwrite what your users set, or have your users overwrite what you set, etc. – Bobby Ocean May 12 '20 at 16:30
  • Obviously, you can put multiple values in your one variable, inside your configuration file, like innodb_monitor_enable = 'module_adaptive_hash','module_buffer', that is, if you are allowing the configuration file to be edited. – Bobby Ocean May 12 '20 at 16:36
  • You can even place values on separate lines for a single variable. That is, delete the second "innodb_monitor_enable = " but keep that line indented, and configparser will treat it as a continuation of the first value, so both values end up in the single variable. – Bobby Ocean May 12 '20 at 16:39
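The multi-line form from the last comment can be tried out directly. A minimal sketch, assuming the configuration file may be edited: configparser only treats a line as a continuation when it is indented more deeply than the option name, and it joins the lines of the value with newlines.

```python
import configparser

# Same option as in the sample, but written once with an indented continuation line.
text = """[mysqld]
innodb_monitor_enable = 'module_adaptive_hash'
    'module_buffer'
"""

parser = configparser.RawConfigParser()
parser.read_string(text)

raw_value = parser["mysqld"]["innodb_monitor_enable"]
values = raw_value.splitlines()  # one entry per physical line of the value
print(values)
```

Splitting on newlines afterwards is one way to get the list asked for in the comments; whether MySQL itself accepts such a file is a separate question that this sketch does not address.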