Is there a simpler or more idiomatic way to write this?

import json

def jsonToCsvPart1(fields):
    csvFormat = ''
    fields = json.loads(fields)
    for key in fields.keys() :
        tf = str(fields[key])
        mk = ''
        for idx in tf :
            if idx == '{' or idx == '}' or idx == '[' or idx == ']' :
                continue
            else :
                mk += idx
        csvFormat += (mk + ",")
    yield csvFormat
tdelaney
Quack
  • what type are `mk`, `idx` and `tf`? – Pierre D Jan 05 '21 at 05:27
  • Please make this a full working program... really that's just setting an initial value for `mk`. – tdelaney Jan 05 '21 at 05:35
  • @PierreD All three are strings. @tdelaney Yes. – Quack Jan 05 '21 at 05:44
  • yikes -- that's a whole different question, now with the context! I would not do it like that at all then. How deeply nested is your `fields` dict? – Pierre D Jan 05 '21 at 05:59
  • you should perhaps look at `pandas`. I just answered a question on how to put a deeply nested `dict` from a JSON string into a `DataFrame` [here](https://stackoverflow.com/questions/65573305/faster-way-to-make-pandas-multiindex-dataframe-than-append/65573463#65573463). And `pandas` excels at many things, including writing CSV files (fast). – Pierre D Jan 05 '21 at 06:05
  • Looks like you're trying to manually write a json parser/ converter to csv. You're aware that Python already has tons of libraries for this, to fit almost every use-case? Unless this is a homework assignment, I wouldn't do it. See e.g. [How to parse data in JSON format?](https://stackoverflow.com/questions/7771011/how-to-parse-data-in-json-format) – smci Jan 05 '21 at 06:22
  • Does this answer your question? [How to parse data in JSON format?](https://stackoverflow.com/questions/7771011/how-to-parse-data-in-json-format) – smci Jan 05 '21 at 06:23
  • @smci It's not homework; it's a data-processing flow on GCP. I wrote it in Python, the code turned out too long, so I posted a question about improving its readability. ): – Quack Jan 05 '21 at 06:31
  • Then why don't you use any of the many json libraries? – smci Jan 05 '21 at 06:33
  • @smci I already use `json.loads`. Since I want to convert JSON to CSV, I turn the JSON values into a CSV-format string (without the header), so I didn't feel the need for anything beyond `json.loads`. Also, I have to provide the data as a CSV-format string to load it into a GCP table. – Quack Jan 05 '21 at 06:39
  • Then this seems to be the duplicate [How can I convert JSON to CSV?](https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv) – smci Jan 05 '21 at 06:41
  • @smci Thank you. :) – Quack Jan 05 '21 at 06:53
  • It seems from `csvFormat += (mk + ",")` that this code is trying to make a csv cell. If any of the interior data in `mk` contains a comma or a quote, they will not be escaped properly. It may be better to yield a list and have the `csv` module write it as a CSV row. – tdelaney Jan 05 '21 at 16:30
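The suggestion in the last comment could look roughly like the sketch below. It is only an illustration under assumptions: the helper names `json_to_cells` and `cells_to_csv_row` are hypothetical, and the input is assumed to be a JSON object whose top-level values each become one cell. The `csv` module then takes care of quoting any commas or quotes inside a cell.

```python
import csv
import io
import json

# Translation table that deletes the four bracket characters.
_STRIP_BRACKETS = str.maketrans('', '', '{}[]')


def json_to_cells(fields):
    """Return one cleaned string per top-level value of a JSON object.

    Hypothetical helper: assumes `fields` is a JSON object (dict) string.
    """
    data = json.loads(fields)
    return [str(value).translate(_STRIP_BRACKETS) for value in data.values()]


def cells_to_csv_row(cells):
    """Serialize the cells as one CSV row; csv.writer handles the quoting."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(cells)
    return buffer.getvalue()


cells = json_to_cells('{"name": "a,b", "tags": [1, 2]}')
row = cells_to_csv_row(cells)
print(row, end='')
```

Both cells contain a comma here, so the writer wraps each of them in double quotes instead of producing extra columns.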

3 Answers

I'm not sure what the grand scheme is, but you can write it in a way that will likely be faster:

Here, assuming you are building a string:

exclude = set('{}[]')  # note: set is faster than list for membership test
mk = ''.join([idx for idx in tf if idx not in exclude])
# 61 ms on a 1M-char random string
exclude = '{}[]'  # ...but string is even faster, when so short
mk = ''.join([idx for idx in tf if idx not in exclude])
# 38 ms on a 1M-char random string

By the way, it is considerably faster to let builtin string methods do the large loop (over all the chars of tf) and to iterate in Python only over the few chars to exclude:

mk = tf.replace('{', '').replace('}', '').replace('[', '').replace(']', '')
# 11.8ms on 1M chars

And yet faster:

mk = tf.translate({ord(c): None for c in '{}[]'})
# 4.5 ms on 1M chars

Setup (in case anyone wants to look for an even faster way):

import numpy as np

tf = ''.join(np.random.choice(list('abcdefghijk{}[]'), size=1_000_000))
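For anyone who wants to re-run the comparison, here is a minimal, self-contained sketch. It is not the harness used for the timings quoted above: it swaps NumPy for the standard-library `random.choices`, uses a smaller sample, and the function names are made up for illustration. It first checks that the variants return the same string, then times them with `timeit`; the numbers will depend on the machine and Python version.

```python
import random
import timeit

EXCLUDE = '{}[]'


def with_join(tf):
    # Character-by-character filter in a list comprehension.
    return ''.join([c for c in tf if c not in EXCLUDE])


def with_replace(tf):
    # One builtin pass over the string per excluded character.
    for c in EXCLUDE:
        tf = tf.replace(c, '')
    return tf


def with_translate(tf):
    # A single builtin pass; mapping a code point to None deletes it.
    return tf.translate({ord(c): None for c in EXCLUDE})


random.seed(0)  # fixed seed so the sample is reproducible
sample = ''.join(random.choices('abcdefghijk{}[]', k=100_000))

candidates = (with_join, with_replace, with_translate)
results = [func(sample) for func in candidates]
all_agree = all(result == results[0] for result in results)

for func in candidates:
    seconds = timeit.timeit(lambda: func(sample), number=10)
    print(f'{func.__name__}: {seconds / 10 * 1000:.2f} ms per call')
```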
Pierre D

For your specific purpose (and for learning), checking for specific chars in a string will work. Python sets are generally faster for membership checks; see the other answers for how to use them. For example:

idxs = '{}[]'
mk = ''  # initial value, as in the question
for idx in tf:
    if idx in idxs:
        continue
    else:
        mk += idx
Rahul

Without sample data I can't suggest more readable names, so I've left them unchanged.

You could probably merge the two for loops into a single expression to get even fewer lines, but at the cost of readability, IMHO.

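import json

def jsonToCsvPart1(fields):
    csvFormats = []
    # iterate over the values (not the keys), as the question's code does
    for value in json.loads(fields).values():
        mk = ''.join(idx for idx in str(value) if idx not in '{}[]')
        csvFormats.append(mk)
    # note: unlike the question's code, this leaves no trailing comma
    yield ','.join(csvFormats)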
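For comparison, the single-expression form mentioned above might look like the following sketch. The function name `json_to_csv_line` is hypothetical, it returns the string instead of yielding it, and it assumes, as in the question, that the cells come from the top-level values of a JSON object.

```python
import json


def json_to_csv_line(fields):
    # Outer generator: one cell per top-level value.
    # Inner generator: drop the bracket characters from that value's str().
    return ','.join(
        ''.join(ch for ch in str(value) if ch not in '{}[]')
        for value in json.loads(fields).values()
    )


line = json_to_csv_line('{"a": [1, 2], "b": {"c": 3}}')
print(line)
```

It is shorter, but the two nested generators are harder to scan than the explicit loop, which is the readability trade-off noted above.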
Yang Liu