I am trying to map a CEF file to a data frame and ultimately to an output file, but I'm getting RuntimeError: dictionary changed size during iteration.
I've tried these solutions: 1, 2, 3, 4, etc. I'm not even sure where the dictionary the error refers to is (in the lambda?). I don't believe this is a duplicate of those questions, since I'm not explicitly using a dictionary anywhere in the code, so calling .keys() or .items() is not an option.
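For reference, a toy snippet (not my actual code, just my understanding of the error) that raises the same RuntimeError by adding a key to a dict while looping over it:

d = {'src': '10.0.0.1', 'dst': '2.1.2.2'}
for key in d:                 # iterating over the dict directly
    if key == 'src':
        d['spt'] = '1232'     # inserting a new key mid-loop raises
                              # RuntimeError: dictionary changed size during iteration

So I suspect something is mutating a dict while it is being iterated, but I can't see where in my code that would happen.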
I created a simple text file with the CEF access and security events example:
I then ran the code below:
import pyspark
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext

sc = SparkContext('local[2]', 'NetworkLog')
spark = SparkSession(sc)
target_data = sc.textFile('log.txt')
import re
def parse(str_input):
    ...
    return values
parsed = target_data.map(lambda line:parse(line))
df = parsed.map(lambda x: (x['rt'],x['dst'],x['dhost'],x['act'],x['suser'],x['requestClientApplication'],x['threat name'],x['DeviceSeverity'],x['riskscore'])).toDF(['source_time','ip','host_name','act','suser','requestClientApplication','threatname','DeviceSeverity','riskscore'])
*parser found here
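For context, my understanding of the linked parser is that it splits off the pipe-delimited CEF header and then collects the key=value pairs from the extension into a dict, roughly like this (a simplified sketch, not the exact linked code; keys containing spaces such as threat name would need extra handling):

import re

def parse(str_input):
    values = {}
    fields = str_input.split('|')            # CEF header fields are pipe-delimited
    if len(fields) > 7:
        extension = '|'.join(fields[7:])     # everything after the header is key=value pairs
        for key, value in re.findall(r'(\w+)=(.*?)(?=\s\w+=|$)', extension):
            values[key] = value.strip()
    return values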
This may be a separate question, but the code also sometimes breaks when values in parsed are missing/null/0.0.0, so I'd need a way to write null or 0.0.0 into the dataframe instead.
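One workaround I'm considering (just a sketch, assuming parse returns plain strings) is to build the rows with dict.get() and a default, so missing keys come through as a placeholder value instead of raising KeyError:

cef_keys = ['rt', 'dst', 'dhost', 'act', 'suser', 'requestClientApplication',
            'threat name', 'DeviceSeverity', 'riskscore']
col_names = ['source_time', 'ip', 'host_name', 'act', 'suser',
             'requestClientApplication', 'threatname', 'DeviceSeverity', 'riskscore']

# .get(k, '0.0.0') returns '0.0.0' when a key is missing instead of raising KeyError;
# None could be used instead if nulls are preferred in the dataframe
df = parsed.map(lambda x: tuple(x.get(k, '0.0.0') for k in cef_keys)).toDF(col_names)

That at least keeps the mapping from raising, but I'm not sure it's the idiomatic Spark way to handle nulls.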