The file is 20 GB, and the record delimiter is the NUL character (␀) instead of a newline. Below is the PySpark code:
text_file = sc.textFile(file_name)  # textFile always splits records on "\n"
records = text_file.flatMap(lambda line: line.split("␀"))
records.count()
It fails with the following error: Too many bytes before newline: 2147483648
Question: How can PySpark read a large file that uses a custom line-ending character?