Questions tagged [fastavro]

fastavro is a Python implementation of Avro for data serialization and deserialization.


38 questions

6 votes, 1 answer

Avro deserialization from Kafka using fastavro

I am building an application that receives data from Kafka. When using the standard Avro library provided by Apache (https://pypi.org/project/avro-python3/), the results are correct; however, the deserialization process is terribly slow. class…
Michał
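
One likely cause of slow or failing decodes here, assuming the messages were produced with Confluent's Avro serializer (the excerpt does not say), is that each Kafka value is not a bare Avro datum: it starts with a 5-byte header, a zero magic byte followed by a 4-byte big-endian schema ID. A minimal stdlib sketch that strips that header; `split_confluent_frame` is a hypothetical helper name, and the remaining payload is the kind of headerless datum fastavro's `schemaless_reader` reads:

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of the Confluent wire format (assumed framing)


def split_confluent_frame(message: bytes) -> Tuple[int, bytes]:
    """Split a Kafka message value into (schema_id, avro_payload).

    Assumes Confluent framing: 1 magic byte, then a 4-byte big-endian
    schema ID, then the Avro binary datum with no container-file header.
    """
    if len(message) < 5:
        raise ValueError("message too short for Confluent framing")
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


# With fastavro installed, the payload could then be decoded with a schema
# parsed once up front (re-parsing it for every message wastes time):
#   parsed = fastavro.parse_schema(schema_dict)
#   record = fastavro.schemaless_reader(io.BytesIO(payload), parsed)
schema_id, payload = split_confluent_frame(b"\x00\x00\x00\x00\x07\x02")
print(schema_id, payload)  # 7 b'\x02'
```
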

5 votes, 0 answers

Fastavro fatal error: 'Python.h' file not found for Mac M1

I am trying to install fastavro on my Mac M1 laptop, but got the error: Headers -c fastavro/_read.c -o build/temp.macosx-10.9-universal2-cpython-39/fastavro/_read.o fastavro/_read.c:6:10: fatal error: 'Python.h' file not found #include…
Zichen Ma

3 votes, 0 answers

Confluent Kafka python schema parser causes conflict with fastavro

I am running Python 3.9 with Confluent Kafka 1.7.0, avro-python3 1.10.0 and fastavro 1.4.1. The following code uses Avro schema encoder in order to encode a message, which succeeds only if we transform the resulting schema encoding by getting rid of…
gt6989b

3 votes, 1 answer

Parsing multiple Avro (avsc) files that refer to each other using Python (fastavro)

I have an Avro schema that is currently in a single avsc file, like below. Now I want to move the address record to a separate, common avsc file that should be referenced from many other avsc files. So Customer and Address will be separate avsc files. How…
Codegator
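
Avro lets a schema refer to a named type (such as `Address`) by name, but once the schemas are combined, the full definition has to appear before the first reference. fastavro has its own tools for this (`fastavro.schema.load_schema` for `.avsc` files on disk and the `named_schemas` argument of `parse_schema`); check their documentation for the conventions they expect. The sketch below shows only the underlying idea in plain Python: inline each named definition at its first use and leave later uses as bare names. `inline_named_types` is a hypothetical helper, and it assumes unqualified names (no namespaces).

```python
import copy

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_KINDS = ("record", "error", "enum", "fixed")


def inline_named_types(schema, named, defined=None):
    """Copy `schema`, replacing the first reference to each type in `named`
    with its full definition; later references stay as plain names."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in defined:
            return schema
        if schema in named:
            return inline_named_types(copy.deepcopy(named[schema]), named, defined)
        raise ValueError(f"unknown type {schema!r}")
    if isinstance(schema, list):  # a union: resolve each branch in order
        return [inline_named_types(branch, named, defined) for branch in schema]
    out = dict(schema)
    kind = schema["type"]
    if kind in NAMED_KINDS:
        defined.add(schema["name"])  # register before fields, so recursion works
    if kind in ("record", "error"):
        out["fields"] = [
            dict(field, type=inline_named_types(field["type"], named, defined))
            for field in schema["fields"]
        ]
    elif kind == "array":
        out["items"] = inline_named_types(schema["items"], named, defined)
    elif kind == "map":
        out["values"] = inline_named_types(schema["values"], named, defined)
    elif kind not in NAMED_KINDS:
        # e.g. {"type": "long", "logicalType": ...} or {"type": "Address"}
        out["type"] = inline_named_types(kind, named, defined)
    return out


address = {"type": "record", "name": "Address", "fields": [{"name": "city", "type": "string"}]}
customer = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "home", "type": "Address"},
        {"name": "work", "type": ["null", "Address"]},
    ],
}
full = inline_named_types(customer, {"Address": address})
print(full["fields"][1]["type"])  # ['null', 'Address']
```

In this sketch each `.avsc` file would be loaded with `json.load` into the `named` dict first; the combined `full` schema is then a single self-contained definition.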

2 votes, 1 answer

Fastavro fails to parse Avro schema with enum

I have the following code block: import fastavro schema = { "name": "event", "type": "record", "fields": [{"name": "event_type", "type": "enum", "symbols": ["START", "STOP"]}], } checker = fastavro.parse_schema(schema=schema) Upon…
andand
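
The schema in this question fails because `"enum"` is placed directly in the field's `"type"`. In Avro, an enum is a named complex type, so the field's `"type"` must be a nested object with its own `"type": "enum"`, a `"name"`, and `"symbols"`. A small sketch of that rewrite; `nest_inline_enum` is a hypothetical helper and `EventType` is a name chosen only for illustration:

```python
def nest_inline_enum(field: dict, type_name: str) -> dict:
    """Move an enum written inline on a field into a nested, named type."""
    if field.get("type") != "enum":
        return dict(field)
    fixed = {k: v for k, v in field.items() if k not in ("type", "symbols")}
    fixed["type"] = {"type": "enum", "name": type_name, "symbols": list(field["symbols"])}
    return fixed


schema = {
    "name": "event",
    "type": "record",
    "fields": [{"name": "event_type", "type": "enum", "symbols": ["START", "STOP"]}],
}
schema["fields"] = [nest_inline_enum(f, "EventType") for f in schema["fields"]]
print(schema["fields"][0])
# {'name': 'event_type', 'type': {'type': 'enum', 'name': 'EventType', 'symbols': ['START', 'STOP']}}
# This nested form is what fastavro.parse_schema(schema) should accept.
```
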

2 votes, 0 answers

How to normalize decimal values while iterating over dataframe rows using toLocalIterator

I have a PySpark DataFrame that contains a decimal column, and the schema for that particular column is Decimal(20,8). When I run df.show(), it shows 3.1E-7 as the value of the decimal column for a particular row. Now I am trying to write this…
newbie
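
`3.1E-7` is most likely just how the value is displayed; the `decimal.Decimal` objects that `toLocalIterator` yields for a decimal column are exact. If the goal is a fixed-point string, or the unscaled integer that Avro's `decimal` logical type stores (the value times 10^scale, written as big-endian two's-complement bytes), a stdlib sketch assuming the column's scale of 8 (`normalize_decimal` and `unscaled_int` are hypothetical names):

```python
from decimal import Decimal

SCALE = 8  # from the Decimal(20,8) column type in the question


def normalize_decimal(value: Decimal, scale: int = SCALE) -> Decimal:
    """Fix the exponent at -scale (rounds half-even if there are extra digits)."""
    return value.quantize(Decimal(1).scaleb(-scale))


def unscaled_int(value: Decimal, scale: int = SCALE) -> int:
    """The integer an Avro decimal with this scale stores: value * 10**scale."""
    return int(normalize_decimal(value, scale).scaleb(scale))


value = Decimal("3.1E-7")  # e.g. row["amount"] from df.toLocalIterator() (hypothetical column)
print(format(normalize_decimal(value), "f"))  # 0.00000031
print(unscaled_int(value))  # 31
```
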

2 votes, 1 answer

How do I decode an Avro message in Python?

I am having trouble decoding an Avro message in Python (3.6.11). I have tried both the avro and fastavro packages. So I think that the problem may be that I'm providing the bytes incorrectly. Using avro: from avro.io import DatumReader,…
glevine
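
Since the poster suspects the bytes themselves, it can help to decode a small datum by hand. A schemaless Avro datum has no header: fields are written back to back in schema order, `int`/`long` values are zigzag-encoded base-128 varints, and `string`/`bytes` values are a length (itself a long) followed by the raw bytes. The stdlib sketch below decodes those two primitives; the record layout in the usage lines (a `long` followed by a `string`) is an assumption for illustration. In practice `fastavro.schemaless_reader` does this for headerless data, while data that starts with the container-file magic `Obj\x01` should go through `fastavro.reader` instead.

```python
from typing import Tuple


def read_long(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one Avro int/long at `pos`; return (value, next_pos)."""
    raw, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos  # undo zigzag encoding


def read_string(buf: bytes, pos: int = 0) -> Tuple[str, int]:
    """Decode one Avro string: a long length, then that many UTF-8 bytes."""
    length, pos = read_long(buf, pos)
    end = pos + length
    return buf[pos:end].decode("utf-8"), end


data = b"\xd8\x04\x04hi"  # long 300, then string "hi"
user_id, pos = read_long(data)
name, pos = read_string(data, pos)
print(user_id, name)  # 300 hi
```
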

1 vote, 1 answer

Error installing fastavro==1.7.3 on MacOS, Python 3.10

I have a weird error when trying to install fastavro==1.7.3 (as part of poetry install) on a pyenv-managed Python 3.10.12. However, it installs fine on the same machine using Python 3.11.4. Any idea what is happening in 3.10? $ pip install…
planetp

1 vote, 0 answers

What is the best way to upgrade avro files (stored on GCS) having older schemas (containing "default":"null") to newer formats (with "default":null)

We have quite a few Avro files on GCP (total storage size in PBs) that have older schemas (containing "default":"null" in the header's schema section for a few 'record' type columns). Now, when we try to load them into BQ, BigQuery is not able…
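
Under the Avro specification, a union field's default must match the union's first branch, so for a `["null", ...]` union the default has to be JSON `null`, not the string `"null"`. Defaults live only in the schema, so the encoded records are unaffected; what has to change is the schema embedded in each file's header, which in practice likely means rewriting the files with a corrected schema (not shown here). A stdlib sketch of the schema fix itself; `fix_null_defaults` is a hypothetical helper and the sample schema is made up:

```python
import json


def fix_null_defaults(schema):
    """Return a copy of `schema` where field defaults written as the string
    "null" become JSON null, for fields whose type is "null" or a union that
    starts with "null". Other defaults are left alone."""
    if isinstance(schema, list):
        return [fix_null_defaults(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    out = {key: fix_null_defaults(value) for key, value in schema.items()}
    for field in out.get("fields", []):
        ftype = field.get("type")
        null_first = ftype == "null" or (isinstance(ftype, list) and ftype[:1] == ["null"])
        if null_first and field.get("default") == "null":
            field["default"] = None
    return out


old = json.loads(
    '{"type": "record", "name": "Row", "fields": [{"name": "addr", '
    '"type": ["null", {"type": "record", "name": "Addr", "fields": []}], '
    '"default": "null"}]}'
)
new = fix_null_defaults(old)
print(json.dumps(new["fields"][0]["default"]))  # null
```
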

1 vote, 1 answer

Fastavro Schemaless Reader

This is a follow-up to [Avro deserialization from Kafka using fastavro]. Is there any way to read all records from an Avro file (without a header) using fastavro's schemaless_reader()?
Toulik
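
`schemaless_reader` decodes one datum per call and should leave the file positioned just after it, so a headerless file of back-to-back datums can be read by calling it in a loop until the position reaches the end of the file. A sketch of that loop, assuming a seekable file object; `iter_schemaless` is a hypothetical helper, and the small `read_long` decoder only stands in for fastavro so the example runs with the standard library alone:

```python
import io
from typing import Any, BinaryIO, Callable, Iterator


def iter_schemaless(fo: BinaryIO, read_one: Callable[[BinaryIO], Any]) -> Iterator[Any]:
    """Yield datums from a headerless stream until it is exhausted.

    With fastavro, read_one would be something like
    lambda f: fastavro.schemaless_reader(f, parsed_schema).
    """
    start = fo.tell()
    end = fo.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    fo.seek(start)
    while fo.tell() < end:
        yield read_one(fo)


def read_long(fo: BinaryIO) -> int:
    """Stand-in decoder: read one zigzag-varint Avro long from a stream."""
    raw, shift = 0, 0
    while True:
        chunk = fo.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        raw |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return (raw >> 1) ^ -(raw & 1)
        shift += 7


print(list(iter_schemaless(io.BytesIO(b"\x02\x04\xd8\x04"), read_long)))  # [1, 2, 300]
```

For a file on disk, `open(path, "rb")` gives a seekable object that can be passed in the same way.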

1 vote, 0 answers

Timestamp microsecond precision is reset to 000 when Kafka deserializes data to create an SQL insert (?)

I am using an Avro schema for the data and have a timestamp field named 'time', defined like this: {"name": "time", "type": {"type": "long", "logicalType": "timestamp-micros"}}. The timestamp-micros could alternatively be timestamp-millis, but I want…
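
The excerpt stops before showing where the digits are lost, so this addresses only one common cause: routing the value through floating-point seconds or a millisecond-precision type. A `timestamp-micros` value is a `long` counting microseconds since the Unix epoch, and integer `timedelta` arithmetic keeps all six fractional digits. A stdlib sketch (the helper names are hypothetical, and it assumes the target SQL column can store microseconds):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_timestamp_micros(value: int) -> datetime:
    """Convert an Avro timestamp-micros long to an aware UTC datetime, exactly."""
    return EPOCH + timedelta(microseconds=value)


def to_sql_literal(ts: datetime) -> str:
    """Format with %f, which always renders all six microsecond digits."""
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")


ts = from_timestamp_micros(1_600_000_000_123_456)
print(to_sql_literal(ts))  # 2020-09-13 12:26:40.123456
```
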

1 vote, 1 answer

avro schema timestamp format

I am looking to get the timestamp in this format: MMDDYYYYHHMMSS. For the Avro schema format, can I use: { "logicalType": "timestamp-millis", "type": "long", "date-format": "MMDDYYYYHHMMSS" } Or is there a better way to do this?
dataviews
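
In the Avro specification, `timestamp-millis` only says that the `long` holds milliseconds since the Unix epoch; a `"date-format"` attribute is not part of the spec, and extra attributes like it are allowed as metadata but do not change how the value is encoded. A display format such as MMDDYYYYHHMMSS is therefore usually applied in application code when the value is read or written. A stdlib sketch, assuming UTC and a 24-hour HH (both assumptions; the helper names are hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FORMAT = "%m%d%Y%H%M%S"  # MMDDYYYYHHMMSS with a 24-hour HH (assumed)


def format_timestamp_millis(value: int) -> str:
    """Render a timestamp-millis long (UTC) as MMDDYYYYHHMMSS."""
    return (EPOCH + timedelta(milliseconds=value)).strftime(FORMAT)


def parse_to_timestamp_millis(text: str) -> int:
    """Parse MMDDYYYYHHMMSS (treated as UTC) back into a timestamp-millis long."""
    dt = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(format_timestamp_millis(1_600_000_000_000))  # 09132020122640
print(parse_to_timestamp_millis("09132020122640"))  # 1600000000000
```
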

1 vote, 1 answer

Deserialization, fixed data type in Avro

I am new to Avro and have an Avro file to deserialize. Some schemas use the fixed data type to store MAC addresses. The schema below is one of those schemas and is used as a type in different schemas. The schema for MAC addresses is like below: { "type":…
bhdrozgn
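
An Avro `fixed` value has no further structure in the encoding: it is just `size` raw bytes, and fastavro should hand it back as a Python `bytes` object. Turning it into a readable MAC address is then a formatting step. A sketch assuming the MAC schema uses `"size": 6` (the excerpt is cut off before the size); `mac_to_str` is a hypothetical helper:

```python
def mac_to_str(value: bytes) -> str:
    """Format the 6 raw bytes of a fixed-size MAC address as aa:bb:cc:dd:ee:ff."""
    if len(value) != 6:
        raise ValueError(f"expected 6 bytes, got {len(value)}")
    return ":".join(f"{octet:02x}" for octet in value)


raw = bytes([0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E])  # e.g. record["mac"] (hypothetical field)
print(mac_to_str(raw))  # 00:1a:2b:3c:4d:5e
```

On Python 3.8 and later, `raw.hex(":")` produces the same string.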

1 vote, 1 answer

confluent_kafka.error.ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161 : ValueError

I am new to Python and am trying to use 'confluent_kafka' to produce Avro messages, using 'confluent_kafka.schema_registry.avro.AvroSerializer' for this (referred :…

1 vote, 1 answer

AVRO schema for JSON

I have a JSON document that gets generated like this. I want to know what the Avro schema for it would be. The number of key-value pairs in the array list is not fixed. There are related posts, but in those the keys are referenced and do not change. In my…