Apache Arrow makes it easy to read Parquet metadata from many languages, including C, C++, Rust, Go, Java, and JavaScript.
Here's how to get the schema with PyArrow (the Python Apache Arrow API):
import pyarrow.parquet as pq
table = pq.read_table(path)  # path points to a Parquet file or directory
table.schema # pa.schema([pa.field("movie", "string", False), pa.field("release_year", "int64", True)])
See here for more details about how to read metadata information from Parquet files with PyArrow.
You can also grab the schema of a Parquet file with Spark.
val df = spark.read.parquet("some_dir/")
df.schema // returns a StructType
StructType objects look like this:
StructType(
StructField(number,IntegerType,true),
StructField(word,StringType,true)
)
From the StructType object, you can recover the column name, data type, and nullable property stored in the Parquet metadata. The Spark approach isn't as clean as the Arrow approach.