Highest Voted 'parquet-mr' Questions

13

votes

5 answers

Installing parquet-tools

I am trying to install parquet-tools on a FreeBSD machine. I cloned this repo: `git clone https://github.com/apache/parquet-mr`. Then I did `cd parquet-mr/parquet-tools`. Then I did `mvn clean package -Plocal`, as specified here:…

asked Nov 14 '18 at 18:05

user3685285

6,066
13
54
95

6

votes

0 answers

Sorted parquet files for query optimization

Question Purpose: sorting parquet files provides a number of benefits, namely more efficient filtering using file metadata and a more efficient compression rate. There may be other benefits as well. There is a lot of discussion about this on the Internet. For…

apache-spark sorting parquet parquet-mr

asked Oct 27 '21 at 13:03

Amin

1,643
16
25

5

votes

1 answer

Converting Arrow to Parquet and vice versa in java

I have been looking at ways to convert arrow to parquet and vice versa in Java. Even though the Python library for arrow has full support for the mentioned conversion, I can hardly find any documentation for the same in Java. Has anyone come across…

java parquet apache-arrow parquet-mr

asked Sep 17 '19 at 12:43

Optimus

697
2
8
22

4

votes

0 answers

Parquet storage size higher for duplicate data

I have a dataset of close to 2 billion rows in parquet format, spanning 200 files. It occupies 17.4GB on S3. Close to 45% of its rows are duplicates. I deduplicated the dataset using the 'distinct' function in Spark, and wrote it to…

apache-spark pyspark apache-spark-sql parquet parquet-mr

asked May 04 '20 at 05:44

Phanindra Kothoori

61
6

3

votes

2 answers

Reading a parquet file using Java works on local machine but doesn't work in Docker container

I have a requirement to read parquet files and publish to Kafka in a Java standalone application. I have the code below to read the parquet file, which is generated by a Spark Scala application. public void readTest(Path path) { try { …

java spring-boot apache-spark parquet parquet-mr

asked Aug 28 '21 at 08:03

Sugyan sahu

129
1
8

3

votes

1 answer

INT32 type error when scanning parquet federated table. Bug or Expected behavior?

I am using BigQuery to query an external data source (also known as a federated table), where the source data is a hive-partitioned parquet table stored in google cloud storage. I used this guide to define the table. My first query to test this…

google-bigquery parquet parquet-mr

asked Apr 09 '20 at 11:17

conradlee

12,985
17
57
93

3

votes

0 answers

Is it possible to write multiple oracle database tables into one parquet file?

I have a requirement where I want to convert my Oracle DB data to parquet. My database has multiple tables, for example Employee and Department. Is it possible to insert the data of both tables into a single parquet file? Or do I need to…

parquet parquet-mr

asked Dec 13 '19 at 17:10

Ankur Gupta

31
1

3

votes

1 answer

Why is dictionary page offset 0 for `plain_dictionary` encoding?

The parquet was generated by Spark v2.4, Parquet-mr v1.10.
n = 10000
x = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, None] * n
y = [u'é', u'é', u'é', u'é', u'a', None, u'a'] * n
z = np.random.rand(len(x)).tolist()
dfs = spark.createDataFrame(zip(x, y, z),…

parquet arrows pyarrow parquet-mr

asked Mar 18 '19 at 15:45

colinfang

20,909
19
90
173

2

votes

0 answers

Does Apache Parquet support Custom Filter Predicate on Repeated values?

Does Apache Parquet support Custom Filter Predicate on Repeated values? By applying a filter on a repeated value, I get: "FilterPredicates do not currently support repeated columns. Column part.x is repeated". The filter I set on the x double…

parquet parquet-mr

asked Feb 20 '23 at 21:35

Nicholas Kou

173
2
13

2

votes

0 answers

parquet-tools cannot read zstd files but can read gzip?

I installed the latest version of parquet-tools from apache-mr with version parquet-tools-1.8.2.jar. Here is a reproducible example:
>>> import boto3
>>> client = GET_CLIENT() # redacted
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3]],…

python pandas amazon-s3 parquet parquet-mr

asked Nov 22 '20 at 19:42

OneRaynyDay

3,658
2
23
56

2

votes

0 answers

Add parquet-tools to path (Visual Studio Code)

I am trying to use this parquet-viewer so I can easily view parquet files in Visual Studio Code. It requires that parquet-tools is available in the path. I did `brew install parquet-tools`, and when I try to open my .parquet file with Visual Studio…

visual-studio parquet parquet-mr

asked Aug 26 '19 at 17:09

Mike

444
1
8
19

2

votes

0 answers

Read a fastparquet file using Akka parquet

I have one of our Python systems generating Parquet files using Pandas and fastparquet. These are to be read by a Scala system that runs atop Akka streams. Akka does provide a source for reading Avro Parquet files. However, when I try to read the…

scala akka parquet akka-stream parquet-mr

asked Jun 05 '19 at 15:10

An SO User

24,612
35
133
221

2

votes

1 answer

PySpark Write Parquet Binary Column with Stats (signed-min-max.enabled)

I found this apache-parquet ticket https://issues.apache.org/jira/browse/PARQUET-686 which is marked as resolved for parquet-mr 1.8.2. The feature I want is the calculated min/max in the parquet metadata for a (string or BINARY) column. And…

python-2.7 apache-spark pyspark parquet parquet-mr

asked Nov 05 '18 at 16:12

Nevermore

7,141
5
42
64

1

vote

0 answers

AvroParquetWriter - addLogicalTypeConversion not working as expected (using version parquet-avro 1.12.3) - causing ClassCastException

I am writing a ResultSet to a parquet file using AvroParquetWriter. One column in the ResultSet is java.sql.Timestamp. When writing, I get the exception: java.sql.Timestamp cannot be cast to java.lang.Number. Adding addLogicalTypeConversion does not…

java parquet resultset classcastexception parquet-mr

asked Nov 04 '22 at 08:13

javaseeker

73
1
9

1

vote

0 answers

How should protobuf message with repeated fields be converted to parquet to be queried by Athena?

We write parquet files to S3 and then use Athena to query that data. We use the "parquet-protobuf" library to convert proto messages into parquet records. We recently added a repeated field to our proto message definition and we were expecting to…

parquet amazon-athena parquet-mr

asked Jul 04 '22 at 05:44

user2903819

180
2
12
