Highest Voted 'pyspark-schema' Questions

3

votes

2 answers

Update a specific value when 2 other values match from 2 different tables in PySpark

Any idea how to write this in PySpark? I have two PySpark DataFrames that I'm trying to union. However, there is one value that I want to update based on two duplicate column values. PyDf1: +-----------+-----------+-----------+------------+ |test_date …

asked Sep 26 '22 at 16:52

Mick

265
2
10

3

votes

1 answer

How to create dataframe with struct column in PySpark without specifying a schema?

I am learning PySpark and it is convenient to be able to quickly create example dataframes to try the functionality of the PySpark API. The following code (where spark is a spark session): import pyspark.sql.types as T df = [{'id': 1, 'data': {'x':…

apache-spark pyspark struct apache-spark-sql pyspark-schema

asked May 01 '22 at 16:16

karpan

421
1
5
13

3

votes

1 answer

How to change a column type in an array of structs with PySpark

How can I change a column type inside an array of structs with PySpark? For example, I would like to change userid from int to long: root |-- id: string (nullable = true) |-- numbers: array (nullable = true) | |-- element: struct (containsNull = true) …

pyspark apache-spark-sql pyspark-schema

asked Mar 26 '22 at 02:50

Frank

977
3
14
35

2

votes

1 answer

Is there any way to convert a flattened DataFrame to a nested DataFrame using PySpark?

apache-spark pyspark pyspark-schema

asked Mar 22 '23 at 18:08

D Das

31
1

2

votes

2 answers

PySpark read JSON with custom nested schema doesn't apply

I have this simple JSON file: {"adas":{"parkAssist":{"rear":{"alarm":false,"muted":false},"front":{"alarm":false,"muted":false}},"lane":{"keepAssist":{"right":false,"left":false}}}} But when I'm trying to read it like…

json apache-spark pyspark nested pyspark-schema

asked Jun 08 '22 at 10:55

Valéry

31
5

2

votes

0 answers

PySpark Lag function based on condition

I am new to PySpark and have been trying a few things. I have a DataFrame as follows: +----------+-----------+ | Column1| Column2| +----------+-----------+ | VALUE1| 30000| | VALUE2| 25000| | VALUE3| 20000| | VALUE4| …

pyspark apache-spark-sql pyspark-schema

asked May 19 '22 at 11:24

SamaAdi

41
1
6

2

votes

2 answers

Update a highly nested column from string to struct

scala apache-spark pyspark apache-spark-sql pyspark-schema

asked Apr 28 '22 at 01:35

Chirag Sejpal

877
2
9
17

2

votes

2 answers

Specifying column with multiple datatypes in Spark Schema

I am trying to create a schema to parse JSON into a Spark DataFrame. I have a column, value, in the JSON that could be either a struct or a string: "value": { "entity-type": "item", "id": "someid", "numeric-id": 30 } "value": "SomePicture.jpg", How…

apache-spark jsonschema pyspark-schema

asked Apr 19 '22 at 10:04

Neha Zaveri

21
5
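
The mixed-type "value" field in the excerpt above can be illustrated outside Spark. This is a minimal sketch in plain Python, not an answer taken from the thread: it branches on the parsed type so each record ends up with one struct-like part and one string-like part. The names `split_value`, `value_struct`, and `value_string` are hypothetical and chosen only for illustration.

```python
import json


def split_value(record):
    """Return the 'value' field as separate struct-like and string-like parts."""
    value = record.get("value")
    if isinstance(value, dict):
        # The field held an object, e.g. {"entity-type": "item", ...}
        return {"value_struct": value, "value_string": None}
    # The field held a plain string, e.g. "SomePicture.jpg"
    return {"value_struct": None, "value_string": value}


# The two shapes shown in the question excerpt
struct_row = split_value(
    json.loads('{"value": {"entity-type": "item", "id": "someid", "numeric-id": 30}}')
)
string_row = split_value(json.loads('{"value": "SomePicture.jpg"}'))
```

Splitting the field this way gives every output key a single, predictable type, which is one possible way to make such data fit a fixed schema.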

1

vote

2 answers

Selecting a column with backtick in its name - AnalysisException: cannot resolve Column

I have a data frame which has the below column: Last Login- Date & Time(Incl. Time Zone) When I read the data and print the schema, the column gets printed df.printSchema() But when I try selecting the column from the data frame it…

dataframe apache-spark pyspark select pyspark-schema

asked Aug 04 '23 at 12:53

Jim Macaulay

4,709
4
28
53

1

vote

1 answer

How to replace null values with some value using coalesce in PySpark

I have two files: orders_renamed.csv and customers.csv. I am joining them with a full outer join and then dropping the shared column (customer_id). I want to replace null values with "-1" in the "order_id" column. I have tried this: from pyspark.sql.functions import…

python pyspark apache-spark-sql bigdata pyspark-schema

asked Jul 31 '23 at 06:07

Vivek Mishra

23
3

1

vote

1 answer

How to define a schema for a semi-structured text file in PySpark

1 2013-07-25 11599,CLOSED 2 2013-07-25 256,PENDING_PAYMENT 3 2013-07-25 12111,COMPLETE 4 2013-07-25 8827,CLOSED 5 2013-07-25 11318,COMPLETE 6 2013-07-25 7130,COMPLETE 7 2013-07-25 4530,COMPLETE 8 2013-07-25 2911,PROCESSING 9…

python pyspark apache-spark-sql bigdata pyspark-schema

asked Jul 29 '23 at 19:36

Vivek Mishra

23
3
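
The sample lines in the excerpt above appear to follow a fixed layout: two space-separated fields followed by a comma-separated pair. This is a minimal sketch in plain Python of splitting such a line with one regular expression. It assumes that layout holds for every line. The field names (`order_id`, `order_date`, `customer_id`, `status`) and the helper `parse_order_line` are hypothetical, since the excerpt does not name the columns.

```python
import re

# Assumed layout: "<int> <YYYY-MM-DD> <int>,<STATUS>"
LINE_PATTERN = re.compile(r"^(\d+) (\d{4}-\d{2}-\d{2}) (\d+),(\w+)$")


def parse_order_line(line):
    """Split one line such as '1 2013-07-25 11599,CLOSED' into typed fields."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # the line does not follow the assumed layout
    order_id, order_date, customer_id, status = match.groups()
    return {
        "order_id": int(order_id),        # hypothetical column name
        "order_date": order_date,         # kept as a string here
        "customer_id": int(customer_id),  # hypothetical column name
        "status": status,
    }


# Two lines copied from the question excerpt
sample = ["1 2013-07-25 11599,CLOSED", "2 2013-07-25 256,PENDING_PAYMENT"]
rows = [parse_order_line(line) for line in sample]
```

In PySpark, the same capture groups could plausibly be applied with `pyspark.sql.functions.regexp_extract` to lines loaded with `spark.read.text`, though that step is not shown here.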

1

vote

1 answer

PySpark nested JSON with dynamic column names into one column

Could you help me? I need from this JSONL data: {"id": 1, "data": {"key:1": {"string_value": "value_1"}, "key:2": {"string_value": "value_2"}, "user_id": {"string_value": "value_4"}}} {"id": 2, "data": {"key:3": {"string_value": "value_3"},…

python pyspark apache-spark-sql pyspark-schema

asked Jun 07 '23 at 07:36

zigi

21
2

1

vote

1 answer

Getting nulls while selecting a dataframe from a JSON file in PySpark

I am using Spark 3.1 and trying to read a JSON file. I have defined the schema for the file below as: StructType([ StructField('search_metadata', MapType(StringType(),StringType())), StructField('search_parameters',…

json apache-spark pyspark apache-spark-sql pyspark-schema

asked Nov 03 '22 at 07:15

Xi12

939
2
14
27

1

vote

1 answer

Data Frames being read in with varying number of columns, how do I dynamically change data types of only columns that are Boolean to String data type?

In my notebook, I have DataFrames being read in that will have a variable number of columns every time the notebook is run. How do I dynamically change the data types of only the columns that are Boolean data types to String data type? This is a…

python pyspark pyspark-schema

asked Sep 09 '22 at 23:13

JTD2021

127
2
12

1

vote

0 answers

Schema mismatch detected when writing to a Delta table with a data stream write

I have .option("mergeSchema", "true") in my code, but I am still getting a schema mismatch error. I am reading the schema for Parquet; my timestamp was in bigint format, so I converted it to timestamp format and then created a new column, date, which I want to…

apache-spark pyspark apache-spark-sql spark-streaming pyspark-schema

asked Jul 24 '22 at 02:59

Manav Jain

21
2
