I want to assign values to a DataFrame column from a list based on a condition, but my code only works with hard-coded replacement values, not with dynamic ones taken from a list.

I also can't convert the list directly to a DataFrame column, because the list is much shorter than the column.

no_connections = network_data.map(lambda row: (row[1], 1)).reduceByKey(lambda a,b: a+b).collect()

network_data1 = network_data1\
                .withColumn("NoUserConnections", when(network_data1.NoUserConnections == 0, no_connections[0])
                .otherwise(network_data1.NoUserConnections))

I can also get the values of no_connections from a DataFrame, like so:

network_data1.groupby('User').count().show()

My Dataframe looks like this:

+---+----+-----------+-----------------+
|_c0|User|Connections|NoUserConnections|
+---+----+-----------+-----------------+
|  0|   0|          1|                0|
|  1|   0|          2|                0|
|  2|   0|          3|                0|
|  3|   0|          4|                0|
|  4|   0|          5|                0|
|  5|   0|          6|                0|
|  6|   1|          7|                1|
|  7|   1|          8|                1|
|  8|   1|          9|                1|
|  9|   1|         10|                1|
+---+----+-----------+-----------------+

I want to set NoUserConnections on every row to the number of rows that share that row's User value, like this:

+---+----+-----------+-----------------+
|_c0|User|Connections|NoUserConnections|
+---+----+-----------+-----------------+
|  0|   0|          1|                6|
|  1|   0|          2|                6|
|  2|   0|          3|                6|
|  3|   0|          4|                6|
|  4|   0|          5|                6|
|  5|   0|          6|                6|
|  6|   1|          7|                4|
|  7|   1|          8|                4|
|  8|   1|          9|                4|
|  9|   1|         10|                4|
+---+----+-----------+-----------------+
Bilal
  • Include an example to help us understand what you are trying to do. Follow [Minimally Reproducible Example](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples/48427186#48427186) to better structure your question. – Nithish Dec 14 '21 at 19:30
  • Are you trying to find for each user the number of times they appear in the dataframe? – Nithish Dec 14 '21 at 20:54

1 Answer

Assuming you are trying to compute the number of occurrences of each user in the DataFrame and assign that count to every row for that user, you can use a window function in PySpark: partition by User and apply a count aggregate over each partition.

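from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import Window as W

# Reuse the active session (e.g. in a notebook or shell) or create one.
spark = SparkSession.builder.getOrCreate()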

data = [(0, 0, 1, 0, ),
        (1, 0, 2, 0, ),
        (2, 0, 3, 0, ),
        (3, 0, 4, 0, ),
        (4, 0, 5, 0, ),
        (5, 0, 6, 0, ),
        (6, 1, 7, 1, ),
        (7, 1, 8, 1, ),
        (8, 1, 9, 1, ),
        (9, 1, 10, 1, ),]

df = spark.createDataFrame(data, ("Id", "User", "Connections", "NoUserConnections", ))

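# One window per User, spanning every row of that user's partition.
window_spec = W.partitionBy("User").rowsBetween(W.unboundedPreceding, W.unboundedFollowing)

# Overwrite NoUserConnections with the per-user row count.
df.withColumn("NoUserConnections", F.count("Connections").over(window_spec)).show()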

Output

+---+----+-----------+-----------------+
| Id|User|Connections|NoUserConnections|
+---+----+-----------+-----------------+
|  0|   0|          1|                6|
|  1|   0|          2|                6|
|  2|   0|          3|                6|
|  3|   0|          4|                6|
|  4|   0|          5|                6|
|  5|   0|          6|                6|
|  6|   1|          7|                4|
|  7|   1|          8|                4|
|  8|   1|          9|                4|
|  9|   1|         10|                4|
+---+----+-----------+-----------------+
Nithish