
I have a table in Spark with ID and numOfReq attributes. The ID values range from 1 to 100, they are not in order, and each ID can be repeated many times in the table. I want to extract the rows with IDs 1, 47, 54, and 89. I can do it with a for loop, like this pseudocode:

idList = [1, 47, 54, 89]
temp = [None] * len(idList)
for i, id in enumerate(idList):
    # one filtered DataFrame per requested ID
    temp[i] = table.filter(table['ID'] == id)

But it takes a long time to run. Is there a filter or library that does this faster? What should I use in my code? I need something in PySpark.

MHB
  • Do you want 4 different tables for IDs 1, 47, 54 and 89 respectively? Secondly, you use `id` in the `for` loop and then use `temp[i]`, but where does `i` come from? You mention that it took a lot of time, so did you try it in PySpark? – cph_sto Mar 06 '19 at 13:45
  • i is an iteration counter and that's pseudocode. Yes, I need exactly those four tables, and in PySpark it took a long time to finish. – MHB Mar 06 '19 at 14:42
  • i is not the problem; the problem is selecting those 4 tables. – MHB Mar 06 '19 at 14:44
  • Your pseudo-code looks fine though. – cph_sto Mar 06 '19 at 14:45
  • Check this; maybe it solves your problem. The logic used is quite similar, but with a dictionary instead: https://stackoverflow.com/questions/54743574/creating-multiple-pyspark-dataframes-from-a-single-dataframe/ – cph_sto Mar 06 '19 at 14:47
  • Are you looking for `table.where(table["ID"].isin(idList))`? – pault Mar 06 '19 at 15:18
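
A minimal sketch of pault's `isin` suggestion, assuming `table` is the question's DataFrame and a Spark session already exists; the four per-ID DataFrames are split out afterwards from the already reduced result:

idList = [1, 47, 54, 89]

# one filter over the whole table instead of one scan per ID;
# Column.isin builds a single IN predicate Spark can push down
filtered = table.where(table['ID'].isin(idList))

# if four separate tables are still needed, re-filter the reduced DataFrame
temp = [filtered.where(filtered['ID'] == i) for i in idList]

Because DataFrame transformations are lazy, none of these filters scan data until an action (e.g. `count()` or `show()`) is called, and the per-ID splits only re-filter the rows that already passed the `isin` predicate.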

0 Answers