I have the following list

In [13]:  nested_list=[0,25,[0,2,3,4],[1,1,-1,-1]]

and I'd like to flatten it as follows:

[0,25,0,2,3,4,1,1,-1,-1]

using the following list comprehension

[y for y in x if isinstance(x,list) else x for x in nested_list]

But I'm getting this error

 In [16]: [y for y in x if isinstance(x,list) else x for x in nested_list]
 File "<ipython-input-16-e49b6b9924a1>", line 1
[y for y in x if isinstance(x,list) else x for x in nested_list]
                                       ^
 SyntaxError: invalid syntax

I know there are several solutions that use recursion instead of a list comprehension. However, I'd like to use a list comprehension. Can someone advise on the correct syntax?

femibyte

3 Answers

One way using a list comprehension:

[y for z in [x if isinstance(x, list) else [x] for x in nested_list] for y in z]
#[0, 25, 0, 2, 3, 4, 1, 1, -1, -1]

Update

Even simpler:

[y for x in nested_list for y in (x if isinstance(x,list) else [x])]
#[0, 25, 0, 2, 3, 4, 1, 1, -1, -1]
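As for why the attempt in the question is a syntax error, here is a hedged sketch of the reasoning: in a comprehension, a trailing `if` is a filter and cannot take an `else`. An `x if cond else y` conditional expression has to sit where a value is expected, which here is the iterable of the inner `for`. The function name `flatten_one_level` is made up for illustration.

```python
# Hypothetical helper name; it only wraps the comprehension shown above.
def flatten_one_level(nested_list):
    # The conditional expression supplies the iterable for the inner loop:
    # lists are iterated as-is, anything else is wrapped in a one-element list.
    return [y for x in nested_list for y in (x if isinstance(x, list) else [x])]


nested_list = [0, 25, [0, 2, 3, 4], [1, 1, -1, -1]]
print(flatten_one_level(nested_list))
# [0, 25, 0, 2, 3, 4, 1, 1, -1, -1]
```

Note that this flattens a single level only; a list nested two levels deep stays nested in the result.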
pault
  • You could turn the inner list comprehension into a generator expression, which may improve performance depending on how big your `nested_list` is – pault Sep 05 '19 at 15:41
  • It's small, and I'd like to use it within a Pyspark udf. – femibyte Sep 05 '19 at 15:42
  • If it's for `pyspark`, version 2.4 has a [`flatten`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.flatten) function which will be [better than using a `udf`](https://stackoverflow.com/questions/38296609/spark-functions-vs-udf-performance). In any case, you don't need a list comprehension for a `udf`; you can use other methods (like a simple loop, which is easier to read). – pault Sep 05 '19 at 15:44
  • It's for a vector type field, and using flatten is giving me this error `org.apache.spark.sql.AnalysisException: cannot resolve ;flatten(features) due to data type mismatch:` – femibyte Sep 05 '19 at 15:55
  • Oh if it's `VectorUDT` you will have to use `udf`. – pault Sep 05 '19 at 15:57
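
The generator suggestion from the first comment might look like the sketch below, assuming it refers to the inner comprehension of the first version in the answer. The square brackets of the inner comprehension become parentheses, so the wrapped sublists are produced one at a time instead of being collected into an intermediate list. Whether that is measurably faster for a small `nested_list` was not tested here, and `flatten_lazy` is a made-up name.

```python
# Hypothetical helper name, for illustration only.
def flatten_lazy(nested_list):
    # Generator expression: wraps bare items lazily, one element at a time.
    wrapped = (x if isinstance(x, list) else [x] for x in nested_list)
    # The outer comprehension still builds the final flat list.
    return [y for z in wrapped for y in z]


print(flatten_lazy([0, 25, [0, 2, 3, 4], [1, 1, -1, -1]]))
# [0, 25, 0, 2, 3, 4, 1, 1, -1, -1]
```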

For a list x made up only of ints and lists, it can be done with a loop:

x = [0, 25, [0, 2, 3, 4], [1, 1, -1, -1]]
res = []
for i in x:
    if type(i) == int:
        res.append(i)  # bare int: add it directly
    else:
        res += i  # sublist: extend res with its elements
print(res)

Output

[0, 25, 0, 2, 3, 4, 1, 1, -1, -1]

The same logic in one line of code:

x = [0,25,[0,2,3,4],[1,1,-1,-1]]
sum([[i] if type(i) == int else i for i in x],[])
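
A sketch of why the `sum` one-liner works: with `[]` as the start value, `sum` combines the sublists with `+`, which for lists means concatenation. Each `+` builds a new list, so this is probably best kept to short inputs. The intermediate names `wrapped` and `flat` are illustrative only.

```python
x = [0, 25, [0, 2, 3, 4], [1, 1, -1, -1]]

# Step 1: wrap bare ints so that every element is a list.
wrapped = [[i] if type(i) == int else i for i in x]
print(wrapped)
# [[0], [25], [0, 2, 3, 4], [1, 1, -1, -1]]

# Step 2: sum() with [] as the start value computes [] + [0] + [25] + ...
flat = sum(wrapped, [])
print(flat)
# [0, 25, 0, 2, 3, 4, 1, 1, -1, -1]
```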
ComplicatedPhenomenon

Using only a list comprehension, a variant with type() instead of isinstance:

nested_list=[0,25,[0,2,3,4],[1,1,-1,-1]]

[i for sublist in [[x] if type(x) == int else x for x in nested_list] for i in sublist]
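
The two checks behave differently on elements that are neither `int` nor `list`. A small sketch with a made-up input containing a float (`mixed` is not from the question): `isinstance(x, list)` wraps anything that is not a list, while `type(x) == int` leaves the float unwrapped, so iterating over it fails.

```python
mixed = [0, 2.5, [1, 2]]  # hypothetical input, not from the question

# isinstance(x, list): every non-list element is wrapped before iterating.
by_isinstance = [y for x in mixed for y in (x if isinstance(x, list) else [x])]
print(by_isinstance)
# [0, 2.5, 1, 2]

# type(x) == int: the float is passed through unwrapped and cannot be iterated.
try:
    by_type = [i for sub in [[x] if type(x) == int else x for x in mixed] for i in sub]
except TypeError as exc:
    by_type = None
    print("TypeError:", exc)
```

For the list in the question, which holds only ints and lists, both variants give the same result.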
Nikaido