-1

I have a list of websites, and I want to extract only the social media profiles (let's say Facebook, LinkedIn and Pinterest).

import numpy as np

mylist = ['linkedin.com/profilexyz','facebook.com/profile374','bbcnews.com/USA_news','stackoverflow.com']

I have used list comprehension to get the urls, returning nan if it's not found:

facebook = [x for x in mylist if 'facebook' in x else np.nan for x in mylist]
linkedin = [x for x in mylist if 'linkedin' in x else np.nan for x in mylist]
pinterest = [x for x in mylist if 'pinterest' in x else np.nan for x in mylist]

However I get the error:

File "<ipython-input-329-578130619ae7>", line 1
facebook = [x for x in mylist if 'facebook' in x else np.nan for x in mylist]
                                                        ^
SyntaxError: invalid syntax

I have checked the suggested duplicates, such as "if/else in a list comprehension?", but I can't get my comprehension to work.

SCool

4 Answers

1

You have the order mixed up and an extra for: the if/else expression goes before the for clause.

fb = [x if 'facebook' in x else np.nan for x in mylist]
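
To make the difference between the two positions of if concrete, here is a minimal sketch. It assumes math.nan as a stand-in for np.nan (np.nan is an ordinary Python float, so it should behave the same way here) so that the snippet needs only the standard library; the names padded and matches are illustrative.

```python
from math import nan  # stand-in for np.nan (an assumption, to avoid the NumPy dependency)

mylist = ['linkedin.com/profilexyz', 'facebook.com/profile374',
          'bbcnews.com/USA_news', 'stackoverflow.com']

# Conditional expression BEFORE the for clause: every item is mapped,
# so the result has the same length as mylist.
padded = [x if 'facebook' in x else nan for x in mylist]

# Filter clause AFTER the for clause: non-matching items are dropped.
# No else is allowed in this position.
matches = [x for x in mylist if 'facebook' in x]

print(padded)   # [nan, 'facebook.com/profile374', nan, nan]
print(matches)  # ['facebook.com/profile374']
```

The original attempt mixed both forms in one comprehension, which is why the parser stopped at else.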
Derek Eden
  • Thank you. Is there any way to just have the successful result in the list? Like `['facebook.com/profile374']` instead of `[nan, 'facebook.com/profile374', nan, nan]`. For pinterest nothing is found, so the result is `[nan, nan, nan, nan]`. How could I get a single `nan` if nothing is found? – SCool Aug 26 '19 at 11:25
  • @SCool `fb = [i for i in mylist if 'facebook' in i] or [np.nan]` (if you want a list with a single entry if it isn't in the list) – tomjn Aug 26 '19 at 11:36
  • see Rakesh's answer for this..basically just dropping the else – Derek Eden Aug 26 '19 at 12:22
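
The `or [np.nan]` fallback suggested in the comments can be wrapped in a small helper so it is not repeated per site. This is only a sketch: find_profiles is a hypothetical name, and math.nan is assumed as a stand-in for np.nan to keep the snippet standard-library only.

```python
from math import nan  # stand-in for np.nan (assumption)

def find_profiles(urls, site):
    """Return the URLs containing `site`, or [nan] if none match (hypothetical helper)."""
    matches = [u for u in urls if site in u]
    # An empty list is falsy, so `or` falls through to the placeholder list.
    return matches or [nan]

mylist = ['linkedin.com/profilexyz', 'facebook.com/profile374',
          'bbcnews.com/USA_news', 'stackoverflow.com']

facebook = find_profiles(mylist, 'facebook')
pinterest = find_profiles(mylist, 'pinterest')

print(facebook)   # ['facebook.com/profile374']
print(pinterest)  # [nan]
```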
0

This is one approach using collections.defaultdict and str.split

Ex:

from collections import defaultdict

result = defaultdict(list)
mylist = ['linkedin.com/profilexyz','facebook.com/profile374','bbcnews.com/USA_news','stackoverflow.com']

for url in mylist:
    result[url.split('/')[0]].append(url)
print(result)

Output:

defaultdict(<class 'list'>, {'linkedin.com': ['linkedin.com/profilexyz'], 'facebook.com': ['facebook.com/profile374'], 'bbcnews.com': ['bbcnews.com/USA_news'], 'stackoverflow.com': ['stackoverflow.com']})

FYI, with your method you can drop the else to keep only the matches:

facebook = [x for x in mylist if 'facebook' in x]
print(facebook)
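
Building on the grouping idea, the sites of interest could then be looked up by domain. This is a rough sketch that assumes the URLs have no scheme or www. prefix (as in the question's list); the names by_domain, wanted and profiles are illustrative.

```python
from collections import defaultdict

mylist = ['linkedin.com/profilexyz', 'facebook.com/profile374',
          'bbcnews.com/USA_news', 'stackoverflow.com']

# Group every URL under the part before the first slash.
by_domain = defaultdict(list)
for url in mylist:
    by_domain[url.split('/')[0]].append(url)

wanted = ['facebook.com', 'linkedin.com', 'pinterest.com']

# .get() is used instead of indexing so that missing domains do not
# create empty entries in the defaultdict.
profiles = {domain: by_domain.get(domain, []) for domain in wanted}

print(profiles)
# {'facebook.com': ['facebook.com/profile374'], 'linkedin.com': ['linkedin.com/profilexyz'], 'pinterest.com': []}
```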
Rakesh
  • your FYI only returns the results that contain the url name..the OP wants np.nan where there aren't matches, i.e. result list same size as original – Derek Eden Aug 26 '19 at 11:16
  • @DerekEden. Ok ..I do not see that condition in OP's request – Rakesh Aug 26 '19 at 11:19
  • I guess it didn't specifically stipulate the condition, but based on their attempt it looked like that's what they wanted – Derek Eden Aug 26 '19 at 12:21
0

Remove the extra for clause from the list comprehension.

Ex.

import numpy as np

mylist = ['linkedin.com/profilexyz','facebook.com/profile374','bbcnews.com/USA_news','stackoverflow.com']
facebook = [x if 'facebook' in x else np.nan for x in mylist]
linkedin = [x if 'linkedin' in x else np.nan for x in mylist]
pinterest = [x if 'pinterest' in x else np.nan for x in mylist]
print(facebook)
print(linkedin)
print(pinterest)

O/P:

[nan, 'facebook.com/profile374', nan, nan]
['linkedin.com/profilexyz', nan, nan, nan]
[nan, nan, nan, nan]

Remove the duplicate nan values from each list using set() (note that a set does not guarantee the order of the result):

print(list(set(facebook)))
print(list(set(linkedin)))
print(list(set(pinterest)))

O/P:

[nan, 'facebook.com/profile374']
[nan, 'linkedin.com/profilexyz']
[nan]
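
If the original order matters, an order-preserving alternative to set() is dict.fromkeys. The sketch below assumes math.nan as a stand-in for np.nan, and it relies on every placeholder being the same nan object; separately created NaN values (for example repeated float('nan') calls) would not be merged.

```python
from math import nan  # stand-in for np.nan (assumption)

mylist = ['linkedin.com/profilexyz', 'facebook.com/profile374',
          'bbcnews.com/USA_news', 'stackoverflow.com']

facebook = [x if 'facebook' in x else nan for x in mylist]
linkedin = [x if 'linkedin' in x else nan for x in mylist]
pinterest = [x if 'pinterest' in x else nan for x in mylist]

# dict keys are unique and keep insertion order, so this removes
# repeats while keeping the first occurrence of each value in place.
fb_unique = list(dict.fromkeys(facebook))
li_unique = list(dict.fromkeys(linkedin))
pin_unique = list(dict.fromkeys(pinterest))

print(fb_unique)   # [nan, 'facebook.com/profile374']
print(li_unique)   # ['linkedin.com/profilexyz', nan]
print(pin_unique)  # [nan]
```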
bharatk
  • ok, thanks. Is there a way to just keep the values we find, rather than repeating `nan, nan, nan` ? I tried already to just have a single `nan` if not found, but it doesn't work: `[x if 'facebook' in x else np.nan]` – SCool Aug 26 '19 at 11:14
  • @SCool Yes you can, remove else condition and re-write list comprehension like this eg. `[x for x in mylist if 'facebook' in x ]` . – bharatk Aug 26 '19 at 11:17
  • @SCool So you want to keep a single `nan` value in the list rather than repeated `nan` values. A list comprehension on its own cannot check whether the result contains any match. In that case, try a normal `for` loop and handle the empty case there. – bharatk Aug 26 '19 at 11:26
  • @SCool Updated answer, how to remove a duplicate element from the list. – bharatk Aug 26 '19 at 11:38
0

Just put parentheses around the conditional expression and drop the extra for (the parentheses are optional, but they make the grouping clear):

facebook = [(x if 'facebook' in x else np.nan) for x in mylist]
linkedin = [(x if 'linkedin' in x else np.nan) for x in mylist]
pinterest = [(x if 'pinterest' in x else np.nan) for x in mylist]
Kostas Charitidis