
I have a list json_data:

> print(json_data)
> ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']

I need to split the elements that contain '/', '&' or 'and' into two separate elements. The result I am looking for should look like this:

> ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']

The code is:

separators = ['/', 'and', '&']

titles = []
for i in json_data:
    titles.extend([t.strip() for t in i.split(separators)
                  if i.strip() != ''])

When running it, I am getting an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-d0db85078f05> in <module>()
      5 titles = []
      6 for i in json_data:
----> 7     titles.extend([t.strip() for t in i.split(separators)
      8                   if i.strip() != ''])

TypeError: Can't convert 'list' object to str implicitly

How can this be fixed?

Feyzi Bagirov

4 Answers


I believe what you are looking for in your list comprehension is

[t.strip() for separator in separators for t in i.split(separator) if i.strip() != '']

Python's str.split does not accept a list of delimiters; it only splits on a single separator string.
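To see why this comprehension makes the list grow (as discussed in the comments below), here is a small sketch that applies it to single items. Each item is split once per separator and every result is kept, so an item with no separator comes back three times. The helper name `comprehension_split` is hypothetical, used only for illustration.

```python
separators = ['/', 'and', '&']

def comprehension_split(i):
    # Same comprehension as in the answer: one split per separator, all results kept
    return [t.strip() for separator in separators for t in i.split(separator) if i.strip() != '']

print(comprehension_split('abc'))      # ['abc', 'abc', 'abc']
print(comprehension_split('bcd/chg'))  # ['bcd', 'chg', 'bcd/chg', 'bcd/chg']
```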

Uriel
  • That worked; however, most of the elements were copied twice, so my list is three times as big (174116 vs 51212 elements). Why did that happen? – Feyzi Bagirov Jun 09 '17 at 04:24
  • @FeyziBagirov convert this list to set, then back to list. This will remove duplicates – Uriel Jun 09 '17 at 04:26
  • `i` does not change in the course of the list comp, so `i.strip()` should be extracted from the comp – Jon Kiparsky Jun 09 '17 at 04:27
  • @Uriel better to not create duplicates than to dedupe. – Jon Kiparsky Jun 09 '17 at 04:28
  • @FeyziBagirov review https://stackoverflow.com/questions/1059559/split-strings-with-multiple-delimiters – Uriel Jun 09 '17 at 04:31
  • @FeyziBagirov The problem is that this loops over `separators` and splits once for each of those, and appends each of the resulting lists to the resulting list, thus the list is approximately tripled in length. To fix it, split on the regex, see my answer. – Jon Kiparsky Jun 09 '17 at 04:31
  • @Uriel Python does have a robust regex library which does in fact do separation by a list of delimiters. – Jon Kiparsky Jun 09 '17 at 04:32
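If you do go the deduplication route suggested in the comments, note that `set()` does not preserve the original order. A minimal order-preserving alternative uses `dict.fromkeys`, which keeps insertion order in Python 3.7+. Two caveats, though: deduplicating also collapses titles that legitimately occur more than once in the input, and it does not remove the unsplit originals such as 'bcd/chg', which is another reason to avoid creating the duplicates in the first place.

```python
# Output of the comprehension for 'abc' and 'bcd/chg'
titles = ['abc', 'abc', 'abc', 'bcd', 'chg', 'bcd/chg', 'bcd/chg']

# dict keys are unique and (since Python 3.7) keep first-seen order
deduped = list(dict.fromkeys(titles))
print(deduped)  # ['abc', 'bcd', 'chg', 'bcd/chg']
```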

The problem occurs in i.split(separators): str.split expects a single string to split i by, but gets a list of strings. You could add another for loop that iterates over your separators and splits i by each one.
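A minimal sketch of that idea follows. Passing a list to `str.split` raises `TypeError`, so instead we loop over the separators and re-split every piece produced so far. The helper name `split_on_all` is hypothetical. Note that a plain 'and' separator will also split inside words such as 'sandwich'.

```python
json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
separators = ['/', 'and', '&']

# Passing a list as the separator is what triggers the TypeError
try:
    'bcd/chg'.split(separators)
except TypeError as exc:
    print('TypeError:', exc)

def split_on_all(text, seps):
    """Split text on each separator in turn and strip whitespace from the parts."""
    pieces = [text]
    for sep in seps:
        # Re-split every piece produced so far on the current separator
        pieces = [part for piece in pieces for part in piece.split(sep)]
    return [p.strip() for p in pieces if p.strip()]

titles = []
for i in json_data:
    titles.extend(split_on_all(i, separators))
print(titles)  # ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
```

Unlike the nested comprehension, this produces each piece only once, so the list does not grow with the number of separators.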

Edit: You're better off viewing @Uriel's answer, it is the more Pythonic way!

Priyank

Regex is your friend:

>>> import re
>>> pat = re.compile("[/&]|and")
>>> json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
>>> titles = []
>>> for i in json_data:
...   titles.extend([x.strip() for x in pat.split(i)])
... 
>>> titles
['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']

This line noise: re.compile("[/&]|and") means "create a regular expression matching either [/&] or the word 'and'". [/&] of course matches either / or &. Having that in hand, pat.split(i) just splits the string i on anything matching pat.
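To make that concrete, here is what the compiled pattern returns for a few individual strings. The whitespace left around '&' is why the strip() call is needed, and the last example shows that the bare alternative 'and' also matches inside a word.

```python
import re

pat = re.compile("[/&]|and")

print(pat.split('bcd/chg'))    # ['bcd', 'chg']
print(pat.split('sbc & osc'))  # ['sbc ', ' osc']
print(pat.split('sandwich'))   # ['s', 'wich']
```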

LATE EDIT: Realized that of course we can skip the strip() step by complicating the regex a little. If we use the regex r"\s*[/&]\s*|\s+and\s+", we also match any whitespace before or after the basic separators. This means that splitting on this pattern removes the excess whitespace. Requiring whitespace around "and" also prevents us from splitting in the middle of a word like "sandwich", should that happen to appear in our data, while "/" and "&" still split with or without surrounding spaces (as in "bcd/chg"). Note the raw string, which keeps Python from interpreting the backslashes itself:

>>> pat = re.compile(r"\s*[/&]\s*|\s+and\s+")
>>> pat.split("beans and rice and sandwiches")
['beans', 'rice', 'sandwiches']
>>> 

This simplifies the construction of the list, since we no longer need to strip the whitespace from the results of the split, which incidentally saves us some looping. Given the new pattern, we can write it this way:

>>> titles = []
>>> for i in json_data:
...   titles.extend(pat.split(i))
... 
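Put together as a self-contained script, the whitespace-aware version might look like the sketch below. The function name `split_titles` is hypothetical. The pattern is a raw string with `\s*` around '/' and '&' (so 'bcd/chg' still splits without surrounding spaces) and `\s+` around 'and' (so words like 'sandwich' are left alone).

```python
import re

# '/' or '&' with optional surrounding whitespace, or 'and' surrounded by whitespace
SEPARATOR = re.compile(r"\s*[/&]\s*|\s+and\s+")

def split_titles(items):
    """Split every item on the separators and flatten the results into one list."""
    # 'if part' drops empty strings, e.g. from an item that is just '/'
    return [part for item in items for part in SEPARATOR.split(item.strip()) if part]

json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
print(split_titles(json_data))
# ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
print(split_titles(['beans and rice', 'sandwiches']))
# ['beans', 'rice', 'sandwiches']
```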
Jon Kiparsky
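json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
separators = ['/', '&', 'and']
title = []

for i in json_data:
    k = 0
    while k < len(separators):
        if separators[k] in i:
            # split on the first separator found and strip the surrounding spaces
            t = i.split(separators[k])
            title.extend(part.strip() for part in t)
            break
        else:
            k += 1
        if k == len(separators):
            # no separator matched: keep the element unchanged
            title.append(i)
print(title)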
nitin_cherian
  • @kuro thanks for pointing that out. Posted from the phone, indentation got mixed up. All fixed now. – Vape Waves Jun 09 '17 at 06:23
  • @nitin_cherian thank you for all the additional info. I will surely look into it. Only 5 months into Python, clearly, I still have a lot to learn. – Vape Waves Jun 09 '17 at 06:58
  • @VapeWaves: Correct. That is the reason I gave you the links, in order to help you learn, and did not post my code here. – nitin_cherian Jun 09 '17 at 07:01
  • @nitin_cherian error 404 on both the links that you have provided. – Vape Waves Jun 09 '17 at 07:02
  • You did a fine job by using the for-in clause on the json_data. I would recommend using the same clause on the loop of separators instead of the while loop. Also, you may want to learn the for-else clause in Python to be applied on the second loop[book.pythontips.com/en/latest/for_-_else.html]. Finally you may want to use list comprehension to strip any of the white spaces in the elements of the final list. The code snippet is here [http://gist.github.com/nitin-cherian/a62f8c79f7e67bf0c24d0a4eba757d15 ], for your reference. – nitin_cherian Jun 09 '17 at 07:06
  • @VapeWaves: I have reposted the comment. Just copy paste the links. Do not paste the leading and trailing square brackets. – nitin_cherian Jun 09 '17 at 07:08
  • @nitin_cherian thank you for introducing me to the for-else clause, should save me a lot of unnecessary lines of code. – Vape Waves Jun 09 '17 at 07:28
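The for-else restructuring suggested in the comments could look like the following sketch. It is an independent version, not the linked gist. The `else` branch of a `for` loop runs only when the loop finishes without hitting `break`, which replaces the manual counter check in the while loop.

```python
json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
separators = ['/', '&', 'and']
title = []

for i in json_data:
    for sep in separators:
        if sep in i:
            # Split on the first separator found and strip surrounding spaces
            title.extend(part.strip() for part in i.split(sep))
            break
    else:
        # No break happened: the item contains none of the separators
        title.append(i)

print(title)  # ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
```

Like the while-loop version, this splits each item on only the first separator it contains, and the substring test means 'and' inside a word such as 'sandwich' still counts as a match.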