
Is there a faster, more efficient way to split rows in a list? My current setup isn't slow, but it takes longer than I'd expect to split the whole list, probably because of how many iterations it takes to go through it.

I currently have the code below

import pandas as pd

found_reader = pd.read_csv(file, delimiter='\n', engine='c')
loaded_list = found_reader
loaded_email_list = []
# collect the part of each row before the first ':'
for i in range(len(loaded_list)):
    loaded_email_list = loaded_email_list + [loaded_list[i].split(':')[0]]

I would just like a method that does the above in the quickest, most efficient way.

RajB_007

1 Answer


Here's how you would do that efficiently if both loaded_list and loaded_email_list were regular Python lists (it may need slight adaptation for whatever structure Pandas actually gives you):

loaded_email_list += [x.partition(':')[0] for x in loaded_list]
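Since loaded_list actually comes from pd.read_csv, a vectorized version along these lines may also work. This is only a sketch: it assumes the file parsed into a single-column DataFrame, so the iloc[:, 0] column selection is an assumption about your data rather than something taken from the question.

# Sketch: assumes found_reader is a single-column DataFrame.
# Split each value on the first ':' only, keep the part before it,
# and convert the resulting Series back to a plain Python list.
loaded_email_list = found_reader.iloc[:, 0].str.split(':', n=1).str[0].tolist()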

Why the list comprehension is better than your original loop:

  • It iterates over the list directly, instead of using range, len, and an index variable
  • It uses partition, which stops looking after the first :, instead of split, which walks the whole string
  • It uses a list comprehension to create the new list all at once, rather than creating and concatenating a bunch of single-element lists
  • It uses x += y instead of x = x + y, which could theoretically be faster if its __iadd__ is more efficient than assigning its __add__ result back to itself. (A rough timing comparison of the two full approaches follows this list.)
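If you want to check the difference yourself, here is a minimal timing sketch on synthetic data (the sample rows and the size are made up for illustration; swap in your real loaded_list):

import timeit

# Made-up sample rows standing in for loaded_list.
loaded_list = ['user%d@example.com:other-field-%d' % (i, i) for i in range(20000)]

def loop_and_split():
    # Original approach: index-based loop plus repeated list concatenation.
    result = []
    for i in range(len(loaded_list)):
        result = result + [loaded_list[i].split(':')[0]]
    return result

def comprehension_and_partition():
    # Suggested approach: one list comprehension using partition.
    return [x.partition(':')[0] for x in loaded_list]

# number=1 is enough to show the gap; the concatenation loop is quadratic.
print('loop + split:             ', timeit.timeit(loop_and_split, number=1))
print('comprehension + partition:', timeit.timeit(comprehension_and_partition, number=1))

On a quick run you should see the comprehension version finish in a small fraction of the loop's time, mostly because it avoids rebuilding the list on every iteration.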
Much appreciated. Funny enough I sort of solved it with this: loaded_email_list = [row.split(':', 1)[0] for row in loaded_list]. But I read what you said about using partition over split and it makes sense. – RajB_007 Jun 08 '19 at 23:57