Sorry, I'm very new to Python.

Essentially I have a long list of file names, some in the format NAME_XX123456 and others in the format NAME_XX123456_123456.

I need to drop everything from the second underscore onward in each element. The code below only handles the first two elements, though, and when it reaches a name with a second underscore it doesn't delete the remainder, it just splits the string.

sample_list=['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

shortlist=[]
item  = "_"
count = 0
i=0
for i in range(0,len(sample_list)):
        if(item in sample_list[i]):
               count =  count + 1
               if(count == 2):
                     shortlist.append(sample_list[i].rpartition("_"))
                     i+=1
                     
               if (count == 1):
                   shortlist.append(sample_list[i])
                   i+=1
                   
               
        print(shortlist)
HoopStart
  • Does this answer your question? [Delete rest of string after n-th occurence](https://stackoverflow.com/questions/35109927/delete-rest-of-string-after-n-th-occurence) – Abhyuday Vaish May 13 '22 at 06:30

2 Answers

Here is a simple split-and-join approach. Split each input on the underscore, then join the first two pieces back together with an underscore as the separator.

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = ['_'.join(x.split('_')[0:2]) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
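
The same idea can be wrapped in a small reusable helper, sketched below. The function name `truncate_after_nth` and its `sep` and `n` parameters are made up for illustration and are not part of the answer above; passing `n` as the `maxsplit` argument of `str.split` simply avoids splitting the tail that gets thrown away.

```python
def truncate_after_nth(name, sep="_", n=2):
    """Keep the text before the n-th occurrence of sep (hypothetical helper)."""
    # Split at most n times, so the discarded tail stays in one piece,
    # then keep the first n pieces and glue them back together.
    return sep.join(name.split(sep, n)[:n])


sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
shortlist = [truncate_after_nth(x) for x in sample_list]
print(shortlist)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
```

Names that contain fewer than `n` separators come back unchanged, because the split then yields fewer than `n` pieces.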

You could also use regular expressions here:

import re

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = [re.sub(r'([^_]+_[^_]+)_.*', r'\1', x) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
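
To see why names without a second underscore pass through untouched, it may help to run the pattern on a few inputs. The sketch below precompiles the same regular expression; the variable names `pattern` and `results` and the extra test strings are assumptions added for illustration.

```python
import re

# Group 1: non-underscores, one underscore, non-underscores.
# After the group: a second underscore and everything that follows it.
pattern = re.compile(r'([^_]+_[^_]+)_.*')

names = ['NAME', 'NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011030_1234_5678']
# re.sub only rewrites text the pattern matches; if there is no
# second underscore, nothing matches and the string is returned as-is.
results = {name: pattern.sub(r'\1', name) for name in names}

for name, shortened in results.items():
    print(name, '->', shortened)
# NAME -> NAME
# NAME_XX011024 -> NAME_XX011024
# NAME_XX011030_1234 -> NAME_XX011030
# NAME_XX011030_1234_5678 -> NAME_XX011030
```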
Tim Biegeleisen

You can use the `split` method to split each item in the list on '_' and then join the first two parts of the result, which drops everything after the second underscore. Try this:

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

res = []
for item in sample_list:
    item_split = item.split('_')
    res.append('_'.join(item_split[0:2]))  # keep only the first two parts

print(res)  # ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
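
As for why the loop in the question stalls: `count` is a running total that is never reset, so from the third element on it is neither 1 nor 2 and nothing gets appended. In addition, `rpartition("_")` returns a `(head, separator, tail)` tuple rather than a string, which is why the second element shows up split instead of shortened. A minimal repair of that original approach, assuming no name has more than two underscores, might look like this:

```python
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

shortlist = []
for name in sample_list:
    # Count the underscores in this item only, not across the whole list.
    # Assumption: names contain at most two underscores.
    if name.count("_") == 2:
        # rpartition gives (head, sep, tail); keep just the head.
        shortlist.append(name.rpartition("_")[0])
    else:
        shortlist.append(name)

print(shortlist)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
```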
devrraj