
I have looked through numerous questions that seemed to hit the nail on the head, but they only confused me further and didn't help at all. So I hope no one closes this question and refers me to other questions; I have spent hours trying to figure this out. I cannot post the actual text for security reasons, so I will make up similar-looking lists. There are thousands of strings in these lists, but I'll give an example of 3, deliberately including strings that I want to match up.

list= ['93900 2016-01-11.50 10.17', '93030 2014-04-16.50 18.83', '29322 2009-05-21.50 17.81']

list1= ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2016-12-14.00 15.66']

  1. The strings in list have different spacing between their characters.
  2. For example, I need to take '93900 2016-01-11.50 10.17' from "list", compare it to the strings in list1, and find the ones with the same ID 93900 and a date within a ±1 month buffer of 2016-01-11.50. Ideally it would return '93900 2016-02-11.00 11.15' and '93900 2015-12-14.00 15.66' from list1. I only know how to check whether two strings are exactly the same, and that comparison clearly returns an empty list here because none of them match exactly. I need smarter code that looks inside each string and finds values near the target. After the comparison, I also need to put the full string, not just the matching part, into a new list.

I hope this makes sense and that someone can help.

All I have is a nested loop that does not work, because I cannot figure out how to compare partial strings:

new_list = []
for line in list:
    for line1 in list1:
        if line[0:5] in line1[0:5]:
            new_list.append(line)

Yes, this clearly does not work, but it's a way to check one string against each element in the list, just not against specific parts of the string.

  • You haven't posted the code that you need help with. – Scott Hunter Jun 06 '20 at 22:04
  • No one has to do your job for you. You've got to show your attempts. – Valentyn Anzhurov Jun 06 '20 at 22:04
  • `2016-01` minus 1 month is `2015-12`, while in your example you picked `2016-12`. Is that correct or a typo? – Hoxha Alban Jun 06 '20 at 22:09
  • @ValentynAnzhurov I didn't ask anyone to complete my code; I asked for help with the logic, essentially. But I will add whatever I can. – deadoralive Jun 06 '20 at 22:17
  • Your example shows you can get 2 matching entries in your case. Anyway, if it's just checking for strings, then try this: `if any((_ for _ in list1 if _.startswith('93900 2016-'))):` – Sam Daniel Jun 06 '20 at 22:18
  • @HoxhaAlban Yes, that's a typo, thank you. – deadoralive Jun 06 '20 at 22:19
  • @SamDaniel It is checking for the string, but if I just check for the year like you suggested, it may include 10 other months that I don't want. Also, there are thousands of names and I don't know all of them, so I can't check them individually; I need generic code. Does that make sense? Thanks for your suggestions so far. – deadoralive Jun 06 '20 at 22:27
  • Is the buffer always 1 month? – Hoxha Alban Jun 06 '20 at 22:35
  • @HoxhaAlban Yes, just a month before the date and a month after. The day doesn't have to match, as long as the month is 1 before or after. – deadoralive Jun 06 '20 at 22:38
  • Have you considered not comparing strings but splitting the strings, resulting in a list of tuples and then working on those items instead with the correct datatypes? Your example data looks quite well-formed and should be parsable. – Joma Jun 06 '20 at 22:40
  • @joma Yes, I thought about that and actually started doing it, but then I realized I need to keep track of the original string. I need to make a list of all of them intact, so I am not sure how it could work. Thanks for the suggestion. – deadoralive Jun 06 '20 at 23:17
  • @deadoralive you could also add the string as a tuple element: `('93000 2020-06-07', 93000, datetime.date(2020, 6, 7))` – Joma Jun 06 '20 at 23:30
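Joma's suggestion of parsing each string into a tuple that carries the original string along can be sketched roughly as follows. This is a minimal sketch assuming every string is "ID date-with-suffix value" separated by whitespace; the helper name `parse`, the renaming of `list` to `list0` (to avoid shadowing the built-in), and the corrected `2015-12` entry from the comments are assumptions for illustration. Dates are turned into a month count (`year * 12 + month`) so that December and January are exactly 1 apart.

```python
from datetime import datetime

def parse(s):
    # Hypothetical helper: split on any run of whitespace, so uneven spacing is tolerated.
    ident, stamp = s.split()[:2]
    # '2016-01-11.50' -> keep only the date part before the '.'
    d = datetime.strptime(stamp.split('.')[0], '%Y-%m-%d')
    # Keep the original string alongside the parsed ID and month count.
    return (s, ident, d.year * 12 + d.month)

list0 = ['93900 2016-01-11.50 10.17', '93030 2014-04-16.50 18.83', '29322 2009-05-21.50 17.81']
list1 = ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']

parsed0 = [parse(s) for s in list0]
parsed1 = [parse(s) for s in list1]

new_list = []
for _, ident, month in parsed0:
    for original, ident1, month1 in parsed1:
        # Same ID and exactly one calendar month before or after.
        if ident1 == ident and abs(month1 - month) == 1:
            new_list.append(original)

print(new_list)
```

Whether the same month should also count is left open in the thread; changing `== 1` to `<= 1` would include it.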

2 Answers


If the buffer is always 1 month and the data format is always the same, this code should work for you:

def comp(s, l): # string to search, list
    head, month = s.split('-')[0:2] # e.g. with s = '93900 2016-01-11.50 10.17': head = '93900 2016' and month = '01'
    head, year = head.split() # head = '93900', year = '2016'; split() with no argument also tolerates repeated spaces in s
    year = int(year)
    month = int(month)

    # managing edge cases where month is january or december
    if month == 1: 
        y1 = year - 1 
        m1 = 12
    else:
        y1 = year
        m1 = month - 1

    if month == 12:
        y2 = year + 1
        m2 = 1
    else:
        y2 = year
        m2 = month + 1

    # building strings to search for
    s1 = head + ' ' + str(y1) + '-' + str(m1).zfill(2)
    s2 = head + ' ' + str(y2) + '-' + str(m2).zfill(2)

    out = []
    for item in l:
        if s1 in item or s2 in item:
            out.append(item)

    return out

test_s = '93900 2016-01-11.50 10.17'
test_l = ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']

print(comp(test_s, test_l)) # ['93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']
Hoxha Alban
  • I am going to try this, thank you. This is actually what I was picturing in my mind: some if/else statements with the ±1 buffer. I'll get back to you if it works. – deadoralive Jun 06 '20 at 23:19
  • I got the error: AttributeError: 'list' object has no attribute 'split' when I tried to run this. What do you think is happening? – deadoralive Jun 07 '20 at 00:54
  • Because the function I've created accepts a single string, you need to loop over your first list like this: `for elem in l1: print(comp(elem, l2))`, where `l1` is the first list and `l2` the second. – Hoxha Alban Jun 07 '20 at 10:09
  • By the way, if you are not following any Python tutorial, I advise you to study the official docs; they are really helpful: https://docs.python.org/3/tutorial/index.html – Hoxha Alban Jun 07 '20 at 10:12
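With thousands of strings, calling `comp` once per element of the first list rescans the whole second list every time. A rough alternative, assuming the same "ID date-with-suffix value" format, is to index the second list once by (ID, month count) and then look up only the month before and the month after. The names `key` and `match_all` are hypothetical, and this is a sketch rather than a drop-in replacement for the answer above.

```python
from collections import defaultdict
from datetime import datetime

def key(s):
    # (ID, month count) for a string like '93900 2016-01-11.50 10.17'.
    ident, stamp = s.split()[:2]
    d = datetime.strptime(stamp.split('.')[0], '%Y-%m-%d')
    return ident, d.year * 12 + d.month

def match_all(first, second):
    # Index the second list once: (ID, month count) -> original strings.
    index = defaultdict(list)
    for s in second:
        index[key(s)].append(s)
    new_list = []
    for s in first:
        ident, month = key(s)
        # Look up the month before and the month after; no rescanning of `second`.
        for m in (month - 1, month + 1):
            new_list.extend(index.get((ident, m), []))
    return new_list

l1 = ['93900 2016-01-11.50 10.17', '93030 2014-04-16.50 18.83', '29322 2009-05-21.50 17.81']
l2 = ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']
print(match_all(l1, l2))
```

Using `index.get(...)` instead of `index[...]` avoids inserting empty entries into the `defaultdict` on a miss. The full original strings are what end up in `new_list`.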

You need to extract the date part and convert it to a date type; then you can do date comparisons.

As pointed out in the comments, though, `timedelta` can't compare months, because a month is not a uniform length. I found another answer that uses a third-party library to compare months; if you use that, you could piece together logic like the following.
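To illustrate why a fixed number of days can't stand in for "1 month", the following standard-library sketch compares day differences with calendar-month differences. The helper `month_count` is hypothetical; it just counts months as `year * 12 + month`.

```python
from datetime import date

def month_count(d):
    # Calendar months since year 0; adjacent months always differ by exactly 1.
    return d.year * 12 + d.month

# Only 30 days apart, but two calendar months apart.
a, b = date(2016, 1, 31), date(2016, 3, 1)
print((b - a).days, month_count(b) - month_count(a))  # 30 2

# 59 days apart, but only one calendar month apart.
c, d = date(2016, 1, 1), date(2016, 2, 29)
print((d - c).days, month_count(d) - month_count(c))  # 59 1
```

Since the asker only cares about the month, not the day, a days threshold would both include and exclude the wrong entries.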

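Since only the calendar month matters here, the library isn't strictly needed: comparing `year * 12 + month` values works with the standard library alone. Here `list0` is the first list from the question:

import datetime as dt

def extract_id(txt):
    return txt.split()[0]  # first field, e.g. '93900'

def extract_month(txt):
    # second field looks like '2016-01-11.50': drop the '.50' suffix and parse the date,
    # then count months so that Dec 2015 and Jan 2016 are exactly 1 apart
    d = dt.datetime.strptime(txt.split()[1].split('.')[0], '%Y-%m-%d')
    return d.year * 12 + d.month

for i in list0:
    item_id, month = extract_id(i), extract_month(i)
    matches = [j for j in list1
               if extract_id(j) == item_id and abs(month - extract_month(j)) <= 1]
    print(i, matches)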