1

I am trying to populate a dataframe by looking up values in a list of lists and trying to find a match for column/index. I found this post and thought I could modify it for my needs. fill in entire dataframe cell by cell based on index AND column names?. He fills in his pandas dataframe using the edit_distance function. I am currently trying to modify that function so it outputs actual data.

My data set looks something like this but with many more values:

Data = [Product Number, Date, Quantity]
       [X1          , 2018-01, 2]
       [X1,         , 2018-02, 4]
       [X1,         , 2018-03, 7]
       [X2,         , 2018-01, 3]
       [X3,         , 2018-02, 5]
       [X3,         , 2018-03, 6]

Expected Outcome: Apologies for crude representation

DF = 2018-01 2018-02- 2018-03
  X1  2         4        7
  X2  3             
  X3            5        6

I deduped all of the product numbers and dates in my list of lists and set them equal to the below, just as he did in the referenced stack question.

series_rows = pd.Series(prod_deduped)
series_cols = pd.Series(dates_deduped)

His code for mapping all of the cells:

df = pd.DataFrame(series_rows.apply(lambda x: series_cols.apply(lambda y: edit_distance(x, y))))

The part starting with edit_distance is a function that returns a value based on the inputs for x,y. I created my own function that would loop through a list of lists and return a value based on a match.

def return_value(s1, s2, list_of_lists, starting_point_in_case_of_header):
    for row in list_of_lists[starting_point_in_case_of_header:]:
        result = ''
        product = row[0]
        date = row[1]
        quantity = row[2]
        #for prod in product:
        if product == s1 and date == s2:
            result = quantity
        return result

I get a match on row1, column1 but everything else is blank which makes me think that I really need to be looping through s1 or s2 before everything else. Any help would be appreciated. Thanks!

EDIT: Here is my most recent attempt at trying to loop through s1 and s2 but this just errors out saying my list index is out of range. I think I am on the right track though.

def return_value(s1, s2, list_of_lists, starting_point_in_case_of_header):
    for y in enumerate(s2):
        result_ = []
        for x in enumerate(s1):
            for row in list_of_lists[starting_point_in_case_of_header:]:
                product = row[0]
                date = row[1]
                quantity = row[2]
                if product == x and date == y:
                    result_.append(quantity)
        result = result_
    return result[-1]

My final code to put everything all together:

result_df = pd.DataFrame(series_rows.apply(lambda x: series_cols.apply(lambda y: return_value(x, y, sorted_deduped_list, 0))))

0 Answers0