I am trying to populate a dataframe by looking up values in a list of lists and trying to find a match for column/index. I found this post and thought I could modify it for my needs. fill in entire dataframe cell by cell based on index AND column names?. He fills in his pandas dataframe using the edit_distance function. I am currently trying to modify that function so it outputs actual data.
My data set looks something like this but with many more values:
Data = [Product Number, Date, Quantity]
[X1 , 2018-01, 2]
[X1, , 2018-02, 4]
[X1, , 2018-03, 7]
[X2, , 2018-01, 3]
[X3, , 2018-02, 5]
[X3, , 2018-03, 6]
Expected Outcome: Apologies for crude representation
DF = 2018-01 2018-02- 2018-03
X1 2 4 7
X2 3
X3 5 6
I deduped all of the product numbers and dates in my list of lists and set them equal to the below, just as he did in the referenced stack question.
series_rows = pd.Series(prod_deduped)
series_cols = pd.Series(dates_deduped)
His code for mapping all of the cells:
df = pd.DataFrame(series_rows.apply(lambda x: series_cols.apply(lambda y: edit_distance(x, y))))
The part starting with edit_distance is a function that returns a value based on the inputs for x,y. I created my own function that would loop through a list of lists and return a value based on a match.
def return_value(s1, s2, list_of_lists, starting_point_in_case_of_header):
for row in list_of_lists[starting_point_in_case_of_header:]:
result = ''
product = row[0]
date = row[1]
quantity = row[2]
#for prod in product:
if product == s1 and date == s2:
result = quantity
return result
I get a match on row1, column1 but everything else is blank which makes me think that I really need to be looping through s1 or s2 before everything else. Any help would be appreciated. Thanks!
EDIT: Here is my most recent attempt at trying to loop through s1 and s2 but this just errors out saying my list index is out of range. I think I am on the right track though.
def return_value(s1, s2, list_of_lists, starting_point_in_case_of_header):
for y in enumerate(s2):
result_ = []
for x in enumerate(s1):
for row in list_of_lists[starting_point_in_case_of_header:]:
product = row[0]
date = row[1]
quantity = row[2]
if product == x and date == y:
result_.append(quantity)
result = result_
return result[-1]
My final code to put everything all together:
result_df = pd.DataFrame(series_rows.apply(lambda x: series_cols.apply(lambda y: return_value(x, y, sorted_deduped_list, 0))))