
I am analyzing financial donations made to an NGO and how its social media engagement has affected them. To this end, I want to group donations made to the organization with the dates on which social media posts were made. The donation dates are stored in a dataframe under a 'Transaction Date' header, and the social media post dates have been scraped from Facebook and put in an array. The datatype of the donation dates is datetime64[ns] and the datatype of the post dates is datetime.date.

This is my code, roughly reproduced. Any help with regards to what changes I could make?

donation_timestamp = pd.DataFrame()
donation_timestamp['Dates'] = np.array(['2019-05-01', '2019-05-12', '2019-05-23'])
donation_timestamp['Dates'] = pd.to_datetime(donation_timestamp['Dates'])

post_dates = pd.to_datetime(scraped_dates)
post_timestamps = []
for i in post_dates:
    time = dt.datetime.strptime(str(i), "%Y-%m-%d %H:%M:%S").date()
    post_timestamps.append(time)
post_timestamps = np.array(post_timestamps)

post_donations = dict.fromkeys(post_timestamps)
for i in post_timestamps:
    for j in donation_timestamp:
        if j-i < dt.timedelta(days=2):
            post_donations[i] = np.append(post_donations[i], j)

I created a dictionary with the social media post dates as keys and tried to iterate over both arrays. Wherever a donation was made within two days of a social media post, I tried to file that donation date under the corresponding post date in the dictionary. I ran into problems with the loop logic and with the condition. For some reason, my dt.timedelta check on the difference between two dates is not working: every iteration satisfies the condition. I also don't understand how to convert the datatypes so that I can take the difference between the two dates directly.

  • It would be useful if you made the code example reproducible by providing a sample array of `scraped_dates`. – user19077881 Jun 04 '23 at 07:26
  • You might want to use [`merge_asof`](https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html), but for specifics, we'd need to see a reproducible example including all inputs (i.e. `scraped_dates`) and desired output. See [How to make good reproducible pandas examples](/q/20109391/4518341), and [mre] in general. You can also minimize your code, for example, `donation_timestamp = pd.DataFrame({'Date': pd.to_datetime(['2019-05-01', ...])})` and `post_timestamps = pd.to_datetime(scraped_dates).dt.normalize()`. Just to be clear, `.strptime(str(i)` is totally redundant. – wjandrea Jun 04 '23 at 18:33

1 Answer


If the donation dates and the media post dates are stored in pandas Series, we can list the donations made within two days after a media post as follows:

import pandas as pd

# Donation dates, sorted, with a fresh 0..n-1 index so dons[d] follows sorted order
dons = pd.to_datetime(pd.Series(['2019-05-10', '2019-05-12', '2019-05-21']))
dons = dons.sort_values().reset_index(drop=True)

# Post timestamps, prepared the same way
scraped_data = ["2019-05-11 13:00:00", "2019-05-19 15:00:00"]
posts = pd.to_datetime(pd.Series(scraped_data))
posts = posts.sort_values().reset_index(drop=True)

# Compare full Timedeltas: .days truncates, so "<= 2" would admit almost 3 days
window = pd.Timedelta(days=2)
p = 0
d = 0
promotion_donos = []
while p < len(posts) and d < len(dons):
    delta = dons[d] - posts[p]
    if pd.Timedelta(0) <= delta <= window:
        promotion_donos.append(dons[d])
        d += 1  # go to next donation
    elif delta > window:
        p += 1  # go to next media post
    else:  # post later than donation
        d += 1  # go to next donation

print(promotion_donos)
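If a dictionary keyed by post date is wanted, as in the question, `merge_asof` (suggested in the comments) may be a shorter route. The following is only a sketch under assumptions: the sample dates are made up, the post dates are assumed to arrive as `datetime.date` objects as the question describes, and the column name 'Post Date' is invented here. Note that a condition like `j - i < timedelta(days=2)` is also true for every donation made before the post, because the difference is then negative, so a lower bound is needed as well; `direction="backward"` supplies it here.

```python
import datetime as dt

import pandas as pd

# Made-up sample inputs (assumptions, not the asker's real data)
donations = pd.DataFrame(
    {"Transaction Date": pd.to_datetime(
        ["2019-05-10", "2019-05-12", "2019-05-20", "2019-05-28"])}
)
scraped_dates = [dt.date(2019, 5, 11), dt.date(2019, 5, 19)]

# Convert the datetime.date objects to pandas timestamps (midnight of that day)
posts = pd.DataFrame({"Post Date": pd.to_datetime(scraped_dates)})

# Give both key columns the same dtype; merge_asof expects matching key types
donations["Transaction Date"] = donations["Transaction Date"].astype("datetime64[ns]")
posts["Post Date"] = posts["Post Date"].astype("datetime64[ns]")

# merge_asof requires both frames to be sorted on their keys
donations = donations.sort_values("Transaction Date")
posts = posts.sort_values("Post Date")

# For each donation, take the latest post on or before it, at most 2 days earlier
matched = pd.merge_asof(
    donations,
    posts,
    left_on="Transaction Date",
    right_on="Post Date",
    direction="backward",
    tolerance=pd.Timedelta(days=2),
)

# Donations with no post inside the window have NaT in 'Post Date'; drop them,
# then collect the remaining donation dates under their post date
post_donations = (
    matched.dropna(subset=["Post Date"])
    .groupby("Post Date")["Transaction Date"]
    .apply(list)
    .to_dict()
)
print(post_donations)
```

One design consequence to be aware of: `merge_asof` assigns each donation to a single post, the most recent one inside the window. If two posts fall within two days of each other and a donation should count under both, a different approach (for example a boolean mask per post) would be needed.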