0

I'm trying to merge two dataframes (one from my database and one collected from an API). The dataframes are collected fine and have the correct data types, but whenever I try to use pd.merge(df1, df2, on_left, on_right), I get an error

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

and an empty dataframe is returned

Every where I look, the error seems to be related to a type error (e.g., FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison).

However, I'm almost certain my data types are correct.

Strangely, this error seems to depend on my SQL query. Changing user ids seems to make it work even though there are no changes in the data types or outputs.

sql = """
        SELECT
        meeting_id, doccano_project_id, document_id, date_added,
        user_provided_date_time, speaker_id, start_time_in_s, stop_time_in_s,
        pauses, number_of_words
        FROM
        utterance JOIN meeting ON utterance.meeting_id = meeting.id
        AND
        meeting_id IN (SELECT DISTINCT
                        meeting_id
                        FROM
                        utterance
                        WHERE
                        speaker_id = %s)
        """
utt_df = pd.read_sql(sql, conn, params=(user_id,))


print(utt_df['document_id'].head())
print(docs_df['id'].head())
combined_df = pd.merge(utt_df, docs_df, left_on='document_id', 
    right_on='id')


0    1591
1    1592
2    1593
3    1594
4    1595
Name: document_id, dtype: int64

0    1635
1    1634
2    1633
3    1632
4    1631
Name: id, dtype: int64

print(set(utt_df['document_id']) == set(docs_df['id']))
# True

The code works as intended for some sql queries but not for others even though there are no differences in the datatypes or any observable aspect of the dataframe. I'd like it to work for other sql queries as well.

Chu_bot
  • 65
  • 13

1 Answers1

1

I was able to figure out a solution but not the cause. It turned out that my query params (i.e., user_id) were changing from int to str, when I changed how the query. I haven't been able to observe any changes to the output dataframes, but casting the str to int fixed the problem.

Chu_bot
  • 65
  • 13