How to animate a 2D scatter plot given X, Y coordinates and time with appearing and disappearing points?

Question

I have a data frame like the below:

Every row represents a person. Each person stays at 3 different locations for the times given in the data frame. The first few people don't stay at location1; they are "born" at location2. The rest stay at all 3 locations.

I would like to animate every person at the X, Y coordinates given in the data frame, representing each one as a dot or any other shape. Here is the flow:

Every person should appear at the first given location (location1) at the given time. Their color should be blue at this state.
Stay at location1 until location2_time and then appear at location2. Their color should be red at this state.
Stay at location2 until location3_time and then appear at location3. Their color should be red at this state.
Stay at location3 for 3 seconds and disappear forever.
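The flow above can be read as a per-person state function of time. The following is only a minimal sketch of that reading: the key names (`location1_time`, `location1_x`, and so on) are hypothetical, and it assumes people who are "born" at location2 have a missing (None or NaN) location1 time.

```python
import math

STAY_AT_LAST = 3  # seconds a person remains at location3 before disappearing


def _missing(value):
    """True for None or NaN, the two ways a missing time may be encoded."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def state_at(person, t):
    """Return (x, y, color) for one person at time t, or None if not visible.

    `person` is a dict with hypothetical keys location{1,2,3}_time/_x/_y.
    """
    t1, t2, t3 = (person[f"location{k}_time"] for k in (1, 2, 3))
    if t3 <= t < t3 + STAY_AT_LAST:
        k, color = 3, "red"
    elif t2 <= t < t3:
        k, color = 2, "red"
    elif not _missing(t1) and t1 <= t < t2:
        k, color = 1, "blue"
    else:
        return None  # not yet appeared, or already gone
    return person[f"location{k}_x"], person[f"location{k}_y"], color
```

With pandas, rows in this shape could come from `df.to_dict("records")`, assuming the columns are named to match. Collecting the non-None results for every person at a given time gives the points to draw in that frame.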

There can be several people on the visual at the same time. How can I do that?

There are some good answers at the links below. However, in those solutions the points don't disappear.

Another option is to use Vaex, https://vaex.io/docs/index.html — David, Apr 17 '21 at 21:09
Time is in what unit? Milliseconds? Also, for those first few people, when are they born at location 2? Do you want it to start off showing them as red dots until they hit location 3? If that's the case then location 2 time for those people is technically 0. — Gabe Morris, Apr 18 '21 at 17:35
@GabeMorris Yes sir. I want first few people shown to be red. Also, correct location 2 time is zero for them! The unit is in seconds. — RookieScientist, Apr 19 '21 at 05:36
I am working on an alternative solution that should be scaleable. Done in 1-2 hours. — Can H. Tartanoglu, Apr 21 '21 at 15:18

Can H. Tartanoglu · Accepted Answer · 2021-04-23T12:02:38.533

The following is an implementation with ffmpeg-python, pandas, matplotlib, and seaborn. You can find the output video on my YouTube channel (link is unlisted).

Each frame with figures is saved directly to memory. New figures are generated only when the state of the population changes (person appears/moves/disappears).

You should definitely separate this code into smaller chunks if you are using this in a Python package:

from numpy.random import RandomState, SeedSequence
from numpy.random import MT19937
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import ffmpeg


RESOLUTION = (12.8, 7.2)        # * 100 pixels
NUMBER_OF_FRAMES = 900


class VideoWriter:
    # Courtesy of https://github.com/kylemcdonald/python-utils/blob/master/ffmpeg.py
    def __init__(
        self,
        filename,
        video_codec="libx265",
        fps=15,
        in_pix_fmt="rgb24",
        out_pix_fmt="yuv420p",
        input_args=None,
        output_args=None,
    ):
        self.filename = filename
        self.process = None
        self.input_args = {} if input_args is None else input_args
        self.output_args = {} if output_args is None else output_args
        self.input_args["r"] = self.input_args["framerate"] = fps
        self.input_args["pix_fmt"] = in_pix_fmt
        self.output_args["pix_fmt"] = out_pix_fmt
        self.output_args["vcodec"] = video_codec

    def add(self, frame):
        if self.process is None:
            height, width = frame.shape[:2]
            self.process = (
                ffmpeg.input(
                    "pipe:",
                    format="rawvideo",
                    s="{}x{}".format(width, height),
                    **self.input_args,
                )
                .filter("crop", "iw-mod(iw,2)", "ih-mod(ih,2)")
                .output(self.filename, **self.output_args)
                .global_args("-loglevel", "quiet")
                .overwrite_output()
                .run_async(pipe_stdin=True)
            )
        conv = frame.astype(np.uint8).tobytes()
        self.process.stdin.write(conv)

    def close(self):
        if self.process is None:
            return
        self.process.stdin.close()
        self.process.wait()


def figure_to_array(figure):
    """adapted from: https://stackoverflow.com/questions/21939658/"""
    figure.canvas.draw()
    # tostring_rgb() was deprecated in Matplotlib 3.8 and later removed;
    # buffer_rgba() gives an (n_rows, n_cols, 4) view, so drop the alpha channel.
    return np.asarray(figure.canvas.buffer_rgba())[..., :3]


# Generate data for the figure
rs1 = RandomState(MT19937(SeedSequence(123456789)))

time_1 = np.round(rs1.rand(232) * NUMBER_OF_FRAMES).astype(np.int16)
time_2 = time_1 + np.round(rs1.rand(232) * (NUMBER_OF_FRAMES - time_1)).astype(np.int16)
time_3 = time_2 + np.round(rs1.rand(232) * (NUMBER_OF_FRAMES - time_2)).astype(np.int16)

loc_1_x, loc_1_y, loc_2_x, loc_2_y, loc_3_x, loc_3_y = np.round(rs1.rand(6, 232) * 100, 1)

df = pd.DataFrame({
    "loc_1_time": time_1,
    "loc_1_x": loc_1_x,
    "loc_1_y": loc_1_y,
    "loc_2_time": time_2,
    "loc_2_x": loc_2_x,
    "loc_2_y": loc_2_y,
    "loc_3_time": time_3,
    "loc_3_x": loc_3_x,
    "loc_3_y": loc_3_y,
})
"""The stack answer starts here"""
# Add extra column for disappear time
df["disappear_time"] = df["loc_3_time"] + 3

all_times = df[["loc_1_time", "loc_2_time", "loc_3_time", "disappear_time"]]
change_times = np.unique(all_times)

# Prepare ticks for plotting the figure across frames
x_values = df[["loc_1_x", "loc_2_x", "loc_3_x"]].values.flatten()
x_ticks = np.array(np.linspace(x_values.min(), x_values.max(), 6), dtype=np.uint8)

y_values = df[["loc_1_y", "loc_2_y", "loc_3_y"]].values.flatten()
y_ticks = np.array(np.round(np.linspace(y_values.min(), y_values.max(), 6)), dtype=np.uint8)

sns.set_theme(style="whitegrid")
video_writer = VideoWriter("endermen.mp4")
if 0 not in change_times:
    # Generate empty figure if no person arrive at t=0
    fig, ax = plt.subplots(figsize=RESOLUTION)
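    # Fix the tick positions first so the labels line up with them
    ax.set_xticks(x_ticks)
    ax.set_xticklabels(x_ticks)
    ax.set_yticks(y_ticks)
    ax.set_yticklabels(y_ticks)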
    ax.set_title("People movement. T=0")

    video_writer.add(figure_to_array(fig))

    loop_range = range(1, NUMBER_OF_FRAMES)
else:
    loop_range = range(NUMBER_OF_FRAMES)

palette = sns.color_palette("tab10")        # Ten colors; sliced below to one per location currently shown
animation_data_df = pd.DataFrame(columns=["x", "y", "location", "index"])
for frame_idx in loop_range:
    if frame_idx in change_times:
        plt.close("all")
        # Get person who appears/moves/disappears
        indexes, loc_nums = np.where(all_times == frame_idx)
        loc_nums += 1

        for i, loc in zip(indexes, loc_nums):
            if loc != 4:
                x, y = df[[f"loc_{loc}_x", f"loc_{loc}_y"]].iloc[i]

            if loc == 1:            # location_1
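                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                new_row = pd.DataFrame([{"x": x, "y": y, "location": loc, "index": i}])
                animation_data_df = pd.concat([animation_data_df, new_row], ignore_index=True)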
            else:
                data_index = np.where(animation_data_df["index"] == i)[0][0]
                if loc in (2, 3):   # location_2 or 3
                    animation_data_df.loc[[data_index], :] = x, y, loc, i
                elif loc == 4:      # Disappear
                    animation_data_df.iloc[data_index] = np.nan

        current_palette_size = np.sum(~np.isnan(np.unique(animation_data_df["location"])))
        fig, ax = plt.subplots(figsize=RESOLUTION)
        sns.scatterplot(
            x="x", y="y", hue="location", data=animation_data_df, ax=ax, palette=palette[:current_palette_size]
        )

        ax.set_xticks(x_ticks)
        ax.set_xticklabels(x_ticks)
        ax.set_yticks(y_ticks)
        ax.set_yticklabels(y_ticks)
        ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

    ax.set_title(f"People movement. T={frame_idx}")
    video_writer.add(figure_to_array(fig))

video_writer.close()

Edit: There was a bug in which location_3 wasn't removed after 3 seconds. Fixed now.
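The redraw-on-change idea used above can be isolated from the plotting. This is a minimal sketch with hypothetical helper names (`frames_needing_redraw`, `render_schedule`), not part of the code above: it collects every time at which some person appears, moves, or disappears, then maps each frame to the most recent such time so the scatter only has to be rebuilt at those frames.

```python
def frames_needing_redraw(event_times, n_frames):
    """Return the sorted frame indices at which the population state changes.

    `event_times` holds one tuple of change times per person.
    """
    changes = {t for times in event_times for t in times if 0 <= t < n_frames}
    return sorted(changes)


def render_schedule(event_times, n_frames):
    """For every frame, give the frame of the most recent redraw (None before any)."""
    changes = set(frames_needing_redraw(event_times, n_frames))
    schedule, last = [], None
    for frame in range(n_frames):
        if frame in changes:
            last = frame  # state changed: the scatter must be rebuilt here
        schedule.append(last)
    return schedule


# Two people, each with (appear, move, move, disappear) times
events = [(1, 3, 4, 7), (3, 5, 6, 9)]
print(frames_needing_redraw(events, 10))  # [1, 3, 4, 5, 6, 7, 9]
print(render_schedule(events, 10))        # [None, 1, 1, 3, 4, 5, 6, 7, 7, 9]
```

With many frames and comparatively few events, most frames reuse the previous scatter, which is where the saving comes from.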

Teh · Answer 2 · 2021-04-18T00:43:40.850

Modifying the code from this question to include only the positions you want automatically removes the old ones: a position that isn't in the new set simply isn't drawn. This holds whether you animate by time, by iterations, or by anything else. I have opted to use iterations here since it's easier and I don't know how you are handling your dataset. The code does have one bug, though: the last remaining point (or points, if several last the same amount of time) won't disappear. This is easy to solve if you don't need to draw anything afterwards. If you do, for example when there is a gap in the data with no people before the data resumes, I haven't found a workaround.

import math
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# t0, t1, t2, t3 are the times (in iterations) at which the position changes
# If t0 is None or NaN, the person is never displayed
people = [
    # t0          x1              y1             t1    x2   y2    t2   x3    y3    t3
    [ 0,          1,             0.1,             1,   2,   0.2,   2,   3,  0.3,   3],
    [ 2,          None,         None,          None,   2,   1,     3,   4,    1,   7],
    [ 2,  float("NaN"), float("NaN"),  float("NaN"),   2,   0.8,   4,   4,  0.8,   10],
]

fig = plt.figure()
plt.xlim(0, 5)
plt.ylim(0, 1)
graph = plt.scatter([], [])


def animate(i):
    points = []
    colors = []
    for person in people[:]:  # iterate over a copy: finished people are removed from the list below
        if person[0] is None or math.isnan(person[0]) or i < person[0]:
            continue
        # Position 1
        elif person[3] is not None and not (math.isnan(person[3])) and i <= person[3]:
            new_point = [person[1], person[2]]
            color = "b"
        # Position 2
        elif person[6] is not None and not (math.isnan(person[6])) and i <= person[6]:
            new_point = [person[4], person[5]]
            color = "r"
        # Position 3
        elif person[9] is not None and not (math.isnan(person[9])) and i <= person[9]:
            new_point = [person[7], person[8]]
            color = "r"
        else:
            people.remove(person)
            new_point = []

        if new_point != []:
            points.append(new_point)
            colors.append(color)

    if points != []:
        graph.set_offsets(points)
        graph.set_facecolors(colors)
    else:
        # You can use graph.remove() to fix the last point not disappearing, but you won't be able to plot anything after that
        # graph.remove()
        pass

    return graph


ani = FuncAnimation(fig, animate, repeat=False, interval=500)
plt.show()
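One possible way around the last-point problem, offered as a sketch rather than as part of the answer above, is to hand the scatter an empty `(0, 2)` offsets array instead of skipping the update. The artist stays on the axes, so later frames can still draw into it. The helper name `update_scatter` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
scatter = ax.scatter([], [])


def update_scatter(scatter, points, colors):
    """Show exactly `points`; an empty list clears the artist without removing it."""
    if points:
        scatter.set_offsets(np.asarray(points, dtype=float))
        scatter.set_facecolors(colors)
    else:
        # A (0, 2) array means "no points", so nothing is drawn this frame.
        scatter.set_offsets(np.empty((0, 2)))
    return scatter


update_scatter(scatter, [[1, 0.1], [2, 0.2]], ["b", "r"])  # two people visible
update_scatter(scatter, [], [])                            # gap: nobody visible
update_scatter(scatter, [[3, 0.3]], ["r"])                 # data resumes
```

Inside `animate`, the same call could stand in for the `if points != []` branch, assuming `graph` is passed in as the scatter.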

Thanks for the solution. As @Bur states, this solution doesn't work with NaN values. Actually, I am looking for something scalable and applicable to any other dataframe. — RookieScientist, Apr 15 '21 at 07:30
I have added code to account for both None and NaN. "applicable to any other dataframe" I don't get what exactly you mean here the dataframe you have is pretty specific — Teh, Apr 15 '21 at 21:25
If I have 100K or 1M rows your solution will take so much time. — RookieScientist, Apr 17 '21 at 08:40
Why would you want to animate 1M rows? The only optimization I can think of is to delete rows which have already completed their animation; I have added code to do that. — Teh, Apr 18 '21 at 00:46
I might be biased (I am also answering the problem), but I do agree with the OP that applications like these need to have scalability in mind. — Can H. Tartanoglu, Apr 21 '21 at 23:49
I had a look at your solution and yes, it does look more robust and scalable, but I will keep this answer up in case someone wants something simpler. — Teh, Apr 22 '21 at 01:14
