The fact that some IP addresses are hotter than others - getting more reads or writes - is usually not a big problem, and is quite common. Scylla divides them randomly between the different nodes (and cores on each node), and as long as you have significantly more hot partitions than you have cores in your cluster, the load - and disk usage - should be fairly well balanced.
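To illustrate the "many more hot partitions than cores" point, here is a small Python sketch. It uses a generic hash purely as a stand-in for Scylla's real murmur3 partitioner, and the shard count and IP addresses are made up, but it shows how a few thousand hot keys spread quite evenly over a few dozen shards:

```python
import hashlib
import random
from collections import Counter

# Stand-in for the partitioner: Scylla actually uses murmur3, but any
# reasonably uniform hash shows the same balancing effect.
def shard_for(key: str, num_shards: int) -> int:
    token = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return token % num_shards

NUM_SHARDS = 48            # hypothetical cluster: 6 nodes x 8 cores
NUM_HOT_PARTITIONS = 5000  # many more hot partitions than shards

# Made-up "hot" client IP addresses.
hot_ips = [
    "10.%d.%d.%d" % (random.randrange(256), random.randrange(256), random.randrange(256))
    for _ in range(NUM_HOT_PARTITIONS)
]

load = Counter(shard_for(ip, NUM_SHARDS) for ip in hot_ips)
print("hot partitions per shard: min=%d max=%d" % (min(load.values()), max(load.values())))
```

With thousands of hot partitions spread over a few dozen shards, the per-shard counts land in a fairly narrow band; with only a handful of hot partitions the spread would be much worse.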
Things can be different in extreme cases, such as when each update grows a partition (i.e., adds a row to it) and only a few partitions are extremely hot. For example, imagine a database used to log requests: in addition to a million normal clients making 10 requests a day, it also has 10 "attackers" who each make a million requests a day. In such extreme cases you can find yourself with some nodes carrying significantly more of the load and/or disk space than others. Such extreme cases can also cause other problems: while Scylla's support for huge partitions has improved recently, it is still not perfect, and if you can avoid such extreme cases, it's better.
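To put rough numbers on that example (using only the figures from the paragraph above):

```python
normal_clients = 1_000_000
normal_requests_per_day = 10
attackers = 10
attacker_requests_per_day = 1_000_000

normal_rows = normal_clients * normal_requests_per_day    # 10,000,000 rows/day
attacker_rows = attackers * attacker_requests_per_day     # 10,000,000 rows/day

# Half of each day's data lands in just 10 partitions...
print(attacker_rows / (normal_rows + attacker_rows))      # 0.5

# ...and each attacker partition grows by roughly 365 million rows per year.
print(attacker_requests_per_day * 365)
```

So in this scenario half the daily data ends up in just 10 partitions, and each of those grows into exactly the kind of huge, hot partition that can leave a few nodes much more loaded than the rest.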
Finally, if I go back to your original question, "Is using IP address as primary key a good practice in scylla db?", the answer is "yes, but":
It's "yes" because Scylla has no specific problem with IP addresses as a key - it distributes the different IP address to different nodes randomly (using the "murmur3" hash function) so there is no particular problem with the fact that IP addresses clump up together (e.g., multiple clients from the same subnet don't just get sent to the same cluster nodes).
It's "but" because the problem isn't the IP addresses as a key per se, but rather the content of the partition you intend to store for it, and how skewed are the update frequency - and size - for the different partitions.
Oh, and one last note:
If you're using Size Tiered Compaction Strategy (STCS), the maximum disk-space usage at any particular moment can be considerably higher than the actual amount of data being stored. If your workload is heavy on overwrites (data isn't being added, but rather replaced, deleted, etc.), then before compaction finishes its work the data on disk can easily be twice the real amount of data. In that case, if you inspect the system at some random time, you will notice that some nodes have more data on disk than others, depending on where each one happens to be in its compaction work when you take the measurement. One way to verify whether this is what you're seeing is to invoke a "major compaction" on all nodes and then measure the disk usage - you should see much more uniform disk space usage across the nodes afterwards.
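If you want to script that check, something along these lines works - the node addresses, keyspace, and table name below are placeholders, and it assumes nodetool is installed and can reach each node:

```python
import subprocess

# Hypothetical node addresses and table name - replace with your own.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
KEYSPACE, TABLE = "logs", "requests_by_ip"

# Trigger a major compaction of the table on every node.
for node in NODES:
    subprocess.run(["nodetool", "-h", node, "compact", KEYSPACE, TABLE], check=True)

# Once the compactions have finished, compare per-node space usage.
for node in NODES:
    subprocess.run(["nodetool", "-h", node, "tablestats", f"{KEYSPACE}.{TABLE}"], check=True)
```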