6

I need to be able to repeatably, non-randomly, uniquely identify a server host, which may be arbitrarily virtualized and over which I have no control.

  • A MAC address doesn't work because in some virtualized environments, network interfaces don't have hardware addresses.
  • Generating a state file and saving it to disk doesn't work because the virtual machine may be cloned, thus duplicating the file.
  • The server's SSH host keys may be a candidate. They can be cloned like a state file, but in practice they generally aren't, because sharing host keys is a serious enough security problem that the mistake is rarely made.
  • There's also /var/lib/dbus/machine-id, but that's dependent on dbus. (Thanks Preetam).
  • There's a cpuid but that's apparently deprecated. (Thanks Bruno Aguirre on Twitter).
  • Hostname is worth considering. Many systems like Chef already require unique hostnames. (Thanks Alfie John)
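
To make the candidates above concrete, here is a minimal sketch of gathering whichever ones exist and combining them into a single fingerprint. The function names and the SSH key path `/etc/ssh/ssh_host_*_key.pub` are assumptions for illustration (the path is the usual OpenSSH location, not something guaranteed on every host). A perfect clone would still produce the same fingerprint, so this only narrows the problem.

```python
import glob
import hashlib
import socket


def read_first_line(path):
    """Return the first line of a file, or None if it is empty or unreadable."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            return handle.readline().strip() or None
    except OSError:
        return None


def collect_candidates():
    """Gather whichever of the candidate identifiers exist on this host."""
    candidates = {"hostname": socket.gethostname() or None}
    candidates["machine_id"] = read_first_line("/var/lib/dbus/machine-id")
    # Assumed OpenSSH location; use the first public host key found, if any.
    key_paths = sorted(glob.glob("/etc/ssh/ssh_host_*_key.pub"))
    candidates["ssh_host_key"] = read_first_line(key_paths[0]) if key_paths else None
    return candidates


def fingerprint(candidates):
    """Hash the available candidates, in a fixed order, into one hex digest."""
    digest = hashlib.sha256()
    for name in sorted(candidates):
        value = candidates[name]
        if value is not None:  # skip identifiers this host does not have
            digest.update(f"{name}={value}\n".encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    print(fingerprint(collect_candidates()))
```

Hashing in sorted key order keeps the result repeatable across runs, and skipping missing values lets the same code run where dbus or SSH is absent.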

I'd like the identifier to persist for a long time, and certainly across server reboots and software restarts. I also know that users of my software will eventually decommission a host and want to replace it with another while keeping continuity of the data associated with it, so there are reasons a UUID might be considered mutable over the long term. Still, I don't want a host to start considering itself unknown and re-register itself for no reason.

Are there any alternative persistent, unique identifiers for a host?

  • Pretty tough logic to solve to get that to work. For example, if you shut down the VM, copy all its files (virtual HD etc) to two new machines and start both up unchanged, which of them would be tagged as the original host? – Joachim Isaksson Aug 19 '13 at 14:41
  • What is the required duration of the persistence? – John Feminella Aug 19 '13 at 15:20
  • There's also `/var/lib/dbus/machine-id`, but that's dependent on dbus. – Preetam Jinka Aug 19 '13 at 15:43
  • I've clarified (I hope) the requirements a bit more. –  Aug 19 '13 at 17:48
  • Here's a comment from another person on a social network: We chose to use SSH keys in [...], with a fallback to MAC address if not available. You also want something for Windows too perhaps, and for that we chose the Windows Domain SID, which is "somewhat stable" (it would get recompiled if the server moves domains). We've looked for a better solution than this for years, and there just isn't one. *Especially* in a cross-platform manner. –  Aug 20 '13 at 00:20

4 Answers

4

It really depends on what is meant by "persistent". For example, two VMs can't each open the same network socket to you, so even if they are bit-level clones of each other it is possible to tell them apart.

So, all that is required is sufficient information to tell the machines apart for whatever the duration of the persistence is.

  • If the duration of the persistence is the length of a network connection, then you don't need any identifiers at all -- the sockets themselves are unique.

  • If the persistence needs to be longer -- say, for the length of a boot -- then you can regenerate UUIDs whenever the system boots. (Note that a VM that is cloned would still have to reboot, unless you're hot-copying it.)

  • If it needs to be longer than that -- say, indefinitely -- then you can generate a UUID identifier on boot and save it to disk, but only use this as part of the identifying information of the machine. If the virtual machine is subsequently cloned, you will know this since you will have two machines reporting the same ID from different sources -- for instance, two different network sockets, different boot times, etc. Since you can tell them apart, you have enough information to differentiate the two cloned machines, which means you can take a subsequent action that forces further differentiation, like instructing each machine to regenerate its state file.
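
The last case can be sketched from the receiving side. This is a minimal illustration, not a definitive implementation: the `Registry` class, its method names, and the use of an `ip:port` string as the connection label are all assumptions made for the example. It records which live connections report each saved ID and, when more than one does, returns every connection that should be told to regenerate its state file.

```python
import uuid


def new_host_id():
    """Random UUID a client would generate on boot and save to disk."""
    return str(uuid.uuid4())


class Registry:
    """Server-side view of which live connections report each host ID."""

    def __init__(self):
        self._active = {}  # host_id -> set of live connection labels

    def connect(self, host_id, connection):
        """Record a live connection; return the connections that must regenerate."""
        live = self._active.setdefault(host_id, set())
        live.add(connection)
        # One connection per ID is the normal case, so nothing needs to change.
        # More than one means the state file was duplicated somewhere.
        return set(live) if len(live) > 1 else set()

    def disconnect(self, host_id, connection):
        """Forget a connection that has closed."""
        self._active.get(host_id, set()).discard(connection)


if __name__ == "__main__":
    registry = Registry()
    host_id = new_host_id()
    print(registry.connect(host_id, "10.0.0.1:5000"))  # no conflict yet
    print(registry.connect(host_id, "10.0.0.2:6000"))  # both share the ID now
```

Returning both connections rather than only the newcomer reflects the point that the server can see two machines sharing an ID but cannot say which one was the original.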

Ultimately, if a machine is perfectly cloned, then by definition you cannot tell which one was the "real one" to begin with, only that there are now two distinguishable machines.

Implying that you can tell the difference between the "real one" and the "cloned one" means that there is some state you can use to record the difference between the two, like the timestamp of when the virtual machine itself was created, in which case you can incorporate that into the state record.

John Feminella
  • You can tell if it's been cloned, but if everything about the virtualized environment can change--IP, MAC, # of CPUs, whatever--how do you distinguish which instance is the "real" one? – Chris D Aug 19 '13 at 15:29
  • The requirement was that you be able to tell them apart ("uniquely identify"). I didn't see anything about its track-record being preserved. When you _indistinguishably clone a machine_, neither one is the "real one", by definition! – John Feminella Aug 19 '13 at 15:30
  • I agree that distinguishing any two running instances is more or less trivial. I interpreted "repeatedly, non-randomly, uniquely identify" to mean something much more strict (and I think mathematically impossible under the conditions OP specified). – Chris D Aug 19 '13 at 15:41
  • I've thought about the solution the way you phrased it. To clarify, the software I'm running is connecting and sending data to an API, which I do control. So, if I get reports from two different ip:port sockets, which both send the same UUID to me, I can detect that a UUID has been reused, but you're correct, I don't think I can detect which one's the real one. –  Aug 19 '13 at 17:42
1

It looks like simple solutions have been ruled out. So that could lead to complex solutions, like this protocol:

  • Client sends tuple [ MAC addr, SSH public host key, sequence number ].
  • If server receives this tuple as expected, server and client both increment sequence number.
  • Otherwise server must determine what happened (was client cloned? did client move?), perhaps reaching a tentative conclusion and alerting a human to verify it.
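
A rough sketch of that protocol, under some assumptions: the class names, the `"ok"` / `"needs_review"` return values, and starting the sequence at 0 are all invented for illustration, and the network transport is replaced by a direct method call.

```python
class SequenceServer:
    """Tracks the next sequence number expected for each (MAC, SSH key) pair."""

    def __init__(self):
        self._expected = {}  # (mac, ssh_key) -> next expected sequence number

    def receive(self, mac, ssh_key, seq):
        """Accept the tuple if the sequence matches, otherwise flag it."""
        key = (mac, ssh_key)
        expected = self._expected.get(key, 0)  # unseen clients start at 0
        if seq == expected:
            self._expected[key] = expected + 1
            return "ok"
        # Mismatch: the client may have been cloned or moved; a human decides.
        return "needs_review"


class Client:
    """Holds the identifying tuple and the locally stored sequence number."""

    def __init__(self, mac, ssh_key):
        self.mac = mac
        self.ssh_key = ssh_key
        self.seq = 0

    def send(self, server):
        """Send the tuple; advance the local counter only when it was accepted."""
        status = server.receive(self.mac, self.ssh_key, self.seq)
        if status == "ok":
            self.seq += 1
        return status


if __name__ == "__main__":
    server = SequenceServer()
    original = Client("52:54:00:12:34:56", "ssh-ed25519 AAAA")
    original.send(server)
    clone = Client(original.mac, original.ssh_key)
    clone.seq = original.seq  # a clone starts with a copy of the counter
    print(clone.send(server), original.send(server))
```

Whichever copy reports first after the clone keeps the sequence; the other one is flagged. That detects the duplication but, as discussed elsewhere in this thread, does not say which copy is the original.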

JD Duncan
0

I don't think there is a straightforward "use X" solution based on the info available, but here are some general suggestions that might get you to a better spot.

  • If cloning from a "gold image", consider using some "first boot" logic to generate a unique ID. Config management systems like Chef, Puppet or CFEngine provide some scaffolding to achieve this.
  • Consider a global state manager like ZooKeeper, specifically its atomic counter functionality. The same system could get a new ID over time, but each ID would be unique.
  • Also, this Stack Overflow question might give you some other direction. It references Twitter's approach to a similar problem.
Community
  • Unfortunately I am not in a position to do things like this. I'm building software that is installed by other people on servers I don't control. –  Aug 19 '13 at 17:39
-1

If I understand correctly, you want a durable, globally unique identifier under these conditions:

  • An OS installation that can be cloned while running, so any state inside the VM won't work, and
  • Could be running in an arbitrary virtualization environment, so any state outside the VM won't work.

I realize this doesn't directly answer your question, but it really seems like either the design or the constraints need some substantial adjustment to accommodate a solution.

Chris D
  • I'm not sure there is a solution, either. I was hoping there's something creative (or obvious) that I overlooked. –  Aug 19 '13 at 17:40