CoreOS can't load the etcd unit with user_data config after some reboots

Question

I'm trying CoreOS (version 410.0.0, stable), installed on disk with this cloud-config. Everything worked fine at first boot, but after some days and a few reboots a problem appeared with etcd.

When I start the machine, the output shows messages like:

Failed to start Load cloud-config from  /var/

<some output lines>

Failed to start Login service

Then, when I log in as a valid user, the console output is:

CoreOS(stable)
Failed Units: 1
   user-cloudinit@var-lib-coreos\x2install-user_data.service
devops@deis-server2~$

At this point the system is up, but etcd is not registered with the discovery URL present in the cloud-config.

Does anybody have an idea about this problem? And why does it occur only after some reboots?

score 1 · Answer 1 · answered Sep 19 '14 at 19:18

I solved the problem with the help of @crawford. These steps were applied:

remove hostname and discovery lines from /var/lib/coreos-install/user_data
remove directory /var/lib/etcd
reboot the system

After that, everything worked fine. Thanks again to @crawford.
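
The three steps above could be scripted roughly as follows. This is only a sketch: the function names `strip_cloud_config` and `recover_etcd` are hypothetical, and it assumes the `hostname` and `discovery` keys each sit on their own line in the cloud-config. The paths are the ones named in the steps.

```shell
#!/bin/sh
# Minimal sketch of the three steps; function names are hypothetical.

# Step 1 helper: print a cloud-config with its hostname and discovery
# lines removed. Reads the file given as $1 and leaves it unmodified.
strip_cloud_config() {
    sed -E '/^[[:space:]]*(hostname|discovery):/d' "$1"
}

# Steps 1-3 together. Must run as root; keeps a backup of user_data
# and stops at the first command that fails.
recover_etcd() {
    user_data=/var/lib/coreos-install/user_data
    cp "$user_data" "$user_data.bak" &&
        strip_cloud_config "$user_data.bak" > "$user_data" &&
        rm -rf /var/lib/etcd &&
        reboot
}
```

Nothing happens until `recover_etcd` is called, so the filter can be tried on a copy of the config first, for example `strip_cloud_config /tmp/user_data.copy`.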

enrique-carbonell

It appears I have to do steps 2 & 3 every time I reboot the system. Is yours surviving a reset now? – Bryan Larsen Dec 17 '14 at 17:19
@BryanLarsen I'm sorry, but your question is not clear. Please add more details, or tell me if you need to talk via IRC chat – enrique-carbonell Dec 17 '14 at 18:31
I'm just wondering if your servers are rebooting correctly without issue now? I get the same console output that you listed in your question unless I do an `rm -rf /var/lib/etcd` before every reboot. – Bryan Larsen Dec 17 '14 at 18:40
Oh, well, here (https://github.com/coreos/bugs/issues/146) is the issue raised on the subject; it is still open. Currently I'm not using CoreOS, but I remember I only had to delete things when the server showed that error, not every time it restarted. You can contact pepe for more information or follow the referenced issue. – enrique-carbonell Dec 17 '14 at 18:51
Remember to check your CoreOS version, because I posted this question some months ago – enrique-carbonell Dec 17 '14 at 18:52

score 0 · Answer 2 · answered Feb 06 '15 at 12:44

Over time, as machines come and go, the discovery URL will eventually contain addresses of peers that are no longer alive. Each entry in the discovery URL has a TTL of 7 days.

It's also possible for a discovery URL to contain no existing addresses, because they were all removed after 7 days. This represents a dead cluster: the discovery URL won't work any more and should be discarded.

For more information: https://coreos.com/docs/cluster-management/setup/cluster-discovery/#existing-clusters
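
If the old URL has to be discarded, the cloud-config needs to point at a new one. A minimal sketch of that substitution, assuming the `discovery:` key sits on a single line; the helper name `replace_discovery_url` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the cloud-config in $1 with the value of its
# discovery line replaced by the URL in $2. The file itself is not modified.
# Assumes the URL contains no "|" or "&" characters, which sed would
# treat specially here.
replace_discovery_url() {
    sed -E "s|^([[:space:]]*discovery:[[:space:]]*).*|\1$2|" "$1"
}
```

For example, `replace_discovery_url /var/lib/coreos-install/user_data "$(curl -s https://discovery.etcd.io/new)"` would print a config that uses a freshly requested URL, assuming the public discovery service is reachable at that address. Every node of the cluster has to be given the same new URL.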

Samir

What does this mean? Does it mean we cannot restart a node after 7 days? – Quanlong May 20 '15 at 11:27
Yes, sometimes all of your existing addresses have been removed after 7 days, and then you cannot restart the child nodes – Samir May 22 '15 at 06:09
