A scary Proxmox server issue
This is the story of how a simple configuration change on my Proxmox server went wrong. I wanted to edit the hostname so that it would match my SSL certificate, which was signed by my own intermediate certificate authority on my pfSense server. However, I made a mistake that brought my network down, and thus began a frenzy to get things back up and running. Thankfully nobody was inconvenienced too badly, but there is an important lesson to learn from this!
No Assumptions! Zero Trust.
I have learned not to assume. Something can look simple and still be vastly complex. I thought I would simply change the hostname and my SSL certificate would work. I was in too much of a hurry, though, and I should have read a little more about the process of changing a hostname in Proxmox. Instead, I assumed it would be one step, and there was a little more to it that made it tricky. In the future, I will explicitly look up “How do I change the Proxmox hostname?” so I don’t make the same mistake. Substitute whatever problem you are facing.
It reminds me of the difference between a guess-and-check approach and redundant checks that catch mistakes before they happen. I’m usually on top of it, but this shows how tempting it can be to move fast. Don’t move too fast.
A Comedy Of Errors
Proxmox’s documentation explains that the configuration for your LXC containers and virtual machines lives in a database that Proxmox mounts at /etc/pve, and that this mount depends on the hostname files agreeing with each other. If you change one but not the other, they become mismatched and the mount fails.
Then, when you restart the machine, you will be greeted by a nightmare: the /etc/pve/ directory, where your server configurations are stored, is completely empty.
Naturally, you will wonder what happened to your data. You will run the lsblk command and see that the storage is still there, so the data wasn’t deleted, yet it is definitely missing from where you expect it. What follows is a lot of reading logs and trying to understand why the Proxmox cluster service isn’t running.
You will then use your 4G data to look into the issue because you have no internet, and your phone is at 10% battery. Thanks to your pro Googling skills, you will find documentation and forum posts from the awesome Proxmox community informing you that /etc/hostname and /etc/hosts must agree with each other.
Once you figure this out, you will start to make the appropriate changes and cross your fingers that no data was corrupted.
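The check I wish I had run before rebooting can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the function name addresses_for_hostname is made up for illustration, and it only tests whether the name in /etc/hostname appears in some /etc/hosts entry, which is the mismatch that bit me. It is not an official Proxmox tool, and the renaming guide in the resources covers the full procedure.

```python
def addresses_for_hostname(hostname_text: str, hosts_text: str) -> list[str]:
    """Return the addresses in hosts_text whose entries name the host in hostname_text.

    hostname_text is the content of /etc/hostname, hosts_text the content of
    /etc/hosts. An empty result means the two files disagree.
    """
    hostname = hostname_text.strip()
    matches = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split "address name [alias ...]".
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue
        address, names = entry[0], entry[1:]
        # Accept the short name or the first label of a fully qualified name.
        if any(n == hostname or n.split(".")[0] == hostname for n in names):
            matches.append(address)
    return matches


# Example with made-up file contents (hypothetical node name and address):
sample_hosts = "127.0.0.1 localhost\n192.168.1.10 pve.example.lan pve\n"
print(addresses_for_hostname("pve\n", sample_hosts))
print(addresses_for_hostname("renamed-node\n", sample_hosts))
```

On a real node you would pass in the contents of /etc/hostname and /etc/hosts (for example with pathlib.Path.read_text) and stop to fix the files if the result comes back empty.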
Make Backups
This was a reminder for me to start backing up my systems and ensuring that there are multiple paths to victory. Things like this can’t happen on mission-critical systems, and we want to make sure everyone can do their thing on the network when the clock strikes the work hours.
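Even a tiny habit helps here: copy the files you are about to edit somewhere safe first. Below is a minimal sketch of that idea, assuming a helper I am calling backup_files and a backup folder of your choosing, both of which are my own illustration. It does not replace real backups of your containers and virtual machines.

```python
import shutil
import time
from pathlib import Path


def backup_files(paths, backup_root):
    """Copy each existing file in paths into a new timestamped folder under backup_root."""
    target = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in map(Path, paths):
        if path.is_file():
            # copy2 keeps timestamps and permission bits along with the contents.
            copied.append(Path(shutil.copy2(path, target / path.name)))
    return copied
```

Before touching the hostname, something like backup_files(["/etc/hostname", "/etc/hosts"], "/root/config-backups") would leave a known-good copy to restore from; the /root/config-backups path is just a placeholder.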
Lessons Learned
I wanted to write this short post to start a habit of posting to this blog more frequently. I’m trying to post at least four times a week. Let me know if you like this kind of thing as I pursue these adventures. I’ve learned so much from this, and it left me especially thankful for the smart people around the world who helped me resolve my issue. Thank you for reading!
Never be afraid to look things up, try things, fail, rinse, and repeat. I’ve learned that failure is part of the learning process.
Until next time!
Wishing the best,
David
Resources
https://pve.proxmox.com/wiki/Renaming_a_PVE_node
https://forum.proxmox.com/threads/etc-pve-is-empty.12953/
https://forum.proxmox.com/threads/change-proxmox-hostname.105941/
https://forum.proxmox.com/threads/how-to-mount-etc-pve-in-rescue-mode.12496/