Proxmox: After Migrating VMs to node1, All VM Consoles Broke (“Failed to run vncproxy”) — Full Diagnosis and Fix Log

Posted on 2025-12-17 by hollen9

This post documents the full troubleshooting process (what I checked and why), not just the final workaround.

Environment

There is a dedicated 10G internal network between the cluster nodes and the NAS:

node1: 172.16.0.20
node2: 172.16.0.21
NAS2 (Synology): 172.16.0.10

The NAS also has another reachable service IP:

NAS2 service IP: 192.168.10.132

0. Symptoms

After migrating VMs from node2 to node1:

No VM console could be opened in the Proxmox Web UI.
The UI error was: Failed to run vncproxy

Additional observations:

The node-level shell console (node1 → Shell) still worked.
Running qm vncproxy <vmid> manually on node1 returned:
- LC_PVE_TICKET not set, VNC proxy without password is forbidden
This is expected when calling qm vncproxy outside the Web UI/API context, because it needs a PVE ticket. It was not the root cause.

1. First Checks: Is Proxmox Itself Healthy?

1.1 PVE services

systemctl status pveproxy pvedaemon pve-cluster --no-pager

What I saw:

pveproxy, pvedaemon, and pve-cluster (pmxcfs) were all active.
pveproxy logs frequently showed: proxy detected vanished client connection (often appears when a console connection drops unexpectedly).

1.2 Cluster / quorum

pvecm status

What I saw:

node1 still had quorum (Quorate: Yes).
Even if the cluster looked reduced (node1 + qdevice), pve-cluster itself was functioning.

Conclusion so far: this was not a “cluster is down” situation.
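The quorum check can be scripted when it needs to run repeatedly. This is a minimal sketch: the function name is_quorate is my own, and it assumes pvecm status prints a line of the form "Quorate: Yes", as seen above.

```shell
# Reads `pvecm status` output on stdin.
# Succeeds only if a "Quorate: Yes" line is present.
is_quorate() {
    grep -Eq '^[[:space:]]*Quorate:[[:space:]]+Yes[[:space:]]*$'
}
```

On a node this would be used as: pvecm status | is_quorate && echo "quorate"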

2. Console Infrastructure Check: termproxy / vncproxy and Ports

2.1 Check if termproxy/vncproxy are running and listening

ss -ltnp | grep -E ':(59[0-9]{2})\b'
ps aux | grep -E 'termproxy|vncproxy' | grep -v grep

What I saw:

termproxy appeared (for example, termproxy 5900 ...).
But VM consoles still failed.

Conclusion so far: the proxy layer existed, but something deeper prevented it from completing.

3. Find the Real Error: Why Does vncproxy Fail?

3.1 Read recent logs from pvedaemon and pveproxy

journalctl -u pvedaemon -u pveproxy --since "10 min ago" -l --no-pager | tail -200

Key pattern in the logs:

Many VMs produced repeated errors like:
- VM <id> qmp command failed ... unable to connect to VM <id> qmp socket - timeout after 51 retries
When attempting a console, the sequence looked like:
- starting vnc proxy ...
- qmp command 'set_password' failed ... unable to connect to VM <id> qmp socket ...
- Failed to run vncproxy.

Conclusion: the console wasn’t failing because “VNC proxy can’t start”; it failed because Proxmox could not talk to the VM’s QEMU via QMP in order to set VNC credentials.
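To see at a glance which VMs are affected, the journal output can be reduced to a list of VMIDs. A minimal sketch, assuming the log lines contain the phrase "unable to connect to VM <id> qmp socket" exactly as quoted above; the function name qmp_timeout_vmids is hypothetical.

```shell
# Reads journal text on stdin and prints each VMID that hit a QMP socket
# connection failure, one per line, sorted numerically and de-duplicated.
qmp_timeout_vmids() {
    grep -oE 'unable to connect to VM [0-9]+ qmp socket' \
        | awk '{ print $6 }' \
        | sort -un
}
```

Usage: journalctl -u pvedaemon -u pveproxy --since "10 min ago" --no-pager | qmp_timeout_vmids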

4. The Contradiction: VM “running”, QMP sockets exist, but QMP is unreachable

Using VM 502 as an example.

4.1 Verify VM status and QMP/VNC socket files

qm status 502
ls -l /run/qemu-server/502.qmp /run/qemu-server/502.vnc 2>/dev/null || true

What I saw:

qm status 502 reported running.
/run/qemu-server/502.qmp and /run/qemu-server/502.vnc existed.

4.2 Try QEMU monitor

Because my qm version did not support --cmd/--command, I entered the interactive monitor:

qm monitor 502
# qm> info status

What I saw:

Even the monitor command failed:
- human-monitor-command failed due to QMP timeout.

4.3 Check whether the qemu process actually exists

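# Proxmox normally launches QEMU as /usr/bin/kvm, so match both binary names.
pgrep -af '(kvm|qemu).*-id 502( |$)' || ps -ef | grep -E '(kvm|qemu).*-id 502( |$)' | grep -v grep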

What I saw:

No qemu process for VM 502 was found.
Yet Proxmox believed the VM was running and the socket files existed.

Conclusion: this strongly suggested a stuck or inconsistent runtime state, often caused by underlying storage/IO problems or stale mounts affecting how Proxmox tracks VM runtime state.
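The same contradiction (socket file present, no process) can be checked for every VM at once. This is a sketch under two assumptions: each VM leaves a <vmid>.qmp file in the runtime directory, and its process carries "-id <vmid>" on the command line. The function name stale_qmp_vmids is hypothetical.

```shell
# Prints VMIDs that have a <vmid>.qmp entry in DIR but no "-id <vmid>"
# argument in the process list supplied on stdin (e.g. from `ps -eo args`).
stale_qmp_vmids() {
    dir=$1
    procs=$(cat)
    for sock in "$dir"/*.qmp; do
        [ -e "$sock" ] || continue          # glob matched nothing
        vmid=$(basename "$sock" .qmp)
        # Match "-id <vmid>" as a whole argument, whatever the binary is called.
        if ! printf '%s\n' "$procs" | grep -Eq -- "(^| )-id $vmid( |\$)"; then
            printf '%s\n' "$vmid"
        fi
    done
}
```

Usage: ps -eo args | stale_qmp_vmids /run/qemu-server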

5. Secondary Clue: a shared NFS storage looked wrong on node1

I noticed something else that correlated with the timing:

A shared NFS storage (nas2-pxshare) showed correct capacity on node2.
On node1, it appeared as a grey question mark in the UI.

5.1 Confirm storage status from CLI

timeout 5 pvesm status || echo "pvesm status timeout"

What I saw:

nas2-pxshare was inactive on node1 (0 total/used/available).
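When a node has many storages, filtering the status table for anything not active is quicker than reading it. A minimal sketch, assuming the usual column order (Name, Type, Status, ...) as in the result line later in this post; inactive_storages is a name I made up.

```shell
# Reads `pvesm status` output on stdin and prints the name of every storage
# whose Status column is not "active". The header row is skipped.
inactive_storages() {
    awk '$1 != "Name" && NF >= 3 && $3 != "active" { print $1 }'
}
```

Usage: timeout 5 pvesm status | inactive_storages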

5.2 Confirm the storage definition

grep -n "nas2-pxshare" -A6 -B2 /etc/pve/storage.cfg

At that time the definition was intended to be:

server: 172.16.0.10
export: /volume1/PxShare
path: /mnt/pve/nas2-pxshare

5.3 Verify network connectivity to NAS2 on the 10G network

ping -c 2 172.16.0.10
ip route get 172.16.0.10
rpcinfo -p 172.16.0.10 | head -30
showmount -e 172.16.0.10

What I saw:

Ping was fast and stable.
The route clearly used the internal bridge (vmbr1) and source 172.16.0.20.
rpcinfo and showmount succeeded and exports were listed.
The export ACL included 172.16.0.20 and 172.16.0.21.

Conclusion: the NAS and network were fine. The problem was about how node1 had mounted the share (and how Proxmox evaluated it).

6. Root Cause at the Mount Layer: “server IP drift” and stacked mounts on the same path

6.1 Check the effective mount source

findmnt /mnt/pve/nas2-pxshare || echo "NOT MOUNTED"
grep "/mnt/pve/nas2-pxshare" /proc/self/mountinfo

What I saw earlier during troubleshooting:

findmnt reported the source as:
- 192.168.10.132:/volume1/PxShare

Even though the Proxmox storage definition was supposed to be 172.16.0.10.

At the same time, the mount options showed internal-network details (for example, clientaddr=172.16.0.20 and references to addr=172.16.0.10), so the actual traffic was not necessarily going through the slow network. But from Proxmox’s point of view, the mounted “server identity” did not match the storage.cfg definition, which can cause the storage to be reported as inactive.
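The mismatch between the configured server and the mounted source can be made explicit with two small helpers. This is a sketch, not how Proxmox itself performs the check. It assumes storage.cfg stanzas start with an unindented "<type>: <name>" line followed by indented "key value" lines, and that the server is an IPv4 address or hostname (no IPv6 literals). Both function names are hypothetical.

```shell
# Prints the "server" value of the named storage stanza from storage.cfg
# text supplied on stdin.
storage_server() {
    awk -v name="$1" '
        /^[^[:space:]]/ { in_stanza = ($2 == name) }   # unindented line starts a stanza
        in_stanza && $1 ~ /^server:?$/ { print $2; exit }
    '
}

# Prints the host part of an NFS mount source such as "host:/export".
mount_source_host() {
    printf '%s\n' "${1%%:*}"
}
```

Usage: compare the output of storage_server nas2-pxshare < /etc/pve/storage.cfg with mount_source_host "$(findmnt -n -o SOURCE /mnt/pve/nas2-pxshare | head -n 1)". If the two differ, the mount did not come from the current definition.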

6.2 The smoking gun: stacked mounts on the same mountpoint

Later, after stopping some PVE services, I observed the same mountpoint appearing twice:

findmnt /mnt/pve/nas2-pxshare -o TARGET,SOURCE,FSTYPE,OPTIONS

It showed two layers:

/mnt/pve/nas2-pxshare 192.168.10.132:/volume1/PxShare nfs4 ...
/mnt/pve/nas2-pxshare 172.16.0.10:/volume1/PxShare nfs ...

This is a stacked mount situation:

The lower layer was NFSv4 showing 192.168.10.132.
The upper layer was NFSv3 showing 172.16.0.10.

fuser showed it was only held by the kernel mount, not a user process:

fuser -vm /mnt/pve/nas2-pxshare
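Stacking is easy to miss by eye, so counting the layers is a useful sanity check. A minimal sketch that relies on the mount point being the fifth field of each /proc/self/mountinfo line; mount_layers is a hypothetical name.

```shell
# Reads mountinfo-formatted text on stdin and prints how many mounts have
# TARGET as their mount point (field 5). A result above 1 means stacking.
mount_layers() {
    awk -v target="$1" '$5 == target { n++ } END { print n + 0 }'
}
```

Usage: mount_layers /mnt/pve/nas2-pxshare < /proc/self/mountinfo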

7. Fix (without changing paths): fully unmount stacked layers, then mount only NFSv3

Goal:

Keep using /mnt/pve/nas2-pxshare (no new directories).
Remove stacked mounts completely.
Remount cleanly using 172.16.0.10 (10G network) with a single NFS version.

7.1 Stop PVE services that might trigger storage checks/re-mount behavior

systemctl stop pvestatd pvedaemon pveproxy

7.2 Unmount repeatedly until the mount is truly gone

This step was critical because there were multiple layers on the same mountpoint.

umount -f /mnt/pve/nas2-pxshare 2>/dev/null || umount -l /mnt/pve/nas2-pxshare
findmnt /mnt/pve/nas2-pxshare -o TARGET,SOURCE,FSTYPE,OPTIONS || echo "NOT MOUNTED"

umount -f /mnt/pve/nas2-pxshare 2>/dev/null || umount -l /mnt/pve/nas2-pxshare
findmnt /mnt/pve/nas2-pxshare -o TARGET,SOURCE,FSTYPE,OPTIONS || echo "NOT MOUNTED"

I continued until it returned:

NOT MOUNTED
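The repeated unmount can also be written as a loop. This is a sketch of what I did by hand, with one addition of my own: an attempt limit, so that a mount which keeps reappearing cannot loop forever. The function name unmount_all_layers is hypothetical.

```shell
# Unmounts every layer stacked on MOUNTPOINT, giving up after MAX attempts
# (default 5). Prints "NOT MOUNTED" once findmnt no longer finds the path.
unmount_all_layers() {
    mp=$1
    max=${2:-5}
    tries=0
    while findmnt "$mp" >/dev/null 2>&1; do
        [ "$tries" -lt "$max" ] || return 1
        tries=$((tries + 1))
        # Same fallback as the manual commands: forced first, lazy if that fails.
        umount -f "$mp" 2>/dev/null || umount -l "$mp" || return 1
    done
    echo "NOT MOUNTED"
}
```

Usage (as root, with the PVE services stopped): unmount_all_layers /mnt/pve/nas2-pxshare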

7.3 Remount using NFSv3 on the internal 10G IP

mount -v -t nfs -o vers=3,proto=tcp 172.16.0.10:/volume1/PxShare /mnt/pve/nas2-pxshare
findmnt /mnt/pve/nas2-pxshare -o TARGET,SOURCE,FSTYPE,OPTIONS

Verification: it must show only one entry and the correct source:

/mnt/pve/nas2-pxshare 172.16.0.10:/volume1/PxShare nfs ...

7.4 Start the PVE services again

systemctl start pveproxy pvedaemon pvestatd

7.5 Confirm Proxmox now sees the storage as active

pvesm status | grep nas2-pxshare

Result:

nas2-pxshare         nfs     active     18739479296     16557377408      2182101888   88.36%

At this point, the VM console issue was resolved.

8. Postmortem: Why This Happened (important background)

After reviewing the history, the most plausible explanation is:

Originally, nas2-pxshare was created pointing to 192.168.10.132.
Later, I deleted and re-created a storage with the same name nas2-pxshare but pointing to 172.16.0.10.
node2 did not show abnormal behavior, but node1 kept a stale mount state and/or ended up stacking mounts (NFSv4 from the old server identity plus NFSv3 from the new one) on the same mountpoint.
Once node1’s nas2-pxshare became inconsistent (wrong “server identity” or stacked mounts), Proxmox marked it inactive and started timing out in operations that indirectly depend on storage stability. The QMP timeouts and vncproxy failures were symptoms of the node being in a broken state, not purely a “console feature” issue.

9. Prevention Notes

Avoid changing a storage definition to a different server IP while keeping the same storage name, unless you ensure the old mount is fully gone on every node.
If a Proxmox NFS storage suddenly becomes inactive, the first command to run is:

findmnt /mnt/pve/<storage> -o TARGET,SOURCE,FSTYPE,OPTIONS

If the same mountpoint appears multiple times, resolve the stacked mounts first before trusting any higher-level Proxmox behavior.

How to Set Up the Onboard Virtual Keyboard on Debian 13 + KDE Plasma 6 + Wayland (uinput-based)

Posted on 2025-12-15 by hollen9

Preface: From Almost Giving Up to Finally Being Able to Type

Honestly, this article was written when I was very close to giving up on using an on-screen keyboard under KDE Plasma Wayland.

On Debian 13 with KDE Plasma 6 running on Wayland, the only officially supported virtual keyboard is Maliit.
In practice, however, it comes with a series of nearly deal-breaking problems:

The keyboard only appears once
After swiping it down, it can never be summoned again
Requires restarting KWin or the entire desktop session
Completely unsuitable for real tablet or 2-in-1 usage

After digging through GitHub issues and KDE Discuss threads, I honestly started to think:

“Maybe choosing KDE Wayland on a touch device is simply a dead end.”

That changed when I found a KDE Discuss thread from 2022:

Plasma 6 and Wayland no on-screen keyboard working - Help - KDE Discuss

In that thread, a user named @INVICTRA mentioned:

I managed to get Onboard working on wayland. Kubuntu 25.04

Edit the shortcut and add GDK_BACKEND=x11
Set input source to GTK
Set keystroke generator to uinput

The post was short and incomplete, but it revealed something important:

Onboard + X11 backend + uinput might be the real breakthrough.

With that clue, I started experimenting, filling in the missing pieces: kernel modules, permissions, udev rules, and Wayland constraints.
In the end, I successfully achieved a stable, repeatable, non-freezing virtual keyboard on:

Debian 13 + KDE Plasma 6 + Wayland

This article is the fully documented result of that process.

1. Why This Is Necessary (Background)

As of late 2025, KDE Plasma Wayland officially supports only Maliit
Maliit currently suffers from severe bugs on Plasma (cannot be re-opened after hiding)
Wayland intentionally gives ordinary clients no way to inject synthetic input (fake keyboard/mouse events) into other applications
Onboard can create a real input device via the Linux kernel uinput subsystem
uinput devices are kernel-level input devices and are not blocked by Wayland

In short:

Wayland does not allow you to fake keystrokes,
but uinput lets you attach a real virtual keyboard.

Under Debian + KDE Plasma Wayland today,
this is practically the only solution that actually works.

2. System Requirements

Debian GNU/Linux 13
KDE Plasma 6
Wayland session
XWayland installed (usually installed by default)
User has sudo privileges

3. Install Required Packages

sudo apt update
sudo apt install onboard xwayland

4. Enable the Kernel uinput Module

1. Check if uinput is loaded

lsmod | grep uinput

If there is no output, load it manually:

sudo modprobe uinput

2. Enable uinput at boot

echo uinput | sudo tee /etc/modules-load.d/uinput.conf
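The "check, then load if missing" logic can be wrapped in a small helper. A minimal sketch: it treats the first field of each line as the module name, which holds for both lsmod output and /proc/modules. The function name module_loaded is hypothetical, and a kernel with uinput built in (not a module) would not be detected this way.

```shell
# Reads lsmod or /proc/modules text on stdin and succeeds if MODULE is
# listed as an exact name in the first column.
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}
```

Usage: module_loaded uinput < /proc/modules || sudo modprobe uinput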

5. Configure uinput Permissions (Critical Step)

1. Create the group

sudo groupadd -f uinput

2. Add your user to the group (example: hln)

sudo usermod -aG uinput hln

Important: You must log out or reboot after this step.

3. Create a udev rule

sudo nano /etc/udev/rules.d/99-uinput.rules

Contents:

KERNEL=="uinput", MODE="0660", GROUP="uinput"

Reload rules:

sudo udevadm control --reload
sudo udevadm trigger

4. Verify After Reboot

ls -l /dev/uinput

Expected output (device numbers and timestamp omitted):

crw-rw---- 1 root uinput /dev/uinput

Confirm group membership:

groups

You should see uinput.
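Both checks can be combined into one pass/fail test. This is a sketch with a hypothetical function name, uinput_access_ok. It takes the permission string and group as printed by stat -c '%A %G', plus the group list printed by id -nG, and assumes the group is called uinput as in this guide.

```shell
# Succeeds if PERMS and GROUP give group "uinput" read/write access to a
# character device, and USER_GROUPS (space-separated) contains "uinput".
uinput_access_ok() {
    perms=$1
    group=$2
    user_groups=$3
    case $perms in
        c???rw????) ;;            # character device with group read+write
        *) return 1 ;;
    esac
    [ "$group" = "uinput" ] || return 1
    case " $user_groups " in
        *" uinput "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: uinput_access_ok $(stat -c '%A %G' /dev/uinput) "$(id -nG)" && echo "uinput is usable"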

6. Launch Onboard Using the X11 Backend (Very Important)

Under Wayland, Onboard must be forced to use the X11 backend:

GDK_BACKEND=x11 onboard

It is recommended to test this first in a terminal.

7. Required Onboard Settings

Open Onboard → Preferences → Keyboard → Advanced

Set the following:

Input Options
- Input event source: GTK
Keystroke Generation
- Key-stroke generator: uinput

If you previously tried uinput and it did not work,
you must re-test after completing the permission setup.

8. Verification

Open a text-input application (Kate / Firefox / Konsole)
Focus a text field
Click keys on Onboard

Successful behavior:

Text appears in the application
Keyboard can be shown and hidden repeatedly
No need to restart KWin
No freezing or dead state
Completely avoids Maliit bugs

9. Create a Desktop Launcher (Recommended)

nano ~/.local/share/applications/onboard-x11.desktop

Contents:

[Desktop Entry]
Name=Onboard (Wayland Safe)
Exec=env GDK_BACKEND=x11 onboard
Type=Application
Icon=onboard
Categories=Utility;Accessibility;

You can now:

Pin it to the KDE panel
Place it on the desktop
Use it as a one-click virtual keyboard launcher
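If you set up more than one device, the launcher file can be written by a script instead of an editor. A minimal sketch that reproduces the same contents; the function name write_onboard_launcher and its optional directory argument are my own additions.

```shell
# Writes the Onboard launcher into DIR (default: the per-user
# applications directory), creating the directory if needed.
write_onboard_launcher() {
    dir=${1:-"$HOME/.local/share/applications"}
    mkdir -p "$dir" || return 1
    cat > "$dir/onboard-x11.desktop" <<'EOF'
[Desktop Entry]
Name=Onboard (Wayland Safe)
Exec=env GDK_BACKEND=x11 onboard
Type=Application
Icon=onboard
Categories=Utility;Accessibility;
EOF
}
```

Usage: write_onboard_launcher (no argument writes to ~/.local/share/applications).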

10. Limitations and Notes

Known Limitations

Does not work on the SDDM login screen
Not Wayland-native (runs via XWayland)
Elevated input permissions (recommended for personal devices only)

Advantages

No swipe-down freeze issue
Full Ctrl / Alt / Function key support
Works with Fcitx5 (Chewing / Zhuyin)
Compatible with Synergy / KVM
Stable for long-term use

11. Reverting the Setup (Optional)

sudo rm /etc/udev/rules.d/99-uinput.rules
sudo rm /etc/modules-load.d/uinput.conf
sudo gpasswd -d hln uinput
sudo reboot

12. Conclusion

On Debian 13 with KDE Plasma Wayland:

Onboard + uinput is currently the only virtual keyboard solution I have found that truly works.

It is not an official or perfect solution,
but KDE is a volunteer-driven community, and expectations should remain realistic.

What matters most is that the problem is solved:
I can now use a keyboard-less tablet to type Taiwanese Mandarin with Zhuyin or English and actually get work done.