SSH Connection Failed #3701

Closed
Jayashree-D opened this issue Sep 24, 2020 · 8 comments

Comments

@Jayashree-D
Contributor

With the latest OpenBMC build, after upgrading the image on the target, I cannot connect to the target over SSH, although I can still ping its IP address.
The SSH client neither gets a response nor reports an error. I tried "ssh -vvv" and got the output below.

OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug1: Connecting to 10.0.128.108 [10.0.128.108] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4

Observation on UART-console after flashing latest image:

  1. The reboot command does not work.
  2. systemctl status <service_name> does not return any status (Failed to get properties: Connection timed out).

After analysing the latest commits, I reverted the commit below in the latest build and flashed the resulting image. With the revert, the target accepts SSH connections again.

Commit link: 635e0e4

@jeb-de

jeb-de commented Sep 30, 2020

Nearly the same for me.
My build target is the ast-2500-evb.

Tested with 73ed7d0 from 2020-09-25

After startup, the reboot and systemctl commands work for a while, but after some time or after a few commands
they hang (they can still be cancelled with CTRL-C).

There is no dropbear (SSH server) process, but port 22 is held by something (netstat -lt).
Starting dropbear manually with dropbear -E -p 5022 works (for some time).

I can confirm that reverting 635e0e4 solves the problem.

@leiyu-bytedance
Contributor

I guess this is the same as #3697

@Jayashree-D
Contributor Author

I ran "dropbear -E -p 5022" on the target (UART console) and then connected with "ssh -p 5022 "; the SSH connection was established.
However, the reboot and systemctl commands still hang.

root@tiogapass:~# dropbear -E -p 5022
[348] Jan 01 00:06:48 Failed loading /etc/dropbear/dropbear_dss_host_key
[348] Jan 01 00:06:48 Failed loading /etc/dropbear/dropbear_ecdsa_host_key
[348] Jan 01 00:06:48 Failed loading /etc/dropbear/dropbear_ed25519_host_key
[349] Jan 01 00:06:48 Running in background

Has any solution to this issue been identified?

@jeb-de

jeb-de commented Oct 5, 2020

I checked why systemctl status hangs.

Using strace, I found the call where it hangs:

root@evb-ast2500:~# strace systemctl status
...
uname({sysname="Linux", nodename="evb-ast2500", ...}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=206, tv_nsec=783189733}) = 0
recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=206, tv_nsec=788808723}) = 0
ppoll_time64([{fd=3, events=POLLIN}], 1, {tv_sec=89, tv_nsec=9148931269518363576}, NULL, 8)

The ppoll_time64 call does not return. On a working system (with 635e0e4 reverted) the same call returns as expected:

ppoll_time64([{fd=3, events=POLLIN}], 1, {tv_sec=89, tv_nsec=9148931269518363576}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=89, tv_nsec=939882475})

It looks like a problem with the socket fd=3, which is connected to "/run/systemd/private".
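
When reproducing this, it can help to run the suspect command under a hard deadline so that a hang is reported instead of blocking the console. The following is only a minimal sketch, assuming Python 3 is available on the machine where the command runs (it may not be present in every image); `run_with_deadline` is a hypothetical helper name, not part of systemd or OpenBMC.

```python
import subprocess


def run_with_deadline(cmd, timeout=10.0):
    """Run cmd and classify the result.

    Returns (status, output), where status is one of:
      'ok'      - command exited with status 0
      'failed'  - command exited with a non-zero status
      'hung'    - command did not finish within `timeout` seconds
      'missing' - the executable could not be found
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            timeout=timeout,
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        # The child is killed by subprocess.run before this is raised.
        # Partial output may be bytes or str depending on the Python version.
        out = exc.output or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return ("hung", out)
    except FileNotFoundError:
        return ("missing", "")
    return ("ok" if done.returncode == 0 else "failed", done.stdout)
```

For example, `run_with_deadline(["systemctl", "status"], timeout=15)` would be expected to report `'hung'` on an affected image and `'ok'` or `'failed'` on a working one. The 15-second value is an arbitrary choice for illustration.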

@mdmillerii
Contributor

The socket /run/systemd/private is the communication channel between systemd and systemctl. The ssh socket (tcp/22) is initially opened by systemd, and sshd is then started when the first client connects. Both hangs suggest that systemd itself is hanging. If a web search turns up nothing, I'd suggest collecting the systemd commits that came in with that poky update and trying to isolate or bisect them. Alternatively, you could try to reproduce the problem under qemu, as I suspect debugging pid 1 is tricky.
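
The socket-activation behaviour described above would be consistent with the "ssh -vvv" log earlier in this thread: the TCP connection is established, but the log stops after the local version string, which suggests no server ever sent its identification line. A rough way to check for that state from another machine is sketched below. This is only an illustration; `probe_ssh_banner` and its status strings are hypothetical names, and the timeout is an arbitrary choice.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Connect to host:port and wait for an SSH identification line.

    Returns (status, detail), where status is one of:
      'unreachable' - the TCP connection could not be established
      'no_banner'   - connected, but nothing arrived within `timeout`
      'closed'      - connected, but the peer closed without sending data
      'banner'      - a line starting with 'SSH-' was received
      'unexpected'  - some other data was received first
      'error'       - another socket error occurred while reading
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("unreachable", str(exc))
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            # Port is listening, but no server process answered.
            return ("no_banner", "")
        except OSError as exc:
            return ("error", str(exc))
    if not data:
        return ("closed", "")
    line = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    if line.startswith("SSH-"):
        return ("banner", line)
    return ("unexpected", line)
```

A `'no_banner'` result for port 22 combined with a `'banner'` result for a manually started dropbear on port 5022 would point at whatever is supposed to spawn the server, rather than at the network or at dropbear itself.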

@geissonator
Contributor

We definitely ran into some issues downstream with that systemd update. I pushed a change upstream, which is still on its way into the openbmc code base. Pulling it in downstream solved our issues, which were somewhat similar to this one.

You may want to try https://gerrit.openbmc-project.xyz/c/openbmc/openbmc/+/36716.

@Jayashree-D
Contributor Author

I pulled the latest commit, c3d88e4, tested the image on my target, and can now access the target over SSH.
The systemctl and reboot commands also work.

@leiyu-bytedance
Contributor

Yup, this is the same as #3697; this issue can be closed.
