I interviewed today with LinkedIn and [REDACTED].
He was intimidatingly knowledgeable and described a work environment that I really hope to join.
I hope that I have the skill set necessary.
What follows are notes of the responses for my edification, not necessarily how I answered.
Describe what happens when we type `ssh shell.linkedin.com` and hit enter.
- When you hit enter, your shell gets the string you entered on the command line. In this case, we send the string to bash.
- Bash will split the string on whitespace and take `arg[0]` as the name of the executable you intend to execute.
- It will look up that name in several places, in this order:
  - aliases
  - functions
  - builtins
  - `$PATH`
  - As an aside: Why must `cd` be a “builtin”?
    - If you were to spawn `cd` from the current shell like you would another program, `cd` would execute, make the syscall to change its own working directory, and exit. This would leave the parent process’s working directory unchanged.
- Once it finds what you mean by that `arg[0]`, it will `fork` and then `exec` the executable you intend to run.
- Now `ssh` gets that list of arguments and can use something like `getopt` to parse meaning out of it.
- Assuming that the argument parsing went well, we can now try to resolve the hostname that `ssh` has been passed.
- This involves looking in `/etc/hosts` for matching hostname-to-IP pairs or, failing that, reading `/etc/resolv.conf` to find a DNS server that will perform a recursive DNS lookup.
- This is typically encapsulated in a C library call such as `getaddrinfo` rather than a single syscall to the kernel. DNS queries normally go over UDP, falling back to TCP for large responses.
- Once you have the IP, you can then establish a TCP connection to the server on port 22 (magic number).
- After the TCP connection is established, the SSH protocol itself takes over; see the SSH Review for those steps.
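To make the process side of this concrete for myself, here is a minimal Python sketch of three of the steps above: splitting the command line into words, showing that a directory change made in a forked child does not affect the parent (the reason `cd` has to be a builtin), and asking the system resolver for addresses. The function names are my own, `shlex` only approximates how bash splits words, and `os.fork` assumes a Unix-like system. This is an illustration, not how bash or `ssh` are actually implemented.

```python
import os
import shlex
import socket
import tempfile


def split_command(line):
    """Split a command line into words, roughly the way a shell would."""
    return shlex.split(line)


def cwd_after_child_chdir(target):
    """Fork a child that changes directory, then report the parent's cwd."""
    pid = os.fork()
    if pid == 0:
        # Child: the chdir syscall only affects this process.
        os.chdir(target)
        os._exit(0)
    # Parent: wait for the child to exit, then read our own cwd.
    os.waitpid(pid, 0)
    return os.getcwd()


def resolve(host, port=22):
    """Ask the system resolver (hosts file, then DNS) for TCP addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr).
    return [info[4][0] for info in infos]


if __name__ == "__main__":
    argv = split_command("ssh shell.linkedin.com")
    print("executable:", argv[0], "arguments:", argv[1:])

    before = os.getcwd()
    after = cwd_after_child_chdir(tempfile.gettempdir())
    print("parent cwd unchanged:", before == after)
```

Calling `resolve("shell.linkedin.com")` would go through the same hosts-file-then-DNS path described above, so I left it out of the demo to keep it runnable without a network.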
You have a system where 10000 clients need to access 10GB of information from a server. The data changes a few sectors a day.
- `rsync` is the key to this question. With `rsync`, you can establish a connection to a remote server and synchronize files over the connection, sending only what needs to be sent.
  - Karrick shared with me a utility he discovered in the interview process, `ssync`, which solves the distributed portion of this challenge too.
  - Without `ssync`, you will run into the issue that when a change happens on the server, all the clients are notified and try to synchronize at once. This will cause a bottleneck at the server.
  - To mitigate this, you can set up a self-similar hierarchy whereby the server has 10 nodes it broadcasts to and those nodes synchronize files with it. Each of those nodes in turn has 10 nodes. Once the files are synchronized to any level, the next level will get a notification.
You just inherited a software system that has no metrics. Describe what you do in the first quarter to allow you to sleep at night.
- Firstly, I would get a trial of SignalFX, because it is the only platform of this kind that I know.
- SignalFX has a fork of `collectd` that it uses to gather information from nodes.
- Giving SignalFX your AWS credentials will allow it to identify all the hosts that you run: EC2, RDS, and other services.
- Once that data is in SignalFX, you can begin to create dashboards and set up alerts.
- Those alerts can be piped to PagerDuty or, per Karrick’s recommendation, Iris. He wrote Iris, and it seems like an awesome open-source solution to the problem that PagerDuty solves.
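For my own understanding, the collect-then-alert loop boils down to something like the Python sketch below. It uses only the standard library and assumes a Unix-like host; the metric names, the thresholds, and both function names are made up for illustration. It does not use the SignalFX, `collectd`, PagerDuty, or Iris APIs.

```python
import os
import shutil
import time


def collect_metrics(path="/"):
    """Sample a few host-level numbers, as a collection agent might."""
    usage = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_1m": os.getloadavg()[0],
        "disk_used_pct": 100.0 * usage.used / usage.total,
    }


def breached(metrics, thresholds):
    """Return the names of metrics that exceed their alert threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0) > limit
    )


if __name__ == "__main__":
    # Hypothetical thresholds; real values depend on the system.
    thresholds = {"load_1m": 4.0, "disk_used_pct": 90.0}
    sample = collect_metrics()
    for name in breached(sample, thresholds):
        # A real setup would hand this to a paging service instead.
        print("ALERT:", name, "=", sample[name])
```

The point of a hosted platform is that it does the collection, storage, dashboards, and alert routing for you; the sketch only shows the shape of the data flowing through.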