Linux Killswitch

[ unix-tools  wwe  ]

Recently, I’ve been administering a Linux server for some folks on the Content Analytics and Digital Analytics teams. With this great power comes (you guessed it) great responsibility. And one such responsibility is ensuring that the system’s resources are available and in good use. A few of our team members are developing Selenium scripts, which employ a Chrome webdriver and an Xvfb virtual display, both of which tend to hang around long after they’ve stopped being used. This seems to happen when a function or script someone wrote crashes, or when they don’t properly close things (e.g., display.stop(); driver.quit()). That’s somewhat of a hypothesis on my part; the main point is that these processes build up into the hundreds and never go away.

# The [c] bracket trick keeps grep from matching (and counting) its own process.
ps -ef | grep '[c]hrome' | wc -l  # Whoa, way too many!
ps -ef | grep '[X]vfb' | wc -l    # Not as many, but still too many!

Killing Things, Slow and Stupid

If you just type ps into the command line, you’ll only see the processes that belong to your user and are attached to your current terminal. At one point, I learned to use the ‘e’ (every process) and ‘f’ (full-format listing) flags to get a much more complete list:

ps -ef

If your system has hundreds of processes running, you might be like, “Whoa, that’s a lot, but is it ‘a lot’ a lot or just a lot?” To know this, you have to (i) count, and (ii) build an intuition for what’s normal. For counting, just do this:

ps -ef | wc -l

To build an intuition, just do that a lot – throughout the day and throughout the week. On my system, it seems like 250 - 350 is normal. When it gets to 700, I know something’s up, and from this happening several times now, I know that it’s probably related to the Chrome webdriver (though Xvfb is also showing up quite a bit these days since we figured out the benefit of using virtual displays in our Selenium scripts).
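If checking by hand gets old, the counting can be scripted. Here is a minimal sketch, assuming the rough thresholds mentioned above (up to 350 is normal, 700 or more means trouble). The function names and the log file path are made up for illustration.

```shell
#!/usr/bin/env bash

# Count every process on the system. "-o pid=" prints only PIDs and
# suppresses the header line, so the count is not off by one.
count_procs() {
    ps -e -o pid= | wc -l | tr -d ' '
}

# Label a count using the rough thresholds from this post
# (assumption: <= 350 is normal, >= 700 means something is up).
classify_count() {
    local n=$1
    if [ "$n" -ge 700 ]; then
        echo "high"
    elif [ "$n" -gt 350 ]; then
        echo "elevated"
    else
        echo "normal"
    fi
}

# Append one timestamped line (time, count, label) to a log file.
# The default path is hypothetical; pass your own as the first argument.
log_proc_count() {
    local logfile=${1:-"$HOME/proc_counts.log"}
    local n
    n=$(count_procs)
    echo "$(date +%Y-%m-%dT%H:%M:%S) $n $(classify_count "$n")" >> "$logfile"
}
```

Calling log_proc_count from cron every hour or so would give you a week's worth of numbers to eyeball, which is probably enough to learn what normal looks like on your own system.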

To see all the processes related to Chrome, simply grep it:

ps -ef | grep chrome

At this point, you can look for some old, long-running Chrome processes and kill them:

kill PID
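Spotting the "old, long-running" ones by eye is tedious, so here is a rough sketch of how that step could be filtered. It assumes a ps that supports the etimes keyword (elapsed seconds since start), which procps-ng on Linux does but other ps implementations may not. The function names filter_old and old_pids are hypothetical.

```shell
#!/usr/bin/env bash

# Read "PID ELAPSED_SECONDS COMMAND" lines on stdin and print the PIDs
# whose command name contains $1 and whose age is at least $2 seconds.
filter_old() {
    awk -v name="$1" -v min="$2" 'index($3, name) && $2 >= min { print $1 }'
}

# Feed live process data into the filter. The trailing "=" on each
# column name suppresses the header line.
old_pids() {
    ps -e -o pid= -o etimes= -o comm= | filter_old "$1" "$2"
}
```

For example, old_pids chrome 3600 should list Chrome processes that have been alive for an hour or more. Those are the candidates to look at before reaching for kill.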

Killing Things, Faster and Smarter

The problem is that there are so many Chrome-related processes, and many of them depend on each other. If you kill the right one, you might kill like 20 at once. Pick the wrong ones, and you could end up killing all 20 one at a time… It would be useful to know which processes to kill first!

One approach is to kill the processes that take up the largest %CPU or %MEM, but using “-ef” does not provide that info. Fortunately, I learned about using the ‘aux’ argument today:

ps aux  # gives a lot of info, with easy-to-interpret columns like %CPU and %MEM
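To get a feel for how much memory one program is eating in total, the %MEM column can be summed. This is just a sketch: it assumes the usual ps aux layout, where %MEM is the 4th field and the command starts at the 11th, and sum_mem is a name I made up.

```shell
#!/usr/bin/env bash

# Read `ps aux` output on stdin, skip the header line, and add up the
# %MEM column (field 4) for rows whose command (field 11) contains $1.
sum_mem() {
    awk -v name="$1" '
        NR > 1 && index($11, name) { total += $4 }
        END { printf "%.1f\n", total }
    '
}
```

Usage would be something like ps aux | sum_mem chrome. If you would rather see the biggest individual offenders, the procps version of ps can also sort for you with ps aux --sort=-%mem | head.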

I also learned about ps’s tree view today:

ps axjf
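The tree view is great for eyeballing, but the "which one do I kill first" question can also be answered mechanically: find the matching processes whose parent is not itself a match. A small sketch follows, with the hypothetical names tree_roots and chrome_roots. Whether the children actually exit when their parent is killed depends on the program, so treat the result as a list of candidates rather than a guarantee.

```shell
#!/usr/bin/env bash

# Read "PID PPID COMMAND" lines on stdin. Print the PIDs of processes
# whose command contains $1 and whose parent is NOT another matching
# process, i.e. the top of each matching process tree.
tree_roots() {
    awk -v name="$1" '
        index($3, name) { match_pid[$1] = 1; parent[$1] = $2 }
        END {
            for (p in parent)
                if (!(parent[p] in match_pid)) print p
        }
    ' | sort -n
}

# Apply the filter to the live process table.
chrome_roots() {
    ps -e -o pid= -o ppid= -o comm= | tree_roots chrome
}
```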

Both approaches allow you to make smarter decisions about which processes to kill first so that you kill as many processes as possible with the least amount of work. That said, JEEZ! Wouldn’t it be nice to just kill all processes related to some command or program all at once?

The answer, of course, is YES!

Killing Things with Wild Abandon

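This is where pgrep comes in: it returns the process IDs (PIDs) of every process whose name matches a pattern. The pattern can match anywhere in the name, so pgrep cat would also catch a process called, say, catalog. Try it!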

pgrep cat
pgrep chrome
pgrep Xvfb

If you’re sure you can safely kill all processes at once, just do this:

kill `pgrep chrome`  # Kill 'em all in one stroke

This is a powerful command! Use it wisely. For example, we actually have hourly Selenium bots that go out and do some useful work – wouldn’t want to kill ‘em! But knowing their schedule allowed me to use this effectively, and the efficiency was so incredible I just had to share the good news :-)
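For a slightly less wild version of the same idea, here is a sketch that asks politely first (SIGTERM), waits a few seconds, and only then forces the issue (SIGKILL) for anything still alive. The function name terminate_pids and the grace period are my own choices, not anything standard.

```shell
#!/usr/bin/env bash

# terminate_pids GRACE_SECONDS PID...
# Send SIGTERM to every PID, wait up to GRACE_SECONDS for them to exit,
# then send SIGKILL to whichever ones are still around.
terminate_pids() {
    local grace=$1
    shift
    [ "$#" -gt 0 ] || return 0          # nothing to do

    kill -TERM "$@" 2>/dev/null

    local waited=0 pid alive
    while [ "$waited" -lt "$grace" ]; do
        alive=0
        for pid in "$@"; do
            # kill -0 sends no signal; it only checks that the PID exists
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        waited=$((waited + 1))
    done

    for pid in "$@"; do
        kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
    done
    return 0
}
```

It can be combined with pgrep, for example terminate_pids 5 $(pgrep chrome). Running pgrep -a chrome beforehand prints the full command lines, which is a cheap way to double-check that the hourly bots are not in the line of fire.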


Related: I also played around with htop today, which is like top, but more human friendly:

top  # look what's going on
htop  # like top, but for humans 

Some References

  • https://www.digitalocean.com/community/tutorials/how-to-use-ps-kill-and-nice-to-manage-processes-in-linux
Written on April 10, 2018