Triggit Engineering Blog

Fun With /proc


This post is authored by our Senior Operations Engineer, Erik Hollensbe.

The /proc filesystem is one of the easiest, cheapest ways to quickly get at the status of your machine. We use it extensively in Gollector, preferring it to other methods of retrieving data.

Something to get your feet wet

netstat -an is a great way to get a list of all the connections running on a machine, but as mentioned in the previous article, it can be quite slow as it pulls everything up. How can we quickly get the count of all network connections on the machine, for example?

/proc/self/net/ has several files named tcp, udp, tcp6, etc. that contain this information. They actually exist under every pid-named directory, but each copy lists all the connections that process can see, not just its own. /proc/self is simply a magical symlink that points at the directory of whichever process is reading it, similar to how cd /proc/$$ would work. This allows us to interrogate these files without being root!

Here’s some code to get that count:

count=0

for i in tcp tcp6 udp udp6 unix udplite udplite6
do
  count=$(($count + $(wc -l /proc/self/net/$i | awk '{ print $1 }')))
  # each file has a header line, so decrement the count by 1 so we're not off.
  count=$(($count - 1))
done

echo $count

Which on my dev machine returns 255. The output from netstat -an (with descriptive headers) reports 261. We’re good.
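For comparison, here is a rough Ruby sketch of the same count. The name count_sockets and its net_dir parameter are made up for this example (the parameter only exists so the function can be pointed at a copy of the files), and any file a given kernel does not expose is simply skipped.

```ruby
# count_sockets is a hypothetical helper, not part of any library.
# It counts the non-header lines of each socket table under net_dir.
def count_sockets(net_dir = "/proc/self/net")
  files = %w[tcp tcp6 udp udp6 unix udplite udplite6]
  files.sum do |name|
    path = File.join(net_dir, name)
    next 0 unless File.readable?(path) # not every kernel exposes every file
    lines = File.readlines(path).length
    [lines - 1, 0].max                 # each file starts with a header line
  end
end
```

Calling count_sockets with no arguments reads /proc/self/net, just like the shell loop.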

Read the friendly manual

man 5 proc has tons of information on these files. I strongly suggest you at least page through it before trying to write C to interrogate the system directly. Depending on what you need to do, reading these files could make for considerably less error-prone code.

Let’s dig deeper.

netstat -anp yields all sockets and the PIDs and executables attached to them. This is very useful when trying to find a problem bind(2) or a process which is not cleaning up its sockets properly.

Output looks something like this:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1479/master
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      1191/named
tcp        0      0 0.0.0.0:40027           0.0.0.0:*               LISTEN      653/Plex Plug-in [c
tcp        0      0 0.0.0.0:32443           0.0.0.0:*               LISTEN      14724/Plex Media Se
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      3479/vmware-hostd
tcp        0      0 0.0.0.0:902             0.0.0.0:*               LISTEN      3364/vmware-authdla
tcp        0      0 127.0.0.1:6379          0.0.0.0:*               LISTEN      13686/redis-server
tcp        0      0 0.0.0.0:1836            0.0.0.0:*               LISTEN      14873/Plex DLNA Ser

Let’s write this, using /proc.

exe and cmdline

/proc/<pid>/cmdline and /proc/<pid>/exe are two very useful ways to get at process information for any given process ID.

cmdline is just a text representation of the command line, readable by all users, with each argument terminated by a NUL (ASCII 0) byte. This is the “process title” you see in ps output.

exe is a little more esoteric but extremely useful. Readable only by the process’s owner (or root), it’s a symlink to the fully resolved path of the binary used to run the process.

For example:

I run /bin/zsh -l as my shell. cat /proc/self/cmdline at this point is /bin/zsh\x00-l (the escape is not literal — that’s an ASCII 0 byte). However, on Ubuntu machines zsh comes from /bin/zsh4, which if you readlink /proc/self/exe, is what you end up getting.

exe is really nice for things like sidekiq or sendmail, which change their process title: the new title is what shows up in /proc/<pid>/cmdline, while exe still points at the binary that was actually executed.
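To make the NUL-delimited format concrete, here is a minimal Ruby sketch that splits a cmdline blob into its arguments. parse_cmdline and read_cmdline are names invented for this example, not anything Gollector provides.

```ruby
# parse_cmdline is a hypothetical helper: it takes the raw bytes of a
# cmdline file and returns the list of arguments.
def parse_cmdline(raw)
  raw.split("\0") # split drops the empty field after the final NUL
end

# read_cmdline (also hypothetical) reads the file for a pid, or "self".
def read_cmdline(pid)
  parse_cmdline(File.binread("/proc/#{pid}/cmdline"))
end

p parse_cmdline("/bin/zsh\0-l\0") # prints ["/bin/zsh", "-l"]
```

Comparing read_cmdline(pid).first with File.readlink("/proc/#{pid}/exe") is a quick way to spot a process that has rewritten its title.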

OK… Get on with it.

Let’s write pgrep first, shall we? You’ll need to run this as root.

my_pgrep() {
  command=$1

  (
    for i in /proc/*/exe
    do
      if readlink $i | grep -q "$command"
      then 
        echo $(basename $(dirname $i))
      fi
    done
  ) | sort -n
}

pgrep zsh

my_pgrep /bin/zsh4

Quite a bit slower than pgrep, but very effective. Can you write a version that uses /proc/*/cmdline and works exactly like pgrep -f?
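If you want a head start on that exercise, here is one possible sketch in Ruby. It only approximates pgrep -f: it joins the NUL-delimited arguments with spaces and matches a regular expression against the result. The name pgrep_f and the proc_root parameter are assumptions made for this example; proc_root is there so the function can be tried against a directory tree other than the real /proc.

```ruby
# pgrep_f is a hypothetical helper that approximates `pgrep -f`.
def pgrep_f(pattern, proc_root = "/proc")
  regex = Regexp.new(pattern)
  matches = Dir.glob(File.join(proc_root, "[0-9]*", "cmdline")).select do |path|
    # Processes can exit while we scan, so treat unreadable files as empty.
    raw = (File.binread(path) rescue "")
    cmdline = raw.split("\0").join(" ")
    # Kernel threads have an empty cmdline; skip them.
    !cmdline.empty? && cmdline.match?(regex)
  end
  # The pid is the name of the directory that holds the cmdline file.
  matches.map { |path| File.basename(File.dirname(path)).to_i }.sort
end
```

Unlike the real pgrep, this sketch makes no attempt to exclude itself, so a script that carries the pattern on its own command line will show up in the results.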

Sockets and readlink

readlink is a Swiss Army chainsaw. It’s great for finding out what’s behind symlinks, device files, and … sockets.

Let’s look at a running copy of named. It has numerous file descriptors in /proc/1191/fd and we want to find out which ones are sockets. How do?

Well, readlink can tell us (again, you’ll need to be the same user as named or root to try this):

my_lsof() {
  pid=$1

  (
    for i in /proc/$pid/fd/* # use the pid passed in, not a hard-coded one
    do
      echo -n "$(basename $i) "
      readlink $i
    done
  ) | sort -n -k 1
}

Which should yield output similar to this:

0 /dev/null
1 /dev/null
2 /dev/null
3 socket:[9783]
4 /dev/null
5 pipe:[1423]
7 pipe:[1423]
8 anon_inode:[eventpoll]
9 /dev/random
20 socket:[14362]
21 socket:[14367]
22 socket:[14369]
23 socket:[9795]
24 socket:[9796]
25 socket:[15925]
26 socket:[15927]
512 socket:[14361]
513 socket:[14366]
514 socket:[14368]
515 socket:[15924]
516 socket:[15926]
517 socket:[262364]

So, we see /dev/null in quite a few spots (this is a daemon, after all), and some special syntax for ephemeral file descriptors:

type:[inode]

These inodes can be used to look up values in the network files we examined earlier.
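Pulling the type and inode out of one of these link targets is a one-line regex. A minimal sketch, with parse_fd_link being a name made up for this example:

```ruby
# parse_fd_link is a hypothetical helper. It returns [type, inode] for
# link targets of the form type:[inode], and nil for anything else
# (regular paths, or targets like anon_inode:[eventpoll] that carry a
# name rather than a number).
def parse_fd_link(target)
  match = target.match(/\A(\w+):\[(\d+)\]\z/)
  match && [match[1], match[2].to_i]
end

p parse_fd_link("socket:[9783]") # prints ["socket", 9783]
p parse_fd_link("pipe:[1423]")   # prints ["pipe", 1423]
p parse_fd_link("/dev/null")     # prints nil
```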

Finally! We’re getting to the meat!

If we look at /proc/self/net/tcp, we see this:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1501 1 ffff8807f03b8000 100 0 0 10 0
   1: 0100007F:03B9 00000000:0000 0A 00000000:00000000 00:00000000 00000000   109        0 9795 1 ffff8807f0510000 100 0 0 10 0
   2: 00000000:9C5B 00000000:0000 0A 00000000:00000000 00:00000000 00000000   110        0 7488168 1 ffff880572db7000 100 0 0 10 0

This contains the information we need: note the inode column, the last one named in the header (each row carries a few extra, unlabeled fields after it). We can also use /usr/bin/stat or an equivalent to cheat a little on what we look at.
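To make the columns concrete, here is a small Ruby sketch that decodes a single row. decode_ipv4 and parse_tcp_line are hypothetical names, the state table is deliberately partial (the numeric codes come from the kernel's TCP state list), and the byte reversal assumes a little-endian machine such as x86.

```ruby
# decode_ipv4 is a hypothetical helper: "0100007F:0019" -> "127.0.0.1:25".
def decode_ipv4(hex_addr)
  ip_hex, port_hex = hex_addr.split(":")
  # The address is a 32-bit value printed in host byte order; on a
  # little-endian machine (assumed here) the octets come out reversed.
  octets = [ip_hex].pack("H*").bytes.reverse
  "#{octets.join('.')}:#{port_hex.hex}"
end

# parse_tcp_line is a hypothetical helper for one data row of /proc/net/tcp.
def parse_tcp_line(line)
  # Partial table of state codes; unknown codes are shown as raw hex.
  states = { 0x01 => "ESTABLISHED", 0x06 => "TIME_WAIT", 0x0A => "LISTEN" }
  fields = line.split # sl, local, remote, st, queues, timer, retrnsmt, uid, timeout, inode, ...
  state = fields[3].hex
  {
    local:  decode_ipv4(fields[1]),
    remote: decode_ipv4(fields[2]),
    state:  states.fetch(state, format("0x%02X", state)),
    uid:    fields[7].to_i,
    inode:  fields[9].to_i
  }
end

sample = "   1: 0100007F:03B9 00000000:0000 0A 00000000:00000000 00:00000000 00000000   109        0 9795 1 ffff8807f0510000 100 0 0 10 0"
info = parse_tcp_line(sample)
puts "#{info[:local]} #{info[:state]} inode=#{info[:inode]}" # prints 127.0.0.1:953 LISTEN inode=9795
```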

So, let’s dig these sockets up! I’ve written this in Ruby instead of shell because it should be a little faster, and a little more terse. I’ve done my best to comment the code on a line-by-line basis so you can understand what’s happening.

Note that no attempt has been made to handle unix sockets or IPv6. An exercise for the reader, perhaps.

#!/usr/bin/env ruby

def unpack_raw(addr)
  # extract the network information from the raw ip:port in hex.
  raw_ip, port = addr.split(/:/).map(&:hex) # convert the hex values to integers
  ip = ""
  4.times do
    ip += (raw_ip & 0xFF).to_s # pull the next octet
    ip += "." # append a dot
    raw_ip = raw_ip >> 8 # shift down the octets
  end
  return ip.chop, port # String#chop is used because we'll have a trailing dot.
end

NETWORK_FILES = %w[tcp udp]

inode_hash = { } # mapping of inode -> fd file

Dir["/proc/*/fd/*"].each do |fd|
  link = (File.readlink(fd) rescue nil)
  if link and link =~ /\Asocket:\[/
    inode_hash[link.match(/\Asocket:\[(.*?)\]\z/)[1]] = fd
  end
end

NETWORK_FILES.each do |file|
  lines = File.readlines("/proc/self/net/#{file}")
  lines.shift
  network_info = lines.map { |line| line.split } # split on whitespace, ignoring any leading indent
  network_info.each do |info|
    begin # file descriptors can vanish at any time; just bail if we can't get the info
      local, remote = info[1..2] # fields are: sl, local_address, rem_address, ...
      local_str     = unpack_raw(local).join(":")
      remote_str    = unpack_raw(remote).join(":")

      inode       = info[9] # get the inode from the text
      fd_file     = inode_hash[inode] # look up our fd file
      owner       = File.stat(fd_file).uid # get the owner of the fd
      pid         = fd_file.match(%r!/proc/(.*?)/!)[1] # get the pid of the fd
      executable  = File.readlink("/proc/#{pid}/exe") # get the process the fd belongs to

      # this prints our output
      puts "#{file}: local:#{local_str} remote:#{remote_str} uid:#{owner} pid:#{pid} exec:#{executable}"
    rescue
    end
  end
end

This is the kind of output it yields when run as root:

tcp: local:127.0.0.1:25 remote:0.0.0.0:0 uid:0 pid:1479 exec:/usr/lib/postfix/master
tcp: local:127.0.0.1:953 remote:0.0.0.0:0 uid:109 pid:1191 exec:/usr/sbin/named
tcp: local:0.0.0.0:40027 remote:0.0.0.0:0 uid:110 pid:653 exec:/usr/lib/plexmediaserver/Resources/Python/bin/python
tcp: local:0.0.0.0:32443 remote:0.0.0.0:0 uid:110 pid:14724 exec:/usr/lib/plexmediaserver/Plex Media Server
tcp: local:0.0.0.0:443 remote:0.0.0.0:0 uid:0 pid:3479 exec:/usr/lib/vmware/bin/vmware-hostd
tcp: local:0.0.0.0:902 remote:0.0.0.0:0 uid:0 pid:3364 exec:/usr/sbin/vmware-authdlauncher
tcp: local:127.0.0.1:6379 remote:0.0.0.0:0 uid:111 pid:13686 exec:/usr/bin/redis-server
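As a starting point for the IPv6 exercise, here is a sketch of decoding an address from /proc/self/net/tcp6. There the address is 32 hex digits: four 32-bit words, each printed in host byte order. decode_ipv6 is a hypothetical name, and the code assumes a little-endian machine.

```ruby
require "ipaddr"

# decode_ipv6 is a hypothetical helper for the address columns of tcp6/udp6.
# It returns [address, port].
def decode_ipv6(hex_addr)
  ip_hex, port_hex = hex_addr.split(":")
  # Read four little-endian 32-bit words (assumed host order) and repack
  # them big-endian, which yields the 16 bytes in network order.
  bytes = [ip_hex].pack("H*").unpack("V4").pack("N4")
  [IPAddr.new_ntoh(bytes).to_s, port_hex.hex]
end

p decode_ipv6("00000000000000000000000001000000:0016") # prints ["::1", 22]
```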

Thanks!

Please, again, read man 5 proc for more information on this great service!
