Wednesday, May 16, 2012

Swap Space, thrashing

Swap space is part of secondary memory (hard disk) and is used as an extension of RAM, so that the effective size of usable memory grows correspondingly. The virtual memory available to the system is thus RAM plus swap space.

In Linux, a swap space can be a partition or a file.
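
As a rough sketch of the file case, the usual sequence is to create a file, restrict its permissions, format it with mkswap and enable it with swapon. The helper names run and make_swapfile, the /swapfile path and the 1024 MB size are illustrative assumptions, not part of any standard tool. The commands are only printed unless RUN=1 is set, because executing them needs root.

```shell
#!/bin/sh
# Print a command, or execute it when RUN=1 (execution requires root).
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Create and enable a swap file at path $1 with a size of $2 MB.
# make_swapfile is a hypothetical helper name used for illustration.
make_swapfile() {
    run dd if=/dev/zero of="$1" bs=1M count="$2" &&  # allocate the file
    run chmod 600 "$1" &&                            # readable by root only
    run mkswap "$1" &&                               # write the swap signature
    run swapon "$1"                                  # start using it
}
```

Calling make_swapfile /swapfile 1024 prints the four commands in order, which makes it easy to review them before running the same call with RUN=1.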

So what is special about swap space compared to other parts of the hard disk?

Swap space is part of the hard disk, but no filesystem is written on it. Two things differentiate swap space from the filesystems that make up the rest of the disk:
  1. Space allocation scheme
  2. Data structures that catalog free space

Space allocation scheme: The kernel usually allocates space for files one block at a time. This reduces fragmentation and hence unallocatable space in the filesystem. On a swap device, however, the kernel allocates space in groups of contiguous blocks. Speed is critical, and the system can do I/O faster in one multiblock operation than in several single-block operations, so the kernel allocates contiguous space on the swap device without regard for fragmentation.

Free space data structure: For filesystems, the kernel maintains the free space in a linked list of free blocks, accessible from the filesystem super block. For swap space, the kernel maintains the free space in an in-core table called a map. A map is an array in which each entry consists of the address of an allocatable resource and the number of resource units available there.
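
To make the map idea concrete, here is a minimal sketch of allocating from such a table. It assumes a first-fit policy, which the text above does not name, and it keeps the map as text lines of the form "address units" as a stand-in for the in-core array. The names MAP, map_alloc and ALLOC_ADDR are made up for this example.

```shell
#!/bin/sh
# Free-space map: one "address units" entry per line.
MAP="1 10000"

# First-fit allocation of $1 contiguous units.
# On success sets ALLOC_ADDR and returns 0; returns 1 if nothing fits.
map_alloc() {
    want=$1
    ALLOC_ADDR=
    new_map=
    while read -r addr units; do
        [ -n "$addr" ] || continue            # skip blank lines
        if [ -z "$ALLOC_ADDR" ] && [ "$units" -ge "$want" ]; then
            ALLOC_ADDR=$addr
            # Exact fit: the entry disappears from the map
            [ "$units" -gt "$want" ] || continue
            # Partial fit: the entry shrinks and starts further along
            addr=$((addr + want))
            units=$((units - want))
        fi
        new_map="${new_map}${addr} ${units}
"
    done <<EOF
$MAP
EOF
    MAP=$new_map
    [ -n "$ALLOC_ADDR" ]
}
```

Starting from the single entry (1, 10000), allocating 100 units returns address 1 and leaves (101, 9900); allocating 50 more returns 101 and leaves (151, 9850). Freeing, which would merge adjacent entries back together, is left out of this sketch.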

Are multiple swap spaces allowed?

Multiple swap devices are allowed. When several are available, the kernel chooses among them in a round-robin scheme, provided the chosen device has enough contiguous space. Administrators can create and remove swap devices dynamically.

In the Linux 2.6 kernel, a maximum of 32 swap areas can exist in a system (see man mkswap).

If there are multiple disks, set up a swap partition on each disk and give them all the same priority with the pri option. The kernel will then round-robin across the partitions, improving performance.
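
A quick way to confirm that the active swap areas really share one priority is to read the Priority column of /proc/swaps. The sketch below does that; the function name check_swap_priorities and the device names in the comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical /etc/fstab entries giving two partitions the same priority:
#   /dev/sda2  none  swap  sw,pri=5  0 0
#   /dev/sdb2  none  swap  sw,pri=5  0 0

# List each swap area with its priority, then say whether they all match.
# Reads /proc/swaps format from the file in $1 (default: /proc/swaps).
check_swap_priorities() {
    awk 'NR > 1 { print $1, $5; seen[$5] = 1 }
         END {
             n = 0
             for (p in seen) n++
             if (n == 1) print "equal priorities"
             else if (n > 1) print "mixed priorities"
         }' "${1:-/proc/swaps}"
}
```

With no argument it inspects the running system. "mixed priorities" means the higher-priority areas are filled before the lower ones are touched, so no round-robin takes place between them.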

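Do zombie processes get swapped out?

Zombie processes are not swapped out, because they do not use any physical memory. A zombie has already released its address space, and only its entry in the process table remains.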

Recommended size for swap space

Red Hat's recommendation for the size of swap space is:

Amount of RAM in the System     Recommended Amount of Swap Space
4GB of RAM or less              a minimum of 2GB of swap space
4GB to 16GB of RAM              a minimum of 4GB of swap space
16GB to 64GB of RAM             a minimum of 8GB of swap space
64GB to 256GB of RAM            a minimum of 16GB of swap space
256GB to 512GB of RAM           a minimum of 32GB of swap space
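
The table can be expressed as a small lookup, sketched here. The function name recommended_swap_gb is invented for the example, and it assumes that a size sitting exactly on a boundary (for instance 16GB) belongs to the lower row, which the table itself leaves ambiguous. Only whole GB values are handled.

```shell
#!/bin/sh
# Minimum swap in GB for a RAM size given in whole GB, per the table above.
# Assumption: a boundary value such as 16 falls into the lower row.
recommended_swap_gb() {
    if   [ "$1" -le 4 ];   then echo 2
    elif [ "$1" -le 16 ];  then echo 4
    elif [ "$1" -le 64 ];  then echo 8
    elif [ "$1" -le 256 ]; then echo 16
    elif [ "$1" -le 512 ]; then echo 32
    else
        # The table stops at 512GB, so nothing is guessed beyond it
        echo "no recommendation above 512GB" >&2
        return 1
    fi
}
```

For example, recommended_swap_gb 8 prints 4 and recommended_swap_gb 128 prints 16.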

Thumb rule for Swap space

The 2.2 kernel rule of 2x swap is dead. The recommended rule of thumb is as follows:
  • Batch server        :  4X RAM
  • Database server     :  <= 1 GiB of swap
  • Application server  :  0.5X RAM
  • RAM 1-2 GiB         :  1.5X RAM
  • RAM 2-8 GiB         :  same size as RAM
  • RAM > 8 GiB         :  0.75X RAM

What components decide on the size of swap space?
  1. Full core dumps. Some operating systems dump their core to swap space when the system panics, and you (or the OS's developers) can use that dump to diagnose the cause. If there isn't enough swap to hold a full core dump, you might not be able to diagnose certain system panics.
  2. Core-dump metadata. A small amount of metadata sometimes accompanies the core dump. Adding an extra 1 MB to the swap size will cover this.
  3. Preparing in advance for RAM upgrades. Systems without the maximum RAM installed may be upgraded in the future. Size the swap space with this in mind.

What is swapping?

The unmapping of page frames from an active process is called swapping.

- Swap-out : page frames are unmapped and placed in page slots on a swap device.
- Swap-in  : page frames are read in from page slots on a swap device and mapped into the process address space.
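
On Linux, the cumulative number of pages swapped in and out since boot is exposed as the pswpin and pswpout counters in /proc/vmstat. The following sketch prints just those two lines; the function name swap_counters is an assumption made for this example.

```shell
#!/bin/sh
# Print the pages-swapped-in and pages-swapped-out counters.
# Reads /proc/vmstat format from the file in $1 (default: /proc/vmstat).
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1, $2 }' "${1:-/proc/vmstat}"
}
```

Running it twice a few seconds apart and comparing the numbers shows whether any swapping happened in between.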

Is using swap space bad? Thrashing

Using swap space is not actually bad. 

When pages are written to disk, the event is called a page-out, and when pages are returned to physical memory, the event is called a page-in. A page fault occurs when the kernel needs a page, finds it doesn't exist in physical memory because it has been paged-out, and re-reads it in from disk. 

Page-ins are common, normal and are not a cause for concern. For example, when an application first starts up, its executable image and data are paged-in. This is normal behavior.

Page-outs, however, can be a sign of trouble. When the kernel detects that memory is running low, it attempts to free up memory by paging out. Though this may happen briefly from time to time, if page-outs are plentiful and constant, the kernel can reach a point where it spends more time managing paging activity than running the applications, and system performance suffers. This woeful state is referred to as thrashing. Thrashing occurs when the system spends more time moving pages into and out of a process working set than doing useful work. While thrashing, processes keep referencing pages that are not in memory and so spend more time waiting for I/O than getting work done.

The problem is not the use of swap space itself but intense paging activity.

How to find intense paging activity - Thrashing?

The vmstat command reports virtual memory statistics. With this tool we can observe page-ins and page-outs as they happen.
The most important columns in vmstat output for judging paging activity are free, si and so. The free column shows the amount of free memory, si shows the amount of memory swapped in from disk per second (page-ins) and so shows the amount of memory swapped out to disk per second (page-outs). If the so column stays at zero, there is not much paging activity. However, nonzero values in the so column together with a fluctuating free column indicate that there is not enough physical memory and the kernel is paging out. The processes using the most memory can then be identified with top and ps.
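
As a rough aid for reading a longer vmstat run, the sketch below counts how many samples had a nonzero so value. The function name count_swapout_samples is made up, and the script assumes the usual procps layout in which the column header row starts with "r" and the data rows start with a number.

```shell
#!/bin/sh
# Count vmstat samples that show swap-out activity. Reads vmstat output on stdin.
count_swapout_samples() {
    awk '
        # Header row: locate the "so" column by name rather than by position
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "so") col = i; next }
        # Data rows start with a number; the banner line is skipped
        col && $1 ~ /^[0-9]+$/ { total++; if ($col > 0) busy++ }
        END { printf "%d of %d samples show swap-out\n", busy, total }
    '
}
```

A typical call is vmstat 1 10 | count_swapout_samples. Note that the first sample vmstat prints is an average since boot, so a single nonzero sample at the start is not by itself a sign of current paging.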

Displaying Swap space details in Linux

There are many ways to display the swap space details in Linux

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       5119992 0       -1

# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       5119992 0       -1

# free -k | grep Swap | awk '{ print $1,$2 }'
Swap: 5119992

# dmesg | grep -i "swap"
Adding 5119992k swap on /dev/sda3.  Priority:-1 extents:1 across:5119992k
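
The Size and Used columns shown above are in kB, so a usage percentage per swap area is one division away. This is a small sketch of that calculation; swap_usage is a hypothetical function name.

```shell
#!/bin/sh
# Print the percentage of each swap area that is in use.
# Reads /proc/swaps format from the file in $1 (default: /proc/swaps).
swap_usage() {
    awk 'NR > 1 && $3 > 0 {
             printf "%s %d%% used (%d of %d kB)\n", $1, $4 * 100 / $3, $4, $3
         }' "${1:-/proc/swaps}"
}
```

For the /dev/sda3 entry in the listings above this would report 0% used, since the Used column is 0.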

How much memory and swap space does each process use? smaps

It is not possible to find out exactly how much memory and swap space each process is consuming with standard tools like top or ps. The smaps interface, introduced in kernel 2.6.14, makes it possible to get the exact amount of memory and swap space used by a process. It can be found at /proc/<pid>/smaps

The blog post at http://northernmost.org/blog/find-out-what-is-using-your-swap/ has a bash script that prints all running processes and their swap usage:


#!/bin/bash
# Get current swap usage (in kB) for all running processes
# Erik Ljungstrom 27/05/2011
OVERALL=0

for DIR in /proc/[0-9]*
do
         PID=${DIR#/proc/}
         # A process may exit while we iterate, so ignore lookup errors
         PROGNAME=$(ps -p "$PID" -o comm --no-headers 2>/dev/null)

         # Sum only the "Swap:" lines; a bare "grep Swap" would also match
         # "SwapPss:" on kernels that report it and count the usage twice
         SUM=$(awk '/^Swap:/ { s += $2 } END { print s + 0 }' "$DIR/smaps" 2>/dev/null)
         SUM=${SUM:-0}

         echo "PID=$PID - Swap used: $SUM - ($PROGNAME )"
         OVERALL=$((OVERALL + SUM))
done
echo "Overall swap used: $OVERALL"

Run this script as the root user.

To find the process using the most swap, run the script like so:
$ ./getswap.sh | sort -n -k 5

To leave out processes which are not using swap at all:
$ ./getswap.sh | grep -v "Swap used: 0" | sort -n -k 5
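
On kernels that report a VmSwap line in /proc/<pid>/status (an assumption about the running kernel, since older ones lack it), the per-process total can also be read directly without summing smaps. A minimal sketch, with vmswap_kb as a made-up function name:

```shell
#!/bin/sh
# Print the VmSwap value (kB) from a /proc/<pid>/status style file given as $1.
# Prints nothing if the file has no VmSwap line, e.g. for kernel threads.
vmswap_kb() {
    awk '/^VmSwap:/ { print $2 }' "$1"
}
```

For example, vmswap_kb /proc/self/status prints the swap used by the calling shell's awk child, and vmswap_kb /proc/1/status that of init.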


What pages get swapped?

Basically there are 4 types of memory pages

  • Kernel pages - Pages holding the program contents of the kernel itself. They are fixed in memory and are never moved.
  • Program pages - Pages storing the contents of programs and libraries. These are read-only, so no updates to disk are needed.
  • File-backed pages - Pages storing the contents of files on disk. If such a page has been changed in memory, it will eventually need to be written out to disk to synchronize the changes.
  • Anonymous pages - Pages not backed by anything on disk. When a program requests memory to be allocated to perform computations or record information, the information resides in anonymous pages.

The pages that get swapped are
  • Inactive Pages
  • Anonymous Pages

Tuning Swappiness

The following sysctls help in tuning the virtual memory

1) vm.swappiness

Swapping out inactive pages means searching for them and unmapping them, so it consumes more CPU and disk resources than writing anonymous pages to disk.
To swap out inactive pages (memory pages with no active references), the kernel has no option but to walk the entire memory space, which is quite expensive with large memory sizes.

To make the kernel prefer swapping anonymous pages over inactive pages, vm.swappiness is used. Increasing swappiness tells the kernel to swap out anonymous pages (memory pages that aren't linked to files, e.g. process stacks, buffers, etc.), which is a much cheaper operation because the kernel can look at the page table to determine where these pages are in memory.

The kernel will prefer to swap anonymous pages when:

% of memory mapped in page tables + swappiness >= 100

The default value of vm.swappiness is 60.

# sysctl vm.swappiness
vm.swappiness = 60
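
The rule above is plain arithmetic, so it can be tried out with a few numbers. The sketch below encodes only the simplified condition stated in this post; the real kernel heuristic has more inputs and has changed between versions, and the function name prefers_anon_swap is invented for the example.

```shell
#!/bin/sh
# Simplified rule from the text: anonymous pages are preferred for swapping
# when (% of memory mapped in page tables) + swappiness >= 100.
# $1 = mapped percentage, $2 = swappiness (60 if omitted).
prefers_anon_swap() {
    [ $(( $1 + ${2:-60} )) -ge 100 ]
}
```

With swappiness at 60, prefers_anon_swap 40 succeeds while prefers_anon_swap 39 fails, so under this rule the threshold is reached once 40% of memory is mapped. Lowering swappiness to 10 pushes that threshold up to 90%.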


The higher the vm.swappiness value, the more the system will swap. At 100, the condition above always holds, so the kernel will always prefer to swap out mapped pages.
A high swappiness value means that the kernel will be more apt to unmap mapped pages; a low swappiness value means the opposite.

2) vm.swap_token_timeout

This controls how long a process is protected from paging when the system is thrashing. It is measured in seconds.
