Wednesday, March 31, 2010

How to test Multicast Packet Filtering?

This morning I started my day testing Multicast Packet Filtering. I had Ubuntu 9.10 and CentOS 5.4 VMs running on ESX 4.1, and I decided to make CentOS the client and Ubuntu the server for this setup.

First of all, I downloaded iperf manually, as yum was not feeling well today (just kidding).
It's simple to install: download the package and install it. Luckily, I didn't run into any dependency hell.

On Ubuntu Box:

SERVER MACHINE
=====================================

sudo iperf -s -u -B 224.0.65.68 -i 1
--------------------------------------

Server listening on UDP port 5001
Binding to local address 224.0.65.68
Joining multicast group 224.0.65.68
UDP buffer size: 120 KByte (default)
------------------------------------------

[ 3] local 224.0.65.68 port 5001 connected with 10.112.173.86 port 38577
[ 3] 0.0- 1.0 sec   128 KBytes  1.05 Mbits/sec   0.228 ms    0/   89 (0%)
...
...

On CentOS Box (Client):



iperf -c 224.0.65.68 -u -T 5 -t 5
----------------------------------
Client connecting to 224.0.65.68, UDP port 5001
Sending 1470 byte datagrams
Setting multicast TTL to 5
UDP buffer size: 126 KByte (default)
------------------------------------------
[ 3] local 10.112.173.86 port 38577 connected with 224.0.65.68 port 5001
[ 3] 0.0- 5.0 sec   642 KBytes  1.05 Mbits/sec
[ 3] Sent 447 datagrams


1. It clearly shows that the multicast address is 224.0.65.68.
e.g. server> iperf -s -u -B <multicast address> -i 1

This has the iperf server listening for UDP datagrams (-u) bound to the multicast address (-B <multicast address>), with a periodic report interval of 1 s (-i 1).

2. Configure the client VM, connecting to the multicast group address and setting the TTL (-T, --ttl) as needed.
e.g. client> iperf -c <multicast address> -u -T 5 -t 5

This has the client connect to the multicast address (-c <multicast address>) with a TTL of 5 (-T 5), sending data for 5 seconds (-t 5).
NOTE: Use tcpdump or ethereal on the server VM to capture and analyze the IP packets and verify their validity.
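For example, a capture along these lines on the server VM will show whether the multicast datagrams actually arrive (the interface name eth0 is an assumption; substitute your own):

sudo tcpdump -i eth0 -n udp and host 224.0.65.68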

Run the test for 120 sec.

That's it!
You have just tested Multicast Packet Filtering.

Enjoy the cool rainy weather in Bangalore!

Tuesday, March 30, 2010

Quick Command Reference: List Loaded Drivers on Linux?

driverquery is the command on Windows to get a list of drivers. In the same way, on Linux we have lsmod, which lists the drivers loaded on the box.

lsmod is a program to show the status of modules in the Linux kernel. It is a trivial program which nicely formats the contents of /proc/modules, showing which kernel modules are currently loaded.

Example: lsmod of a typical ESX 4.1 Box could show:

[root@esx]# lsmod
Module Size Used by
nfs 245688 2
lockd 68016 2 nfs
nfs_acl 3904 1 nfs
edd 10696 0
ppdev 10056 0
parport_pc 28584 0
i2c_dev 10696 0
i2c_core 23128 1 i2c_dev
sunrpc 162248 11 nfs,lockd,nfs_acl
ipt_REJECT 6080 0
xt_tcpudp 3520 0
ipt_LOG 6656 0
x_tables 17096 3 ipt_REJECT,xt_tcpudp,ipt_LOG
parport 41100 2 ppdev,parport_pc
nvram 8456 0
sg 36520 0
vmxnet_console 23360 1
vmnixmod 789052 56 vmxnet_console
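
To dig deeper into any module in the list, modinfo prints its metadata; here nfs is just an example picked from the output above:

[root@esx]# lsmod | grep nfs
[root@esx]# modinfo nfs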

Sunday, March 28, 2010

Understanding Jumbo Frames !!

Whether or not Gigabit Ethernet (and beyond) should support frame sizes (i.e. packets) larger than 1500 bytes has been a topic of great debate. With the explosive growth of Gigabit ethernet, the impact of this decision is critically important and will affect Internet performance for years to come.

Most of the debate about jumbo frames has focused on local area network performance and the impact that frame size has on host processing requirements, interface cards, memory, etc. But what is less well known, and of critical concern for high performance computing, is the impact that frame size has on wide area network performance. This document discusses why you should care, and about the largely ignored but important impact that frame size has on the wide area performance of TCP.

How jumbo is a jumbo frame anyway?

Ethernet has used 1500 byte frame sizes since it was created (around 1980). To maintain backward compatibility, 100 Mbps ethernet used the same size, and today "standard" gigabit ethernet is also using 1500 byte frames. This is so a packet to/from any combination of 10/100/1000 Mbps ethernet devices can be handled without any layer two fragmentation or reassembly.

"Jumbo frames" extends ethernet to 9000 bytes. Why 9000? First because ethernet uses a 32 bit CRC that loses its effectiveness above about 12000 bytes. And secondly, 9000 was large enough to carry an 8 KB application datagram (e.g. NFS) plus packet header overhead. Is 9000 bytes enough? It's a lot better than 1500, but for pure performance reasons there is little reason to stop there. At 64 KB we reach the limit of an IPv4 datagram, while IPv6 allows for packets up to 4 GB in size. For ethernet however, the 32 bit CRC limit is hard to change, so don't expect to see ethernet frame sizes above 9000 bytes anytime soon.

How can jumbo frames and 1500 byte frames coexist?

Two basic approaches exist:

* On a port by port basis, where everything "downstream" from a given port is known to support jumbo frames.
* Using 802.1q Virtual LANs, where jumbo frame and non-jumbo frame devices are segregated to different VLANs.

Jumbo frames bad for multimedia?

For applications that are sensitive to burst drops, delay jitter, etc., it can be argued that large frames are a bad idea. No application has to use large frames, however, so the question is really whether other applications' large frames will negatively impact your application's small ones. This is primarily an issue of slot time, i.e. how much a large packet will delay (or quantize) the time(s) available to transmit the small packets.

A 9000 byte GigE packet takes the same amount of time to transmit as a 900 byte fast ethernet packet or a 90 byte 10 Mbps ethernet packet: 9000 bytes at 1 Gbit/s occupy the wire for 72 microseconds, exactly like 900 bytes at 100 Mbit/s. So jumbo frames on gigabit ethernet at worst add less delay variation than 1500 byte frames do on slower ethernets. And no one is suggesting that slower ethernets use 9000 byte frames. As for queueing delay concerns, that could happen whether packets are large or small. If delivery QoS is required, then the routers need to implement some kind of priority or expedited forwarding, regardless of the packet sizes. Tiny frames (including 53 byte ATM cells) may be helpful when multiplexing lower bit rate streams, but they become increasingly ridiculous on gigabit and beyond links.

Understanding iperf?

Iperf is a tool to measure the bandwidth and the quality of a network link. Jperf can be associated with Iperf to provide a graphical frontend written in Java.

The network link is delimited by two hosts running Iperf.

The quality of a link can be tested as follows:

- Latency (response time or RTT): can be measured with the ping command.
- Jitter (latency variation): can be measured with an Iperf UDP test.
- Datagram loss: can be measured with an Iperf UDP test.

The bandwidth is measured through TCP tests.

To be clear, the difference between TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) is that TCP verifies that the packets are correctly delivered to the receiver, whereas UDP sends packets without any such checks, with the advantage of being quicker than TCP.
Iperf uses these different characteristics of TCP and UDP to provide statistics about network links.

Finally, Iperf can be installed very easily on any UNIX/Linux or Microsoft Windows system. One host must be set as client, the other one as server.


Here is a diagram where Iperf is installed on a Linux and Microsoft Windows machine.
Linux is used as the Iperf client and Windows as the Iperf server. Of course, it is also possible to use two Linux boxes.

[Diagram: Iperf bandwidth measurement between a Linux client and a Windows server]

Iperf tests:

no arg.     Default settings
-f          Data format
-r          Bi-directional bandwidth
-d          Simultaneous bi-directional bandwidth
-w          TCP window size
-p, -t, -i  Port, timing and interval
-u, -b      UDP tests, bandwidth settings
-m          Maximum Segment Size display
-M          Maximum Segment Size settings
-P          Parallel tests
-h          Help
Jperf:

no arg.     Default settings
-d          Simultaneous bi-directional bandwidth
-u, -b      UDP tests, bandwidth settings


Default Iperf settings:
Also check "Jperf section.

By default, the Iperf client connects to the Iperf server on the TCP port 5001 and the bandwidth displayed by Iperf is the bandwidth from the client to the server.
If you want to use UDP tests, use the -u argument.
The -d and -r Iperf client arguments measure the bi-directional bandwidths. (See further on in this tutorial.)

Client side:

#iperf -c 10.1.1.1
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16384 Byte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 33453 connected with 10.1.1.1 port 5001
[ 3] 0.0-10.2 sec 1.26 MBytes 1.05 Mbits/sec

Server side:

#iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[852] local 10.1.1.1 port 5001 connected with 10.6.2.5 port 33453
[ ID] Interval Transfer Bandwidth
[852] 0.0-10.6 sec 1.26 MBytes 1.03 Mbits/sec


Data formatting: (-f argument)

The -f argument can display the results in the desired format: bits(b), bytes(B), kilobits(k), kilobytes(K), megabits(m), megabytes(M), gigabits(g) or gigabytes(G).
Generally the bandwidth measures are displayed in bits (or Kilobits, etc ...) and an amount of data is displayed in bytes (or Kilobytes, etc ...).
As a reminder, 1 byte is equal to 8 bits and, in the computer science world, 1 kilo is equal to 1024 (2^10).
For example: 100'000'000 bytes is not equal to 100 Mbytes but to 100'000'000/1024/1024 = 95.37 Mbytes.
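
If you want to double-check that arithmetic, a quick shell one-liner reproduces it (assuming the bc calculator is installed):

#echo "scale=4; 100000000/1024/1024" | bc
95.3674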

Client side:

#iperf -c 10.1.1.1 -f b
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16384 Byte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 54953 connected with 10.1.1.1 port 5001
[ 3] 0.0-10.2 sec 1359872 Bytes 1064272 bits/sec

Server side:

#iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[852] local 10.1.1.1 port 5001 connected with 10.6.2.5 port 33453
[ ID] Interval Transfer Bandwidth
[852] 0.0-10.6 sec 920 KBytes 711 Kbits/sec



Bi-directional bandwidth measurement: (-r argument)

With the -r argument, the Iperf server connects back to the client, allowing the two directions to be measured one after the other. By default, only the bandwidth from the client to the server is measured.
If you want to measure the bi-directional bandwidth simultaneously, use the -d argument. (See the next test.)

Client side:

#iperf -c 10.1.1.1 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 10.6.2.5 port 35726 connected with 10.1.1.1 port 5001
[ 5] 0.0-10.0 sec 1.12 MBytes 936 Kbits/sec
[ 4] local 10.6.2.5 port 5001 connected with 10.1.1.1 port 1640
[ 4] 0.0-10.1 sec 74.2 MBytes 61.7 Mbits/sec

Server side:

#iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[852] local 10.1.1.1 port 5001 connected with 10.6.2.5 port 54355
[ ID] Interval Transfer Bandwidth
[852] 0.0-10.1 sec 1.15 MBytes 956 Kbits/sec
------------------------------------------------------------
Client connecting to 10.6.2.5, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[824] local 10.1.1.1 port 1646 connected with 10.6.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[824] 0.0-10.0 sec 73.3 MBytes 61.4 Mbits/sec



Simultaneous bi-directional bandwidth measurement: (-d argument)
Also check the "Jperf" section.

To measure the bi-directional bandwidths simultaneously, use the -d argument. If you want to test the bandwidths sequentially, use the -r argument (see the previous test).
By default (i.e. without the -r or -d arguments), only the bandwidth from the client to the server is measured.

Client side:

#iperf -c 10.1.1.1 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 10.6.2.5 port 60270 connected with 10.1.1.1 port 5001
[ 4] local 10.6.2.5 port 5001 connected with 10.1.1.1 port 2643
[ 4] 0.0-10.0 sec 76.3 MBytes 63.9 Mbits/sec
[ 5] 0.0-10.1 sec 1.55 MBytes 1.29 Mbits/sec

Server side:

#iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[852] local 10.1.1.1 port 5001 connected with 10.6.2.5 port 60270
------------------------------------------------------------
Client connecting to 10.6.2.5, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[800] local 10.1.1.1 port 2643 connected with 10.6.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[800] 0.0-10.0 sec 76.3 MBytes 63.9 Mbits/sec
[852] 0.0-10.1 sec 1.55 MBytes 1.29 Mbits/sec



TCP Window size: (-w argument)

The TCP window size is the amount of data that can be buffered during a connection without an acknowledgement from the receiver.
It can be between 2 and 65,535 bytes.
On Linux systems, when specifying a TCP buffer size with the -w argument, the kernel allocates twice as much as requested.
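
As an aside, the kernel's socket-buffer ceilings cap what -w can actually obtain; on Linux you can inspect them with the standard sysctls:

#sysctl net.core.rmem_max net.core.wmem_max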

Client side:

#iperf -c 10.1.1.1 -w 2000
WARNING: TCP window size set to 2000 bytes. A small window size
will give poor performance. See the Iperf documentation.
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 3.91 KByte (WARNING: requested 1.95 KByte)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 51400 connected with 10.1.1.1 port 5001
[ 3] 0.0-10.1 sec 704 KBytes 572 Kbits/sec

Server side:

#iperf -s -w 4000
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 3.91 KByte
------------------------------------------------------------
[852] local 10.1.1.1 port 5001 connected with 10.6.2.5 port 51400
[ ID] Interval Transfer Bandwidth
[852] 0.0-10.1 sec 704 KBytes 570 Kbits/sec



Communication port (-p), timing (-t) and interval (-i):

The Iperf server communication port can be changed with the -p argument. It must be configured on the client and the server with the same value; the default is TCP port 5001.
The -t argument specifies the test duration in seconds; the default is 10 seconds.
The -i argument indicates the interval in seconds between periodic bandwidth reports.

Client side:

#iperf -c 10.1.1.1 -p 12000 -t 20 -i 2
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 12000
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 58316 connected with 10.1.1.1 port 12000
[ 3] 0.0- 2.0 sec 224 KBytes 918 Kbits/sec
[ 3] 2.0- 4.0 sec 368 KBytes 1.51 Mbits/sec
[ 3] 4.0- 6.0 sec 704 KBytes 2.88 Mbits/sec
[ 3] 6.0- 8.0 sec 280 KBytes 1.15 Mbits/sec
[ 3] 8.0-10.0 sec 208 KBytes 852 Kbits/sec
[ 3] 10.0-12.0 sec 344 KBytes 1.41 Mbits/sec
[ 3] 12.0-14.0 sec 208 KBytes 852 Kbits/sec
[ 3] 14.0-16.0 sec 232 KBytes 950 Kbits/sec
[ 3] 16.0-18.0 sec 232 KBytes 950 Kbits/sec
[ 3] 18.0-20.0 sec 264 KBytes 1.08 Mbits/sec
[ 3] 0.0-20.1 sec 3.00 MBytes 1.25 Mbits/sec

Server side:

#iperf -s -p 12000
------------------------------------------------------------
Server listening on TCP port 12000
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[852] local 10.1.1.1 port 12000 connected with 10.6.2.5 port 58316
[ ID] Interval Transfer Bandwidth
[852] 0.0-20.1 sec 3.00 MBytes 1.25 Mbits/sec



UDP tests: (-u), bandwidth settings (-b)
Also check the "Jperf" section.

The UDP tests with the -u argument will give invaluable information about the jitter and the packet loss. If you don't specify the -u argument, Iperf uses TCP.
To keep a good link quality, the packet loss should not exceed 1%. A high packet-loss rate will generate a lot of TCP segment retransmissions, which will affect the bandwidth.

The jitter is basically the latency variation and does not depend on the latency itself. You can have high response times and a very low jitter. The jitter value is particularly important on network links supporting voice over IP (VoIP), because high jitter can break a call.
The -b argument allows the allocation of the desired bandwidth.

Client side:

#iperf -c 10.1.1.1 -u -b 10m
------------------------------------------------------------
Client connecting to 10.1.1.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 108 KByte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 32781 connected with 10.1.1.1 port 5001
[ 3] 0.0-10.0 sec 11.8 MBytes 9.89 Mbits/sec
[ 3] Sent 8409 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 11.8 MBytes 9.86 Mbits/sec 2.617 ms 9/ 8409 (0.11%)

Server side:

#iperf -s -u -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[904] local 10.1.1.1 port 5001 connected with 10.6.2.5 port 32781
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[904] 0.0- 1.0 sec 1.17 MBytes 9.84 Mbits/sec 1.830 ms 0/ 837 (0%)
[904] 1.0- 2.0 sec 1.18 MBytes 9.94 Mbits/sec 1.846 ms 5/ 850 (0.59%)
[904] 2.0- 3.0 sec 1.19 MBytes 9.98 Mbits/sec 1.802 ms 2/ 851 (0.24%)
[904] 3.0- 4.0 sec 1.19 MBytes 10.0 Mbits/sec 1.830 ms 0/ 850 (0%)
[904] 4.0- 5.0 sec 1.19 MBytes 9.98 Mbits/sec 1.846 ms 1/ 850 (0.12%)
[904] 5.0- 6.0 sec 1.19 MBytes 10.0 Mbits/sec 1.806 ms 0/ 851 (0%)
[904] 6.0- 7.0 sec 1.06 MBytes 8.87 Mbits/sec 1.803 ms 1/ 755 (0.13%)
[904] 7.0- 8.0 sec 1.19 MBytes 10.0 Mbits/sec 1.831 ms 0/ 850 (0%)
[904] 8.0- 9.0 sec 1.19 MBytes 10.0 Mbits/sec 1.841 ms 0/ 850 (0%)
[904] 9.0-10.0 sec 1.19 MBytes 10.0 Mbits/sec 1.801 ms 0/ 851 (0%)
[904] 0.0-10.0 sec 11.8 MBytes 9.86 Mbits/sec 2.618 ms 9/ 8409 (0.11%)



Maximum Segment Size (-m argument) display:

The Maximum Segment Size (MSS) is the largest amount of data, in bytes, that a computer can support in a single, unfragmented TCP segment.
It can be calculated as follows:
MSS = MTU - TCP & IP headers
The TCP & IP headers are equal to 40 bytes.
The MTU or Maximum Transmission Unit is the greatest amount of data that can be transferred in a frame.
Here are some default MTU sizes for different network topologies:
Ethernet - 1500 bytes: used in a LAN.
PPPoE - 1492 bytes: used on ADSL links.
Token Ring (16Mb/sec) - 17914 bytes: old technology developed by IBM.
Dial-up - 576 bytes

Generally, a higher MTU (and MSS) brings higher bandwidth efficiency.

Client side:

#iperf -c 10.1.1.1 -m
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 41532 connected with 10.1.1.1 port 5001
[ 3] 0.0-10.2 sec 1.27 MBytes 1.04 Mbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

Here the MSS is not equal to 1500 - 40 but to 1500 - 40 - 12 (Timestamps option) = 1448

Server side:

#iperf -s


Maximum Segment Size (-M argument) settings:

Use the -M argument to change the MSS. (See the previous test for more explanations about the MSS)

#iperf -c 10.1.1.1 -M 1300 -m
WARNING: attempt to set TCP maximum segment size to 1300, but got 536
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 41533 connected with 10.1.1.1 port 5001
[ 3] 0.0-10.1 sec 4.29 MBytes 3.58 Mbits/sec
[ 3] MSS size 1288 bytes (MTU 1328 bytes, unknown interface)

Server side:

#iperf -s


Parallel tests (-P argument):

Use the -P argument to run parallel tests.

Client side:

#iperf -c 10.1.1.1 -P 2
------------------------------------------------------------
Client connecting to 10.1.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.6.2.5 port 41534 connected with 10.1.1.1 port 5001
[ 4] local 10.6.2.5 port 41535 connected with 10.1.1.1 port 5001
[ 4] 0.0-10.1 sec 1.35 MBytes 1.12 Mbits/sec
[ 3] 0.0-10.1 sec 1.35 MBytes 1.12 Mbits/sec
[SUM] 0.0-10.1 sec 2.70 MBytes 2.24 Mbits/sec

Server side:

#iperf -s


Iperf help:

#iperf -h
Usage: iperf [-s|-c host] [options]
iperf [-h|--help] [-v|--version]

Client/Server:
  -f, --format     [kmKM]  format to report: Kbits, Mbits, KBytes, MBytes
  -i, --interval   #       seconds between periodic bandwidth reports
  -l, --len        #[KM]   length of buffer to read or write (default 8 KB)
  -m, --print_mss          print TCP maximum segment size (MTU - TCP/IP header)
  -p, --port       #       server port to listen on/connect to
  -u, --udp                use UDP rather than TCP
  -w, --window     #[KM]   TCP window size (socket buffer size)
  -B, --bind       "host"  bind to "host", an interface or multicast address
  -C, --compatibility      for use with older versions, does not send extra msgs
  -M, --mss        #       set TCP maximum segment size (MTU - 40 bytes)
  -N, --nodelay            set TCP no delay, disabling Nagle's Algorithm
  -V, --IPv6Version        set the domain to IPv6

Server specific:
  -s, --server             run in server mode
  -U, --single_udp         run in single threaded UDP mode
  -D, --daemon             run the server as a daemon

Client specific:
  -b, --bandwidth  #[KM]   for UDP, bandwidth to send at in bits/sec
                           (default 1 Mbit/sec, implies -u)
  -c, --client     "host"  run in client mode, connecting to "host"
  -d, --dualtest           do a bidirectional test simultaneously
  -n, --num        #[KM]   number of bytes to transmit (instead of -t)
  -r, --tradeoff           do a bidirectional test individually
  -t, --time       #       time in seconds to transmit for (default 10 secs)
  -F, --fileinput  "name"  input the data to be transmitted from a file
  -I, --stdin              input the data to be transmitted from stdin
  -L, --listenport #       port to receive bidirectional tests back on
  -P, --parallel   #       number of parallel client threads to run
  -T, --ttl        #       time-to-live, for multicast (default 1)

Miscellaneous:
  -h, --help               print this message and quit
  -v, --version            print version information and quit


vmxnet3: A New Para-Virtualized NIC from VMware

VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS.

TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters.
Moving forward, to keep pace with an ever-increasing demand for network bandwidth, VMware recommends that customers migrate to VMXNET3 if performance is a top concern for their deployments.

The VMXNET3 driver is NAPI-compliant on Linux guests. NAPI is an interrupt mitigation mechanism that improves high-speed networking performance on Linux by switching back and forth between interrupt mode and polling mode during packet receive. It is a proven technique to improve CPU efficiency and allows the guest to process higher packet loads. VMXNET3 also supports Large Receive Offload (LRO) on Linux guests. However, in ESX 4.0 the VMkernel backend supports large receive packets only if the packets originate from another virtual machine running on the same host.

VMXNET3 supports larger Tx/Rx ring buffer sizes compared to previous generations of virtual network devices. This feature benefits certain network workloads with bursty and high‐peak throughput. Having a larger ring size provides extra buffering to better cope with transient packet bursts.

VMXNET3 supports three interrupt modes:

MSI‐X,
MSI, and
INTx.

Normally the VMXNET3 guest driver will attempt to use the interrupt modes in the order given above, if the guest kernel supports them. With VMXNET3, TCP Segmentation Offload (TSO) for IPv6 is supported for both Windows and Linux guests now, and TSO support for IPv4 is added for Solaris guests in addition to Windows and Linux guests.

To use VMXNET3, the user must install VMware Tools on a virtual machine with hardware version 7.
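
A quick way to confirm from inside a Linux guest that the vmxnet3 driver is actually bound to a NIC is ethtool (the interface name eth0 is an assumption):

# ethtool -i eth0
driver: vmxnet3
...
# lsmod | grep vmxnet3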

Friday, March 26, 2010

How to Upgrade to Fedora 12?

Last night I thought of upgrading my Fedora 11 to Fedora 12. I went through the official Fedora website and came across a new tool called preupgrade. It went fine, so I wanted to share it with you all.

Here we go...

In most cases, the simplest way to upgrade an existing Fedora installation is with the preupgrade tool. When a new version of Fedora is available, preupgrade downloads the packages necessary to upgrade your installation, and initiates the upgrade process.

Install preupgrade with your graphical package manager, or

type yum install preupgrade at the command line and press Enter.

To run preupgrade, type preupgrade at the command line and press Enter.
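
In plain terminal form, assuming root privileges, the whole kickoff boils down to:

# yum install preupgrade
# preupgrade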

Note:
If the contents of your /etc/fedora-release file have been changed from the default, your Fedora installation may not be found when attempting an upgrade to Fedora 12.
You can relax some of the checks against this file by booting with the following boot command:

linux upgradeany

Use the linux upgradeany command if your Fedora installation was not offered as an option to upgrade.

To perform an upgrade, select Perform an upgrade of an existing installation. Click Next when you are ready to begin your upgrade.

To re-install your system, select Perform a new Fedora installation and refer to Chapter 6, Installing on Intel® and AMD Systems for further instructions.

Understanding /proc/cpuinfo?

A hyperthreaded processor has the same number of function units as an older, non-hyperthreaded processor. It just has two execution contexts, so it may achieve better function-unit utilization by letting more than one program execute concurrently. On the other hand, if you're running two programs which compete for the same function units, there is no advantage at all to having both running "concurrently": when one is running, the other is necessarily waiting on the same function units.

A dual core processor literally has two times as many function units as a single-core processor, and can really run two programs concurrently, with no competition for function units.

A dual core processor is built so that both cores share the same level 2 cache. A dual processor (separate physical cpus) system differs in that each cpu has its own level 2 cache. This may sound like an advantage, and in some situations it can be, but newer research and testing show that the shared cache can be faster when the cpus are working on the same or very similar tasks.

In general, hyperthreading is considered older technology and is no longer supported in newer cpus. Hyperthreading can provide a marginal (~10%) improvement for some server workloads like mysql, but dual core technology has essentially replaced hyperthreading in newer systems.

A dual core cpu running at 3.0 GHz should be faster than a dual cpu (separate socket) system running at 3.0 GHz due to the ability to share the cache at higher bus speeds.

The examples below detail how we determine what kind of cpu(s) are present.

The kernel data Linux exposes in /proc/cpuinfo will show each logical cpu with a unique processor number. A logical cpu can be a hyperthreading sibling, a shared core in a dual or quad core, or a separate physical cpu. We must look at the siblings, cpu cores and core id to tell the difference.

If the number of cores = the number of siblings for a given physical processor, then hyperthreading is OFF.

/bin/cat /proc/cpuinfo | /bin/egrep 'processor|model name|cache size|core|sibling|physical'
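
If you just want counts rather than the full listing, these one-liners (standard grep, sort and wc) distill the same data:

grep 'physical id' /proc/cpuinfo | sort -u | wc -l    # number of physical sockets
grep -c '^processor' /proc/cpuinfo                    # number of logical processors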

Example 1: Single processor, 1 core, no Hyperthreading

processor : 0
model name : AMD Duron(tm) processor
cache size : 64 KB

Example 2: Single processor, 1 core, Hyperthreading is enabled.

Notice how we have 2 siblings, but only 1 core. The physical cpu id is the same for both: 0.

processor : 0
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
processor : 1
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1


Example 3. Single socket Quad Core

Notice how each processor has its own core id. The number of siblings matches the number of cores, so there are no hyperthreading siblings. Also notice the huge L2 cache - 6 MB. That makes sense, though, when considering that 4 cores share that L2 cache.

processor : 0
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
processor : 1
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
processor : 2
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
processor : 3
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4

Example 3a. Single socket Dual Core

Again, each processor has its own core so this is a dual core system.

processor : 0
model name : Intel(R) Pentium(R) D CPU 3.00GHz
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
processor : 1
model name : Intel(R) Pentium(R) D CPU 3.00GHz
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2

Example 4. Dual Single core CPU, Hyperthreading ENABLED

This example shows that processors 0 and 2 share the same physical cpu, and that 1 and 3 share the same physical cpu. The number of siblings is twice the number of cores, which is another clue that this is a system with hyperthreading enabled.

processor : 0
model name : Intel(R) Xeon(TM) CPU 3.60GHz
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
processor : 1
model name : Intel(R) Xeon(TM) CPU 3.60GHz
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
processor : 2
model name : Intel(R) Xeon(TM) CPU 3.60GHz
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
processor : 3
model name : Intel(R) Xeon(TM) CPU 3.60GHz
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1

Example 5. Dual CPU Dual Core No hyperthreading

Of the 5 examples, this should be the most capable system processor-wise. There are a total of 4 cores: 2 cores in each of 2 separately socketed physical cpus. Each core shares the 4 MB cache with its sibling core. The higher clock rate (3.0 GHz vs 2.33 GHz) should offer slightly better performance than example 3.

processor : 0
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
processor : 1
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
processor : 2
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 2
processor : 3
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2

Friday, March 19, 2010

Shell Script: Quick Look into Command-Line Arguments

The following script prints its command-line arguments and shows you how to access them:
$ vi demo

#!/bin/sh
#
# Script that demos, command line args
#
echo "Total number of command line argument are $#"
echo "$0 is script name"
echo "$1 is first argument"
echo "$2 is second argument"
echo "All of them are :- $* or $@"


Run it as follows

Set execute permission as follows:

$ chmod 755 demo

Run it & test it as follows:

$ ./demo Hello World
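
If everything is in place, that run should print:

Total number of command line argument are 2
./demo is script name
Hello is first argument
World is second argument
All of them are :- Hello World or Hello World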

If test successful, copy script to your own bin directory (Install script for private use)
$ cp demo ~/bin

Check whether it is working or not (?)

$ demo
$ demo Hello World

NOTE: From here on, use the above commands, in the same sequence, for every script; I am not going to repeat all of them for the rest of the tutorial.

Shell Script: How to use GREP utility?

The grep command selects and prints lines from a file (or a bunch of files) that match a pattern. Let's say your friend Bill sent you an email recently with his phone number, and you want to call him ASAP to order some books. Instead of launching your email program and sifting through all the messages, you can scan your in-box file, like this:
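
For instance, assuming your in-box is a plain-text file named mbox, a case-insensitive search would do it:

grep -i 'bill' mbox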

The most useful grep flags are shown here:

-i Ignore uppercase and lowercase when comparing.
-v Print only lines that do not match the pattern.
-c Print only a count of the matching lines.
-n Display the line number before each matching line.

When grep performs its pattern matching, it expects you to provide a regular expression for the pattern. Regular expressions can be very simple or quite complex, so we won't get into a lot of details here. Here are the most common types of regular expressions:

abc Match lines containing the string "abc" anywhere.
^abc Match lines starting with "abc."
abc$ Match lines ending with "abc."
a..c Match lines containing "a" and "c" separated by any two characters (the dot matches any single character).
a.*c Match lines containing "a" and "c" separated by any number of characters (the dot-asterisk means match zero or more characters).


Regular expressions also come into play when using vi, sed, awk, and other Unix commands. If you want to master Unix, take time to understand regular expressions. Here is a sample poem.txt file and some grep commands to demonstrate regular-expression pattern matching:

Mary had a little lamb
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich

To print all lines containing spam (respecting uppercase and lowercase), enter

grep 'spam' poem.txt
Mary fried a lot of spam
Jill had a lamb spamwich

To print all lines containing spam (ignoring uppercase and lowercase), enter

grep -i 'spam' poem.txt
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich

To print just the number of lines containing the word spam (ignoring uppercase and lowercase), enter

grep -ic 'spam' poem.txt
3

To print all lines not containing spam (ignoring uppercase and lowercase), enter

grep -i -v 'spam' poem.txt
Mary had a little lamb

To print all lines starting with Mary, enter

grep '^Mary' poem.txt
Mary had a little lamb
Mary fried a lot of spam

To print all lines ending with ich, enter

grep 'ich$' poem.txt
Jack ate a Spam sandwich
Jill had a lamb spamwich

To print all lines containing had followed by lamb, enter

grep 'had.*lamb' poem.txt
Mary had a little lamb
Jill had a lamb spamwich

Shell Script: A Simple Cut Command

Today is a sunny day outside, so let's tweak some shell scripting.
We will carry on this episode throughout the year, and I can assure you that you will find it interesting.

Let's start from scratch:

Consider a slight variation on the company.data file we've been playing with in this section:

406378:Sales:Itorre:Jan
031762:Marketing:Nasium:Jim
636496:Research:Ancholie:Mel
396082:Sales:Jucacion:Ed


If you want to print just columns 1 to 6 of each line (the employee serial numbers), use the -c1-6 flag, as in this command:

cut -c1-6 company.data
406378
031762
636496
396082

If you want to print just columns 4 and 8 of each line (the fourth digit of the serial number and the first letter of the department), use the -c4,8 flag, as in this command:

cut -c4,8 company.data
3S
7M
4R
0S

And since this file obviously has fields delimited by colons, we can pick out just the last names by specifying the -d: and -f3 flags, like this:

cut -d: -f3 company.data
Itorre
Nasium
Ancholie
Jucacion

Here is a summary of the most common flags for the cut command:

-c [n | n,m | n-m] Specify a single column, multiple columns (separated by a comma), or range of columns (separated by a dash).
-f [n | n,m | n-m] Specify a single field, multiple fields (separated by a comma), or range of fields (separated by a dash).
-dc Specify the field delimiter.
-s Suppress (don't print) lines not containing the delimiter.
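
The -s flag only matters when some lines lack the delimiter; here is a tiny illustration (sample.data is a throwaway file created just for this). Without -s, cut would print the HEADER line unchanged; with -s it is suppressed:

printf 'HEADER\n406378:Sales:Itorre:Jan\n' > sample.data
cut -d: -f2 -s sample.data
Sales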

Friday, March 12, 2010

Linux RAM Disk: Creating A Filesystem In RAM

Software RAM disks use normal RAM in main memory as if it were a partition on a hard drive, rather than going through the data bus normally used for secondary storage such as a hard disk. How do I create and store a web cache on a RAM disk to improve the speed of loading pages under Linux operating systems?

You can create the RAM disk as follows (8192 blocks = 8M; there is no need to format the ramdisk as a journaling file system):

# mkfs -q /dev/ram1 8192
# mkdir -p /ramcache
# mount /dev/ram1 /ramcache
# df -H | grep ramcache

Sample outputs:

/dev/ram1 8.2M 1.1M 6.7M 15% /ramcache

Next, copy images or caching objects to /ramcache:

# cp /var/www/html/images/*.jpg /ramcache
Now you can edit Apache or a squid reverse proxy to use /ramcache, mapping it to images.example.com:

<VirtualHost *:80>
    ServerAdmin admin@example.com
    ServerName images.example.com
    DocumentRoot /ramcache
    #ErrorLog /var/logs/httpd/images.example.com_error.log
    #CustomLog /var/logs/httpd/images.example.com_access.log combined
</VirtualHost>

Reload httpd:

# service httpd reload
Now all hits to images.example.com will be served from RAM. This can improve the speed of loading pages or images. However, if the server is rebooted, all data will be lost. So you may want to write an /etc/init.d/ script to copy the files back to /ramcache. Create a script called initramcache.sh:

#!/bin/sh
mkfs -t ext2 -q /dev/ram1 8192
[ ! -d /ramcache ] && mkdir -p /ramcache
mount /dev/ram1 /ramcache
/bin/cp /var/www/html/images/*.jpg /ramcache

Call it from /etc/rc.local or create a softlink in /etc/rc3.d/:

# chmod +x /path/to/initramcache.sh
# echo '/path/to/initramcache.sh' >> /etc/rc.local

A Note About tmpfs
tmpfs has been supported by the Linux kernel since version 2.4. tmpfs (also known as shmfs) is a little different from the Linux ramdisk: it allocates memory dynamically, and less-used pages can be moved onto swap space. ramfs, in contrast, does not use swap, which can be an advantage or a disadvantage depending on the workload. See how to use tmpfs under Linux.
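
As a minimal sketch, the same cache could live on tmpfs instead of /dev/ram1 (the 8M size and /ramcache mount point mirror the earlier assumptions):

# mkdir -p /ramcache
# mount -t tmpfs -o size=8m tmpfs /ramcache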