Performance Tuning

There are many steps one can take to optimize FileCatalyst in a high speed environment. It is important to try some file transfers without making any configuration changes in FileCatalyst. This will help establish baseline results for further tests. Keep in mind that the performance of FileCatalyst depends entirely on your hardware and network configuration. System diagnostic tools such as "iperf" are highly recommended before any configuration changes are made in FileCatalyst. When using iperf, please make sure to test the performance of UDP (-u switch).

FileCatalyst has been optimized to work for speeds up to 1 Gbps out of the box on any standard fiber or copper based IP connection with latency of up to 250 ms and Packet Loss of less than 1%. If you are not achieving expected results under the standard conditions, please review your network and hardware configuration before you attempt to make any changes listed under this section.

This section addresses a few common scenarios encountered by clients and shows where improvements can be made to allow greater throughput.

Know Your System

Disk IO

Achieving high throughput transfers requires first and foremost that you have the physical ability to read and write data at the speeds you are trying to transfer.

FileCatalyst HotFolder and Remote Server Admin now include features that allow you to test write speed of your system. Running a file IO test can help determine if your system has adequate bandwidth to write the data you are transferring to it.

If the IO speed is not what you expected, verify the following:

  1. Ensure that the user home directory (or HotFolder location) points to the correct path and has permission to read from and write to the storage.
  2. If the mount point is a network drive (Samba, SAN), make sure that the networking infrastructure can support the required capacity (e.g., a 1 Gbps iSCSI volume mounted via a 100 Mbps switch cannot deliver full speed).

Attaining disk speeds for >1 Gbps transfers

For attaining transfer speeds higher than 500 Mbps on a single connection, additional tools are included with FileCatalyst to help you tune your IO speeds. Two scripts exist (one performs read IO tests, the other performs write IO tests), and both can be launched from the command line using the Server, HotFolder, or CLI Client JAR files. Here is an example using the CLI JAR file:

java -jar FileCatalystCL.jar -testReadIO

java -jar FileCatalystCL.jar -testIO
              

Note: The same command works with FileCatalystServer.jar or FileCatalystHotFolder.jar.

The scripts are aimed at profiling your particular system and testing whether additional settings can improve performance. The scripts require you to answer a series of questions to set up the test parameters. Here is an example output for the supplied WRITE test script in Windows:

C:\tmp>java -jar FileCatalystCL.jar -testIO
Entering Write TestIO. This will run a series of tests on the file system to
attempt to discover the optimal values for # of writer threads and write block
size for your system.The test is both IO intensive and CPU intensive. Please
give the test adequate time to complete.

Please enter the drive/path you wish to test (ie:  C:/ or /mnt/data/ ):  c:/temp
  File to be written:  c:/temp/test.io

Please enter the size of file you wish to write (in MB, default 500):  10000
  File size:  10000MB.

Please enter the timeout length (secs) per run (default 60 secs): 180
  Timeout:  180 seconds.

Please enter the number of runs to perform for each setting (default 5): 3
  Number of runs per iteration:  3

Test if buffer size used in writes to disk affect performance.
Please enter a buffer array to attempt (default:  '64,128,256,512,1024,2048,4096')
Size in KB, comma delimited: 4,16,64,256,1024,4096,16384
  8 Buffer Size values (KB):  4,16,64,256,1024,4096,16384.

Test if multiple writers offer performance benefit when saving a block to disk.
Please enter a writer thread array to attempt (default:  '1,2,4,6,8'): 1,2
  2 Thread values:  1,2.

How many files would you like to create concurrently for IO test (default 1)?
Note: The number of files will never exceed the number of writer threads during tests.
  1 
  Files will be created in the test:  1.

Test using Direct IO when allocating buffer space by Java (default true):  true
  Use DirectIO = true.

Mode used to open up files (rw/rws/rwd -- default rw):  rw
  Mode used = rw.
                
Results of a READ test on a Linux machine with 8 x SSD RAID 0 array:
Tests run with the following parameters:
        file:  /opt/tmp/test.io
        size:  10000000000
        timeout:  60000
        directIO:  true
        file mode:  rw
        Max # files to use:  1
  # of THREADS  |1      |2
Buffer size     +=======+=======+
4               |1431   |1282
16              |1722   |1614
64              |2059   |1748
256             |2239   |1933
1024            |2095   |2050
4096            |2078   |2048
16384           |1841   |1720
                

This test, for example, gives us a good indication that a read block size between 256 and 1024 KB works well, and that additional reader threads do not necessarily give a read performance boost. Note that if you are using on-the-fly compression, additional writer threads may help offload high CPU tasks to multiple cores on your machine.
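When many buffer sizes and thread counts are tested, it can help to scan the results grid with a short script rather than by eye. The following is a minimal sketch, assuming the grid is pasted in exactly the layout shown above; the helper names (parse_grid, best_cell) are hypothetical and not part of FileCatalyst.

```python
# Sketch: find the best cell in a -testReadIO / -testIO results grid.
# Assumes the grid text uses the layout shown in this guide.

RESULTS = """\
  # of THREADS  |1      |2
Buffer size     +=======+=======+
4               |1431   |1282
16              |1722   |1614
64              |2059   |1748
256             |2239   |1933
1024            |2095   |2050
4096            |2078   |2048
16384           |1841   |1720
"""


def parse_grid(text):
    """Return {(buffer_kb, threads): value} parsed from the grid text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header row: thread counts follow the first '|'.
    threads = [int(t) for t in lines[0].split("|")[1:] if t.strip()]
    grid = {}
    for ln in lines[2:]:  # skip the header row and the '+====+' ruler
        cells = [c.strip() for c in ln.split("|")]
        buffer_kb = int(cells[0])
        for n, value in zip(threads, cells[1:]):
            grid[(buffer_kb, n)] = int(value)
    return grid


def best_cell(grid):
    """Return ((buffer_kb, threads), value) for the highest reported value."""
    return max(grid.items(), key=lambda kv: kv[1])


grid = parse_grid(RESULTS)
(best_buffer_kb, best_threads), best_value = best_cell(grid)
print(best_buffer_kb, best_threads, best_value)  # prints: 256 1 2239
```

The script only ranks the numbers as printed by the tool; repeated runs and the spread between neighbouring cells should still be weighed before changing any setting.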

Encryption and CPU Overhead

Certain features of the FileCatalyst Software are computationally intensive, and may bottleneck high-speed transfers.

Encryption of the Control Channel alone does not impose significant overhead. However, setting the FileCatalyst Server to use encryption on both the Control and Data Channels does impose a performance (CPU) penalty.

For most clients, a typical client or server machine can handle transfer speeds of 100 Mbps. However, hitting higher speeds (>500 Mbps) becomes difficult if the Data Channel is encrypted, with transfers starting to become CPU-bound. This may be verified by running a system monitor for your OS (e.g. "Task Manager" on Windows or the "top" command on Linux) and checking the CPU usage while the transfer is occurring.

NOTE: AES Data Encryption is performed by the Sender threads. Increasing the number of sender threads to the number of cores on your machine allows the workload to be divided amongst multiple cores. AES Data Decryption is performed by Packet Processors. A modern processor can handle approximately 200-500 Mbps of encryption. It is recommended that the number of Packet Processors be increased by one for every 500 Mbps of throughput you want to achieve on a single connection, or set to the number of cores your system has.
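The sizing rule in the note can be written as a small calculation. This is only a sketch: it reads the note as "one Packet Processor per 500 Mbps, capped at the number of cores", which is an interpretation, and the function names are hypothetical rather than FileCatalyst settings.

```python
import math
import os


def recommended_packet_processors(target_mbps, cores=None, mbps_per_processor=500):
    """One Packet Processor per 500 Mbps of target throughput, capped at the core count.

    The cap is an assumption about how to combine the two recommendations in the note.
    """
    cores = cores or os.cpu_count() or 1
    needed = max(1, math.ceil(target_mbps / mbps_per_processor))
    return min(needed, cores)


def recommended_sender_threads(cores=None):
    """Sender threads perform AES encryption, so match them to the core count."""
    return cores or os.cpu_count() or 1


print(recommended_packet_processors(2000, cores=8))  # prints: 4
print(recommended_sender_threads(cores=8))           # prints: 8
```

Since the note gives 200-500 Mbps per processor, passing a lower mbps_per_processor (for example 200) gives a more conservative estimate for slower CPUs.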

Depending on your workflow, another possible solution to overcome this would be to encrypt the data with an external application or process, before transferring it over the wire, and decrypt it once the data has reached its destination.

Compressing files can also be CPU intensive. Under most circumstances, the gain in network throughput outweighs the CPU overhead to zip up files before transfer, especially when files are text-based or other formats that can be highly compressed. However, compression should be avoided for most binary and media files, which are normally considered non-compressible. If compression and AES on the Data Channel are utilized, high-speed transfers are likely to be limited by the amount of computing power that can be assigned for each transfer.

Windows UDP packet limits for transmitter

Microsoft Windows uses different algorithms to process UDP packets based on the packet size. By default, packets of size 1024 or smaller use a fast method (pushed onto the network immediately), while larger packets (1025 or larger) are handled using a slower interrupt method which consumes less CPU (see: http://support2.microsoft.com/kb/235257 for more details). While these defaults are in place, if you configure Windows to use IP standard MTU 1500 byte packets, performance is significantly degraded. For this reason, FileCatalyst software limits packet size to 1024 bytes for all Windows senders.

You can, however, get faster performance on Windows by setting a Windows registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\AFD\Parameters\FastSendDatagramThreshold

Setting this registry key tells Windows to use the fast method of sending UDP packets for larger packets as well. There is a script (enableFastDatagram.reg) included in the FileCatalyst application installation directory which can be run (requires administrative privileges to run and a reboot to take effect) to move the threshold from 1024 bytes to 2048 bytes. For optimum performance, it is recommended to run this script, then increase packet sizes to 1472 in FileCatalyst products on Windows platforms (provided the network supports the standard MTU of 1500 bytes).

Running the script undoFastDatagram.reg will remove the registry entry (also requires administrative privileges and a reboot).

Know Your Bandwidth

FileCatalyst HotFolder has the ability to test the bandwidth between itself and the FileCatalyst Server. It is recommended that this test should be run if you are unsure of the real network bandwidth.

If there is a network bottleneck between the client and the server, setting the bandwidth speed above the real network capacity may result in high level of dropped packets and slower overall throughput.

If the bandwidth is above 500 Mbps and the supporting disk IO can support these speeds, it is recommended that client application specify UDP sender streams to be 2 or greater. This allows more data to be pushed out of the system and provides higher maximum speeds.

If the bandwidth is above 4 Gbps and the supporting disk IO can support these speeds, it is recommended that the client application specify 2 or more UDP receiver streams and 5 or more UDP sender streams. This allows more data to be pulled from the system and provides higher maximum speeds.

Know Your Network Infrastructure

Once the bandwidth is known, other settings can now be adjusted to accommodate your specific network environment.

Compensate for Packet Loss

Performance can be further optimized if you know your levels of Packet Loss.

The FileCatalyst protocol performs error correction should some packets not arrive on the first attempt. However, retransmission of data blocks is much slower and latency-sensitive than the original sending of data. To compensate for this slower retransmission period, our algorithm employs multiple sender threads to send out the data, increasing the probability that at least one sender can send data at full rate while the other threads clean up any missing packets.

For high packet loss, it is recommended that more threads are deployed, each with a smaller block size. This smaller block size greatly decreases the possibility that one particular transfer finds all its sender threads in "retransmission" mode, slowing the throughput down.

Compensate for High Latency

To compensate for high latency, it is recommended that block sizes (the amount of data each sender thread can manage at one time) be increased, or that the number of threads be increased. Each of these allows more data to be in flight before a callback is initiated to see if packets must be resent.

There is a simple equation to know the minimum values required to saturate a particular line:

RTT * bandwidth <= numSenderThreads * blocksize

For example, if the RTT is 250 ms and your line speed is 100,000 Kbps, then 0.25 s * 100,000,000 bits/s = 25,000,000 bits, or 3,125,000 bytes, so numSenderThreads * blocksize should be at least roughly 3.1 MB. With numSenderThreads set to 3 (default), you need a minimum block size of about 1,041,667 bytes (a little over 1 MB) to keep the line always full.

To compensate for potential Packet Loss, it is recommended that the thread value be increased by at least 1 thread to ensure that one sender thread is always transmitting a new block of data.
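The equation above can be evaluated with a few lines of code, which makes the unit conversion explicit. This is a minimal sketch under stated assumptions: RTT is given in milliseconds, bandwidth in kilobits per second, and block size in bytes; the function names are hypothetical.

```python
def min_in_flight_bytes(rtt_ms, bandwidth_kbps):
    """RTT * bandwidth, expressed in bytes.

    Milliseconds times kilobits per second gives bits directly
    (1e-3 s * 1e3 bit/s); dividing by 8 converts bits to bytes.
    """
    bits = rtt_ms * bandwidth_kbps
    return -(-bits // 8)  # ceiling division


def min_block_size_bytes(rtt_ms, bandwidth_kbps, sender_threads):
    """Smallest block size so that sender_threads * blocksize covers RTT * bandwidth."""
    total = min_in_flight_bytes(rtt_ms, bandwidth_kbps)
    return -(-total // sender_threads)  # ceiling division


print(min_in_flight_bytes(250, 100_000))      # prints: 3125000
print(min_block_size_bytes(250, 100_000, 3))  # prints: 1041667
```

One way to apply the extra-thread advice is to size the block for the default 3 sender threads and then run 4, so that one thread can always send new data while another is retransmitting.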

Congestion Control: Aggression

Our UDP Protocol can be very friendly to other traffic sharing the network resource. It also has the ability to push out any other traffic and try to attain maximum line speed at all costs. Both of these are valid scenarios, and it is important to understand the impact this can have on your transfers.

  1. For typical shared land-line networks (where other traffic is also present), UDP transfers should enable Congestion Control. This allows the protocol to react to Packet Loss (possibly due to other TCP traffic) in a friendly manner, and slow down the rate of transfer to allow multiple traffic sources to share the line.
  2. A Congestion Control Aggression value of 2 (low aggression) enables the protocol to react fairly quickly. When other traffic is detected, transfer rates should come down quickly to share the line. The start rate should also be set to the minimum speed which you know the line is capable of handling.
  3. For dedicated land-line networks, customers often turn on the Congestion Control, but set the aggression much higher (5, or halfway), with the start rate at half the known link speed. This allows the file transfer to ramp up quickly, and maintain high rates should other traffic try to share the line. The higher the aggression value, the more the protocol will block any other TCP/UDP traffic on the line.
  4. For improved line speed, set the network upload/download limit to the known link capacity. This keeps the software from exceeding the network speed and causing a continuous peak/valley pattern from the Congestion Control.
  5. For maximum line speed, disabling Congestion Control removes most overhead from the UDP protocol. However, if the software exceeds the link capacity, any dropped packets due to network slowdown are amplified, increasing the risk of all sender threads entering retransmission mode (which temporarily slows down the transfer rate).
  6. For satellite transfers, Packet Loss could simply be caused by an object passing quickly between the dish and the satellite, or momentary electrical/solar interference. For these types of connections, customers often turn off Congestion Control altogether, preferring to power through any temporary loss of signal and not risk slowing down the connection.
  7. Additional strategies (such as Packet Loss based Congestion Control instead of RTT based congestion control) may be used for networks which drop UDP packets rather than queue them on over-saturated links.

Congestion Control: Types Of Congestion Control

FileCatalyst has three modes of Congestion Control that can be utilized. Each offers a different set of features that can overcome particular network conditions.

  • Disable Congestion Control: Transfers are sent out at prescribed speeds, with no consideration for any other network traffic on the link. Please do keep in mind that this option may push out other TCP traffic, so take care to set the bandwidth correctly. This is best used on a dedicated line with known bandwidth.
  • Loss Based Congestion Control (Default): Changes in network latency, or Packet Loss, trigger the rate controller to slow down transfers. This option performs best on networks with no Packet Loss, and is very responsive to changes in network conditions.
  • RTT Based Congestion Control: Reacts to changes in network latency, which triggers the rate controller and slows down transfer speeds. This Congestion Control mode performs best on links with natural loss such as satellite or transoceanic links.

When attempting to configure a FileCatalyst application for your specific network infrastructure, please select the Congestion Control type that matches your current network infrastructure. By doing so, the Congestion Control algorithms will be optimized for your infrastructure, which should provide better performance.

Increase OS TCP/UDP Buffers and Window Sizes

Most operating systems allow the administrator to configure TCP/UDP parameters. Increasing the number of packets that may be in-flight may result in fewer lost packets, thereby increasing performance.

See knowledge base article: http://support.filecatalyst.com/index.php?/Knowledgebase/Article/View/243/0/how-does-udp-buffer-sizing-affect-performance-of-filecatalyst

NOTE: This is considered a requirement for any transfers higher than 150-200 Mbps.

Multiple Threads/Block Senders For Smoother Transfers

The FileCatalyst UDP algorithm is designed to take advantage of multiple sender threads in order to maximize the link speed. Forcing a single thread to send data across tends to give "jumpy" bandwidth on links with higher Packet Loss and higher latency. In cases like this, it may be beneficial to increase the number of threads for a given transfer. By increasing this value, the "jumpy" bandwidth may smooth out, providing much more consistent utilization of the bandwidth for a given transfer.

Utilize Jumbo Frames If The Network Allows

For reaching speeds beyond 1 Gbps, it is highly recommended that jumbo frames are utilized. By using larger Packet Sizes (MTU = 9000), FileCatalyst is able to reduce the number of calls it has to make to the OS, and greatly reduce the CPU load of the system.

Increasing MTU Values On Ubuntu

While there are many ways to modify the MTU value on Ubuntu or other Linux operating systems, the easiest way is to modify the file found at "/etc/network/interfaces". When editing this file, the entries should look similar to the following example:

# interfaces(5) file used by ifup(8) and ifdown(8) 
auto lo 
iface lo inet loopback 
 
auto eth2 
iface eth2 inet static
		address 10.1.1.97         
		netmask 255.255.255.0
		network 10.1.1.0
		mtu 9000 
              

Note: Remember that the UDP Packet Size must never exceed the MTU minus 28 bytes, which leaves room for the headers (20 bytes for the IPv4 header and 8 bytes for the UDP header). Thus, in a standard Linux environment with an MTU of 1500 bytes, the application packet size should be set to 1472. For jumbo frames of 9000 bytes, the application packet size should be 8972.
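The packet size rule in the note reduces to a subtraction. The sketch below assumes IPv4 with no IP options (a 20-byte header) plus the 8-byte UDP header; the function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
UDP_HEADER_BYTES = 8


def max_udp_packet_size(mtu):
    """Largest application packet size that fits in one frame of the given MTU."""
    payload = mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} leaves no room for a UDP payload")
    return payload


print(max_udp_packet_size(1500))  # prints: 1472
print(max_udp_packet_size(9000))  # prints: 8972
```

The MTU that matters is the smallest one along the whole path, so the value should be checked against every hop that carries the transfer, not only the local interface.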

Know Your Data

FileCatalyst products support many tools to decrease the amount of data that is required to be sent across the network. The exact combination of features to utilize and increase file transfer performance depends on the nature of the data being sent across the network.

Compress Text Files

FileCatalyst clients allow files to be compressed before being sent out. Text files (logs, emails, etc.) should be compressed before being transferred. The CPU overhead of compression is negligible compared to the bandwidth savings of compressing an ASCII based file. It is recommended that Progressive Transfers are enabled when compression is used. This allows the file to be sent over even as it is being compressed, reducing the overall transfer time.

Media files (audio, video, images) however do not compress easily, and often add intensive computing overhead to a transfer while giving no bandwidth savings. Ensure that non-compressible files are filtered out by using the file compression filter.
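The difference between compressible and non-compressible data is easy to see with a quick experiment. The sketch below uses Python's zlib as a stand-in (it is not the compression used by FileCatalyst), a repeated made-up log line as the "text" sample, and random bytes as a proxy for already-compressed media.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Illustrative samples only: a repetitive log-like text and random bytes.
log_like = b"2024-01-01 12:00:00 INFO transfer complete status=ok\n" * 2000
random_like = os.urandom(len(log_like))

print(f"text-like data:   {compression_ratio(log_like):.3f}")
print(f"random-like data: {compression_ratio(random_like):.3f}")
```

The text sample shrinks to a small fraction of its size, while the random sample does not shrink at all, which is why filtering such files out of compression saves CPU time without costing bandwidth.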

Delta File Transfer

If the same file is sent out more than once (perhaps with only a small modification to the file), enabling Incremental Delta Transfers will significantly reduce the bandwidth involved in sending a file.

For large files, it is also recommended to turn on Progressive Transfers. This allows the file deltas to be transferred, even as they are being built on the FileCatalyst Server, reducing the time it takes for the file to be transferred.

If the same file contains entirely new data each time (example: logs truncated on daily basis, and uploaded using the same file name), it is recommended to disable incremental transfers and simply start to transfer the file as new (avoid the computation required to determine what is different in the file, as all of it has changed).

Disable Force Flush

By default, write operations automatically flush both data and metadata to the file system for all IO calls. Disabling force flush allows Java to utilize the operating system's file cache (RAM).

Growing Files

In situations where growing files need to be transferred, it is recommended that Progressive Transfers are enabled to improve overall performance. By enabling this option for a transfer, the FileCatalyst application transfers the file while it grows on the file system. Once the individual transfer completes, the application will monitor the files to see if they change in size within a given time threshold. Should the file change within this threshold, the modifications will be transferred, and the application will return to monitoring the file system for new changes.

The Dynamic Files option increases performance by continuously transferring the file contents while the file is being built. By transferring files as they grow, the amount of time needed for the completed file to be present on the destination is reduced.

For more detailed information regarding Progressive Transfers, visit the help documentation for the FileCatalyst HotFolder application.

Transfer Cache

If your data is constructed in a manner where new files are continuously being added to an existing file set, it is recommended that you enable the Transfer Cache feature for your transfers. With this option enabled, transfers will only upload/download files that have not already been sent, or that have been modified since they were last transferred by the application. This reduces the amount of data being transferred, because only new or changed files are transferred by the system.

Transferring A Large Collection Of Small Files

Each file transfer, regardless of the file size, requires overhead to set up a network connection between the client and the server. For large files, this overhead has minimal impact, as the software will spend most of the time transferring data across the wire. For transfers of small individual files, however, the reverse is true: when transferring a directory full of small files, the software can spend most of its time setting up and tearing down connections. To help reduce the connection overhead, there are two main ways to improve performance for a given transfer. The first is to enable Single Archive Compression of small files, and the second is to use Multi-Client instead of Single-Client for transfers.

By enabling the 'Single Archive Compression' option, the application will compile all available small files into a single zip archive before it is transmitted. By performing this operation, the application now spends all its time transferring data instead of managing setup and tear down of network connections. Enabling Progressive Transfers will also allow the application to start transferring while the single archive is being built.

  • Single Archive Note 1: There are caveats to Single Archive Compression transfers to keep in mind. Should a transfer be interrupted, auto-resume will not work, and restarting the transfer will start the archive and transfer from the beginning. Also, Delta File Transfers (incremental mode) do not work when Single Archive transfers are enabled.
  • Single Archive Note 2: It is also recommended to enable the "Max size" feature for Single Archives. This will create several smaller archives and extract them as they are transferred. If the transfer is interrupted, generally some of the files will have already been transferred. The maximum value for the Max Size parameter should not exceed 3 GB for any Single Archive.

You may also improve the performance of transferring large collections of small files by enabling the Multi-Client option available in client applications. In a normal transfer, a Single-Client may only transfer a single file at a given time, which means the application must sequentially process the connection overhead for every file transferred. Multi-Client is different in that it is able to transfer files concurrently with each client that it has available. By transferring files concurrently, the connection overhead for files is also processed concurrently across the clients, which in turn reduces the total wait time that a transfer experiences when it is setting up its connections.

  • Multi-Client Note: Multi-Client also supports an option titled "Auto-Archiving". Similar to the behaviour found in "Single Archive with Compression", Auto-Archiving takes small files less than 10 MB and compiles them into a zip archive with max size of 250 MB, before it is transferred. This option improves performance in large filesets with small file sizes by reducing the number of times the connection overhead is triggered.
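A rough model shows why per-file overhead dominates small-file transfers and how both options reduce it. This is a simplified sketch, not a description of FileCatalyst internals: the per-file overhead of 0.1 seconds is a hypothetical figure chosen for illustration, compression gains are ignored, and Multi-Client is modelled as dividing the overhead evenly across clients.

```python
def transfer_time_s(file_count, total_bytes, bandwidth_bps,
                    per_file_overhead_s, concurrent_clients=1):
    """Estimated time: payload time plus per-file setup/teardown overhead.

    Overhead is assumed to be spread evenly across concurrent clients.
    """
    payload_s = total_bytes * 8 / bandwidth_bps
    overhead_s = file_count * per_file_overhead_s / concurrent_clients
    return payload_s + overhead_s


# Illustrative scenario: 10,000 files of 100 KB each over a 1 Gbps link.
FILES = 10_000
TOTAL_BYTES = FILES * 100_000
LINK_BPS = 1_000_000_000
OVERHEAD_S = 0.1  # hypothetical per-file connection overhead

one_by_one = transfer_time_s(FILES, TOTAL_BYTES, LINK_BPS, OVERHEAD_S)
single_archive = transfer_time_s(1, TOTAL_BYTES, LINK_BPS, OVERHEAD_S)
multi_client = transfer_time_s(FILES, TOTAL_BYTES, LINK_BPS, OVERHEAD_S,
                               concurrent_clients=10)

print(f"one file at a time: {one_by_one:.1f} s")
print(f"single archive:     {single_archive:.1f} s")
print(f"10 clients:         {multi_client:.1f} s")
```

Under these assumptions the payload itself takes about 8 seconds in every case; almost all of the difference comes from how many times the connection overhead is paid and whether it is paid in parallel.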

Know the Client Endpoint

Low Bandwidth Clients

For clients connecting with low bandwidth, ensure that the Congestion Control start rate is set below the maximum line speed. By default, the value for the start rate is 384 Kbps. If the start rate is still higher than the available bandwidth, the application will quickly flood the network connection, producing lost/dropped packets and forcing the sender threads to perform most of their transfer in retransmission mode (slower).

For very low bandwidth clients and low latency transfers, FTP mode can at times achieve faster transfer rates than UDP.

Using our online comparison calculator, you can see if there is an advantage to using FTP rather than UDP. Note, however, that the values in the online calculator do not take high packet loss into account. TCP at high latency is extremely sensitive to even 0.1% packet loss, while our UDP protocol handles this much more gracefully.


High Load Servers (or many clients)

Transferring UDP at high speed is memory intensive. Using the example listed above, a transfer with a 250 ms RTT (LA to Hong Kong) at 100,000 Kbps requires a minimum of 25,000,000 bits (roughly 3.1 MB) of data to be in flight at any one time.

For a standard install of FileCatalyst Server the default memory allocated is 1 GB. At roughly 3.1 MB per connection, this means the system can likely accept around 300 concurrent connections at full speed before starting to run short of memory resources, and fewer if connections use more sender threads or larger blocks than the minimum.

The key to getting the FileCatalyst Server to accept higher numbers of concurrent connections is to lower the memory footprint each connection can take on the system. The FileCatalyst Server has an option via the Server Remote Administration to optimize for concurrency. This will lower the maximum thread count and block size each client connection consumes, allowing the server to accept more clients at any one time.

If the maximum sender thread and maximum block size are set too low, limiting clients from connecting at full bandwidth, the following may be done:

Limiting Concurrent Connections

The FileCatalyst Server allows setting the Maximum Concurrent User Connections to levels below what the license allows. This allows administrators the possibility of setting the memory footprint high for each client, but limiting the connections to ensure that the FileCatalyst Server does not itself start to run short of memory.
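The trade-off between per-connection footprint and connection count can be estimated with simple arithmetic. The sketch below is an upper bound only: it assumes each connection holds sender threads times block size bytes and ignores every other use of memory by the server; the function names are hypothetical.

```python
def per_connection_bytes(sender_threads, block_size_bytes):
    """Data a single connection may hold in flight: threads * block size."""
    return sender_threads * block_size_bytes


def max_full_speed_connections(memory_bytes, sender_threads, block_size_bytes):
    """Upper bound on concurrent connections that fit in the memory budget."""
    footprint = per_connection_bytes(sender_threads, block_size_bytes)
    return memory_bytes // footprint


# Illustrative: a 1,000,000,000-byte budget, 4 sender threads, 2,500,000-byte blocks.
print(max_full_speed_connections(1_000_000_000, 4, 2_500_000))  # prints: 100
```

Setting Maximum Concurrent User Connections comfortably below such an estimate leaves headroom for the rest of the server's memory use.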

Increase Java Heap Size

The optimizations performed for high-load servers are based upon your existing available Java heap size. By increasing the memory that Java has access to, the number of threads or the block size that each transfer may use is increased, which can improve the performance of the application under certain circumstances.

Windows Service

For FileCatalyst HotFolders running as a service, these may be configured in fchf.conf:

# Java Additional Parameters
# original: wrapper.java.additional.1=-XX:MaxDirectMemorySize=1024M
wrapper.java.additional.1=-XX:MaxDirectMemorySize=1536M

# Initial Java Heap Size (in MB)
# original: wrapper.java.initmemory=1024
wrapper.java.initmemory=2048

# Maximum Java Heap Size (in MB)
# original: wrapper.java.maxmemory=1024
wrapper.java.maxmemory=2048

Note: When using a 32-bit version of Java, the memory available to a Java service can be increased to 1.5 GB (1536). You cannot currently modify the heap size in the Windows executable found in the Start Menu.

Linux Command Line

For FileCatalyst HotFolders started at the command line, the starting scripts may be modified to use alternative memory sizes than the standard 1GB:

fc_hotfolder.sh

# original: java -Xms64M -Xmx1024M -XX:MaxDirectMemorySize=512M -jar FileCatalystHotFolder.jar -controlpanel &
java -Xms256M -Xmx2048M -XX:MaxDirectMemorySize=1024M -jar FileCatalystHotFolder.jar -controlpanel &

Linux Service

For FileCatalyst HotFolders running as a service in Linux, these may be set in <application directory>/conf/wrapper.conf:

# Java Additional Parameters
# original: wrapper.java.additional.1=-XX:MaxDirectMemorySize=1G
wrapper.java.additional.1=-XX:MaxDirectMemorySize=1536M

# Initial Java Heap Size (in MB)
# original: wrapper.java.initmemory=1024
wrapper.java.initmemory=1536

# Maximum Java Heap Size (in MB)
# original: wrapper.java.maxmemory=1024
wrapper.java.maxmemory=2048

NOTE: When using a 32-bit version of Java, the memory available to a Java service can be increased to 1.5 GB (1536). In 64-bit versions of Java, the memory can be increased to a maximum of 4 GB (4096).

Tuning Guide for Networks Faster than 2 Gbps

The FileCatalyst High Speed Transfer Tuning documentation and guide is available here.

Note: The link to the FileCatalyst High Speed Transfer Tuning documentation will require login credentials. Please log into the Support Portal (http://support.filecatalyst.com/) and click the "Get Download Password" button. If you do not see the "Get Download Password" button, please submit a ticket to the Support Team.

Support

Support System

Visit our support website at http://support.filecatalyst.com to view the knowledge base and to submit a ticket (Available 24/7).

Chat and Phone Support

Live Chat: Visit our website at http://www.filecatalyst.com (Available 9 AM - 5 PM Eastern Time)

Phone: +1(613) 667-2439 (Available 9 AM - 5 PM Eastern Time)