Clustering

Cluster mode is the solution for high-performance systems. It offers load balancing and high availability features.

A Platform cluster is a set of nodes that communicate via JGroups (UDP or TCP) in the back-end, with a front-end load balancer such as Apache that distributes HTTP requests to the nodes. High availability is achieved in the data layer natively by the RDBMS or by shared file systems, such as SAN and NAS.

The following diagram illustrates a cluster of two nodes (each node uses its local JCR index storage, but you can enable shared JCR indexing, as described later in this chapter).

[Diagram: a two-node eXo Platform cluster behind a front-end load balancer]

In this chapter:

  • Setting up eXo Platform cluster
  • JCR index in cluster mode
  • Activating TCP default configuration files
  • Configuring JGroups via exo.properties
  • Using customized JGroups xml files
  • Setting up a load balancer
  • FAQs of clustering

Setting up eXo Platform cluster

  1. Install eXo Platform package by following Installation and Startup.

    If you are using the eXo Chat addon, you should install it on all the cluster nodes.

  2. Create a copy of the package for each cluster node. Assume that you have two nodes: node1.your-domain.com and node2.your-domain.com.
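    For example, a sketch of preparing two copies of a Tomcat package on a single test machine (the directory names are hypothetical):

      # One identical copy per node, each with its own configuration
      cp -r platform /opt/plf-node1
      cp -r platform /opt/plf-node2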

Note

For testing or troubleshooting, if you use Tomcat as the application server and run the cluster nodes in the same environment (same operating system), you must configure different Tomcat ports for each node.
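For example, a minimal sketch for node2, assuming the standard Tomcat default ports in conf/server.xml (8005 for shutdown, 8080 for HTTP, 8009 for AJP) and an arbitrary offset of 100:

<!-- conf/server.xml on node2 (node1 keeps the default ports) -->
<Server port="8105" shutdown="SHUTDOWN">
  ...
  <Connector port="8180" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8543" />
  <Connector port="8109" protocol="AJP/1.3" redirectPort="8543" />
  ...
</Server>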

  3. Configure the RDBMS datasources in each cluster node (follow this documentation) to use one of the supported database systems: PostgreSQL, MySQL, MSSQL, Oracle or MariaDB. A configuration sketch follows the note below.

Note

  • It is not possible to use the default embedded HSQL database, as noted in Configuring eXo Platform with database.
  • All the cluster nodes must use the same RDBMS datasources.
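For illustration, a minimal sketch of MySQL settings in exo.properties, assuming the exo.jcr.datasource.* and exo.idm.datasource.* property names (check Configuring eXo Platform with database for the exact names in your version; the host names and credentials here are hypothetical and must be identical on every node):

# Hypothetical MySQL datasources, shared by all cluster nodes
exo.jcr.datasource.driver=com.mysql.jdbc.Driver
exo.jcr.datasource.url=jdbc:mysql://db.your-domain.com:3306/plf_jcr
exo.jcr.datasource.username=plf
exo.jcr.datasource.password=secret

exo.idm.datasource.driver=com.mysql.jdbc.Driver
exo.idm.datasource.url=jdbc:mysql://db.your-domain.com:3306/plf_idm
exo.idm.datasource.username=plf
exo.idm.datasource.password=secret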
  4. eXo Platform comes with Elasticsearch embedded. For clustering, you MUST use a separate Elasticsearch process. Please follow the steps described here.
  5. eXo Platform uses databases and disk folders to store its data:

    • Datasources:

      • IDM: datasource to store user/group/membership entities.
      • JCR: datasource to store JCR Data.
      • JPA: datasource to store entities mapped by Hibernate. Quartz tables are stored in this datasource by default.
    • Disk:

      • File storage data: Stored by default in a file system folder, but can be configured to store files in the JPA datasource instead. More details here.

        If the file system storage implementation is configured, the folder must be shared between all cluster nodes.

        The folder location is configured with the property exo.files.storage.dir, for example exo.files.storage.dir=/exo-shared-folder-example/files/, which you can set in the exo.properties file. A combined example for both shared folders is given at the end of this list.

      • JCR Binary Value Storage: Stored by default in a file system folder, but can be configured to store files in the JCR datasource instead. More details here.

        If the file system storage implementation is configured, the folder must be shared between all cluster nodes.

        The folder location is configured with the property exo.jcr.storage.data.dir, for example exo.jcr.storage.data.dir=/exo-shared-folder-example/jcrvalues/, which you can set in the exo.properties file.

      Tip

      Choosing between file system and RDBMS storage depends on your needs and your system environment. (See more details in Comparing file system and RDBMS storage.)

      • JCR indexes: Stored under a local file system folder in each cluster node. More details here.

        By default, eXo Platform uses local JCR indexes, and this is the recommended mode for clustering: read and write operations take less time in local mode than in shared mode.

    • Other systems: Such as MongoDB if the eXo Chat addon is installed.
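    To tie the disk items together: if the shared disk were mounted at /mnt/exo-shared on every node (a hypothetical NFS or SAN mount point), the two shared locations above could be set in exo.properties as:

      exo.files.storage.dir=/mnt/exo-shared/files
      exo.jcr.storage.data.dir=/mnt/exo-shared/jcrvalues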

  6. Configure the exo.cluster.node.name property. Use a different name for each node:

    • In JBoss, edit this property in the standalone-exo-cluster.xml file:

      <system-properties>
          <property name="exo.cluster.node.name" value="node1"/>
      </system-properties>
      
    • In Tomcat, add the property in setenv-customize.sh (.bat for Windows environments):

      • For Windows:

        SET "CATALINA_OPTS=%CATALINA_OPTS% -Dexo.cluster.node.name=node1"
        
      • For Linux:

        CATALINA_OPTS="${CATALINA_OPTS} -Dexo.cluster.node.name=node1"
        
  7. eXo Platform uses the UDP protocol by default for JGroups. This protocol is not recommended for production environments; you should configure TCP as the transport protocol instead. For that purpose, please follow this documentation.

  8. Configure the CometD Oort URL. Replace localhost in the following examples with the IP address or host name of the node.

    • In JBoss, edit standalone-exo-cluster.xml:

      <property name="exo.cometd.oort.url" value="http://localhost:8080/cometd/cometd"/>
      
    • In Tomcat, edit exo.properties:

      exo.cometd.oort.url=http://localhost:8080/cometd/cometd
      

    CometD is used to perform messaging over the web, and Oort is a CometD extension that supports clustering. This configuration is necessary to make the On-site Notification feature work properly.

  9. Configure CometD group port. This step is optional.

    CometD Oort nodes automatically join other nodes in the same network and the same group, so to prevent unknown nodes from joining your group, you can specify your own group with a port that is different from the default one (5577). This situation typically arises in testing environments.

    • In JBoss, edit standalone-exo-cluster.xml file:

      <!-- Configure the same port for all nodes in your cluster -->
      <property name="exo.cometd.oort.multicast.groupPort" value="5579"/>
      
    • In Tomcat, edit exo.properties file:

      # Configure the same port for all nodes in your cluster
      exo.cometd.oort.multicast.groupPort=5579
      
  10. The previous step applies only when multicast is available on the system where CometD is deployed. Otherwise, the static discovery mechanism should be used by adding the following properties in the exo.properties file (a complete sketch follows the list below):

    exo.cometd.oort.configType=static
    exo.cometd.oort.cloud=http://host2:port2/cometd/cometd,http://host3:port3/cometd/cometd
    
    • The default value of exo.cometd.oort.configType is “multicast”; only the two values “multicast” and “static” are available.
    • The parameter exo.cometd.oort.cloud must contain a comma-separated list of the CometD endpoints of all the other nodes of the cluster. In the example above, we assume that the node owning this exo.properties is host1:port1 and that the cluster is composed of three nodes: host1, host2 and host3.
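    Putting it together, a sketch of the CometD properties on host1, assuming all three nodes listen on port 8080 (the port is illustrative):

      exo.cometd.oort.url=http://host1:8080/cometd/cometd
      exo.cometd.oort.configType=static
      exo.cometd.oort.cloud=http://host2:8080/cometd/cometd,http://host3:8080/cometd/cometd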
  11. Only in Tomcat, configure the following:

    • In setenv-customize.sh (.bat for Windows):

      EXO_PROFILES="all,cluster"
      
    • In exo.properties:

      gatein.jcr.config.type=cluster
      gatein.jcr.index.changefilterclass=org.exoplatform.services.jcr.impl.core.query.ispn.LocalIndexChangesFilter
      # Default JCR indexing is local so you need to use a different folder for each node.
      # With the value below, you do not have to create the folder.
      exo.jcr.index.data.dir=gatein/data/jcr/index
      
  12. Start the servers. You must wait until node1 is fully started before starting node2.

    In JBoss, you need to indicate the configuration file with the -c option: ./bin/standalone.sh -b 0.0.0.0 -c standalone-exo-cluster.xml (.bat for Windows).

    In JBoss only, you can use some other options in the start command:

    • -Dexo.cluster.node.name=a-node-name overrides the node name in the configuration file.

    • -Djboss.socket.binding.port-offset=101

      This is useful when you set up several nodes on the same machine for testing: instead of configuring the ports of every node, just use a different port-offset in each start command, as in the sketch below.
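      For example, a sketch of starting two nodes on one machine for testing (the offset value is illustrative):

        # First node, default ports
        ./bin/standalone.sh -b 0.0.0.0 -c standalone-exo-cluster.xml -Dexo.cluster.node.name=node1
        # Second node, every port shifted by 101
        ./bin/standalone.sh -b 0.0.0.0 -c standalone-exo-cluster.xml -Dexo.cluster.node.name=node2 -Djboss.socket.binding.port-offset=101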

Note

If you run two nodes on the same machine for testing, change the default ports of node2 to avoid port conflicts.

In Tomcat, ports are configured in conf/server.xml.

In JBoss, use the -Djboss.socket.binding.port-offset option mentioned above.

To configure a front-end for your nodes, follow Setting up Apache front-end.

To configure load balancing, follow Setting up a load balancer.

Note

eXo Platform only supports the sticky session mode for clustering (no session replication). This must be set up in the load balancer configuration.

JCR index in cluster mode

Note

eXo Platform uses local JCR index by default. You can switch between local index and shared index by configuration.

Local indexing is the default because it simplifies configuration. Each strategy has its pros and cons. Here is a brief overview of their characteristics, but it is strongly recommended that you read the given links for a better understanding:

  • Local indexing: Each node manages its own local index storage. The “documents” to be indexed are replicated across nodes.

    “Document” is a Lucene term meaning a block of data ready for indexing. The same documents are replicated between nodes and each node indexes them locally, so the local indexes of all running nodes stay up to date.

    There are additional mechanisms for a new node that starts for the first time to initiate its local index, and for a node joining the cluster after downtime to update its local index.

    Read this link for details.

  • Shared indexing: Every node has read access to a shared index and has its own in-memory index. A single “coordinator” node is responsible for pulling the in-memory indexes and updating the shared index.

    This allows searching for newly added content immediately. However, in rare cases the search results may differ between nodes for a while.

    Read this link for details.

For LOCAL INDEXING, the index directory should be a local path on each node. In JBoss it is already set by default:

<property name="exo.jcr.index.data.dir" value="${exo.jcr.data.dir}/index"/>

But for Tomcat, you need to set it yourself, in exo.properties file:

exo.jcr.index.data.dir=gatein/data/jcr/index

If you want to use a SHARED INDEX for every node:

Enable the profile cluster-index-shared.

  • In JBoss, edit $PLATFORM_JBOSS_HOME/standalone/configuration/standalone-exo-cluster.xml:

    <property name="exo.profiles" value="all,cluster,cluster-index-shared"/>
    
  • In Tomcat, edit setenv-customize.sh (.bat for Windows, see Customizing environment variables):

    EXO_PROFILES="all,cluster,cluster-index-shared"
    

Set the index directory (exo.jcr.index.data.dir) to a shared network path.

  • In JBoss, edit $PLATFORM_JBOSS_HOME/standalone/configuration/standalone-exo-cluster.xml:

    <property name="exo.jcr.index.data.dir" value="${exo.shared.dir}/jcr/index"/>
    
  • In Tomcat, if you do not configure it, exo.jcr.index.data.dir is already set to a sub-folder of the shared directory EXO_DATA_DIR. This is done in setenv.*:

    CATALINA_OPTS="$CATALINA_OPTS -Dexo.jcr.index.data.dir=\"${EXO_DATA_DIR}/jcr/index\""
    

    You can override it in exo.properties:

    exo.jcr.index.data.dir=/path/of/a/shared/folder/for/all/nodes
    

Activating TCP default configuration files

The default protocol for JGroups is UDP. However, TCP is also pre-configured in platform-extension-config.jar!/conf/platform/jgroups, and you can simply activate it.

The files contain externalized variable names and default values for TCP. If you want to use TCP instead of UDP, it is recommended that you activate those files and, if needed, change the default settings via exo.properties. See Configuration overview for the exo.properties file.

To activate TCP default configuration files, enable the profile cluster-jgroups-tcp:

  • In JBoss, edit standalone-exo-cluster.xml:

    <system-properties>
        ...
        <property name="exo.profiles" value="all,cluster,cluster-jgroups-tcp"/>
        ...
    </system-properties>
    
  • In Tomcat, edit setenv-customize.sh (.bat for Windows, see Customizing environment variables):

    EXO_PROFILES="all,cluster,cluster-jgroups-tcp"
    

When switching to use TCP instead of UDP, you need to add some properties in exo.properties:

# Assume node1 is 192.168.1.100 and node2 is 192.168.1.101. Here is the configuration for node1:

exo.jcr.cluster.jgroups.tcp.bind_addr=192.168.1.100
exo.jcr.cluster.jgroups.tcpping.initial_hosts=192.168.1.100[7800],192.168.1.101[7800]

exo.idm.cluster.jgroups.tcp.bind_addr=192.168.1.100
exo.idm.cluster.jgroups.tcpping.initial_hosts=192.168.1.100[7900],192.168.1.101[7900]
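The corresponding configuration for node2 differs only in the bind addresses; the initial_hosts lists stay the same on both nodes:

exo.jcr.cluster.jgroups.tcp.bind_addr=192.168.1.101
exo.jcr.cluster.jgroups.tcpping.initial_hosts=192.168.1.100[7800],192.168.1.101[7800]

exo.idm.cluster.jgroups.tcp.bind_addr=192.168.1.101
exo.idm.cluster.jgroups.tcpping.initial_hosts=192.168.1.100[7900],192.168.1.101[7900]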

Configuring JGroups via exo.properties

JGroups configuration is externalized for both JCR and IDM. This section lists the default values and the externalized variables that you can configure via exo.properties. See Configuration overview for the exo.properties file.

It is recommended that you configure JGroups via exo.properties. Only when the variables are not enough, or when you want to re-use JGroups xml files from previous versions, should you customize the JGroups xml files as described in the next section.
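For instance, a sketch of overriding a few of the defaults listed below via exo.properties (the values are illustrative, not recommendations):

# Bind the JCR UDP transport to a real interface instead of 127.0.0.1
exo.jcr.cluster.jgroups.udp.bind_addr=192.168.1.100
# Give failure detection more time before suspecting a node
exo.jcr.cluster.jgroups.fd.timeout=15000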

UDP configuration for JCR

JGroups name                       Default value           eXo variable
UDP
singleton_name                     exo-transport-udp       exo.jcr.cluster.jgroups.udp.singleton_name
bind_addr                          127.0.0.1               exo.jcr.cluster.jgroups.udp.bind_addr
bind_port                          16600                   exo.jcr.cluster.jgroups.udp.bind_port
mcast_addr                         228.10.10.10            exo.jcr.cluster.jgroups.udp.mcast_addr
mcast_port                         17600                   exo.jcr.cluster.jgroups.udp.mcast_port
tos                                8                       exo.jcr.cluster.jgroups.udp.tos
ucast_recv_buf_size                20000000                exo.jcr.cluster.jgroups.udp.ucast_recv_buf_size
ucast_send_buf_size                640000                  exo.jcr.cluster.jgroups.udp.ucast_send_buf_size
mcast_recv_buf_size                25000000                exo.jcr.cluster.jgroups.udp.mcast_recv_buf_size
mcast_send_buf_size                640000                  exo.jcr.cluster.jgroups.udp.mcast_send_buf_size
loopback                           false                   exo.jcr.cluster.jgroups.udp.loopback
discard_incompatible_packets       true                    exo.jcr.cluster.jgroups.udp.discard_incompatible_packets
max_bundle_size                    64000                   exo.jcr.cluster.jgroups.udp.max_bundle_size
max_bundle_timeout                 30                      exo.jcr.cluster.jgroups.udp.max_bundle_timeout
use_incoming_packet_handler        true                    exo.jcr.cluster.jgroups.udp.use_incoming_packet_handler
ip_ttl                             2                       exo.jcr.cluster.jgroups.udp.ip_ttl
enable_bundling                    false                   exo.jcr.cluster.jgroups.udp.enable_bundling
enable_diagnostics                 true                    exo.jcr.cluster.jgroups.udp.enable_diagnostics
diagnostics_addr                   224.0.75.75             exo.jcr.cluster.jgroups.udp.diagnostics_addr
diagnostics_port                   7500                    exo.jcr.cluster.jgroups.udp.diagnostics_port
thread_naming_pattern              cl                      exo.jcr.cluster.jgroups.udp.thread_naming_pattern
use_concurrent_stack               true                    exo.jcr.cluster.jgroups.udp.use_concurrent_stack
thread_pool.enabled                true                    exo.jcr.cluster.jgroups.udp.thread_pool.enabled
thread_pool.min_threads            10                      exo.jcr.cluster.jgroups.udp.thread_pool.min_threads
thread_pool.max_threads            1000                    exo.jcr.cluster.jgroups.udp.thread_pool.max_threads
thread_pool.keep_alive_time        5000                    exo.jcr.cluster.jgroups.udp.thread_pool.keep_alive_time
thread_pool.queue_enabled          true                    exo.jcr.cluster.jgroups.udp.thread_pool.queue_enabled
thread_pool.queue_max_size         1000                    exo.jcr.cluster.jgroups.udp.thread_pool.queue_max_size
thread_pool.rejection_policy       discard                 exo.jcr.cluster.jgroups.udp.thread_pool.rejection_policy
oob_thread_pool.enabled            true                    exo.jcr.cluster.jgroups.udp.oob_thread_pool.enabled
oob_thread_pool.min_threads        5                       exo.jcr.cluster.jgroups.udp.oob_thread_pool.min_threads
oob_thread_pool.max_threads        1000                    exo.jcr.cluster.jgroups.udp.oob_thread_pool.max_threads
oob_thread_pool.keep_alive_time    5000                    exo.jcr.cluster.jgroups.udp.oob_thread_pool.keep_alive_time
oob_thread_pool.queue_enabled      false                   exo.jcr.cluster.jgroups.udp.oob_thread_pool.queue_enabled
oob_thread_pool.queue_max_size     1000                    exo.jcr.cluster.jgroups.udp.oob_thread_pool.queue_max_size
oob_thread_pool.rejection_policy   Run                     exo.jcr.cluster.jgroups.udp.oob_thread_pool.rejection_policy
PING
timeout                            2000                    exo.jcr.cluster.jgroups.ping.timeout
num_initial_members                1                       exo.jcr.cluster.jgroups.ping.num_initial_members
MERGE2
max_interval                       30000                   exo.jcr.cluster.jgroups.merge2.max_interval
min_interval                       10000                   exo.jcr.cluster.jgroups.merge2.min_interval
FD
timeout                            10000                   exo.jcr.cluster.jgroups.fd.timeout
max_tries                          5                       exo.jcr.cluster.jgroups.fd.max_tries
shun                               true                    exo.jcr.cluster.jgroups.fd.shun
VERIFY_SUSPECT
timeout                            1500                    exo.jcr.cluster.jgroups.verify_suspect.timeout
pbcast.NAKACK
use_stats_for_retransmission       false                   exo.jcr.cluster.jgroups.pbcast.nakack.use_stats_for_retransmission
exponential_backoff                150                     exo.jcr.cluster.jgroups.pbcast.nakack.exponential_backoff
use_mcast_xmit                     true                    exo.jcr.cluster.jgroups.pbcast.nakack.use_mcast_xmit
gc_lag                             0                       exo.jcr.cluster.jgroups.pbcast.nakack.gc_lag
retransmit_timeout                 50,300,600,1200         exo.jcr.cluster.jgroups.pbcast.nakack.retransmit_timeout
discard_delivered_msgs             true                    exo.jcr.cluster.jgroups.pbcast.nakack.discard_delivered_msgs
UNICAST
timeout                            300,600,1200            exo.jcr.cluster.jgroups.unicast.timeout
pbcast.STABLE
stability_delay                    1000                    exo.jcr.cluster.jgroups.pbcast.stable.stability_delay
desired_avg_gossip                 50000                   exo.jcr.cluster.jgroups.pbcast.stable.desired_avg_gossip
max_bytes                          1000000                 exo.jcr.cluster.jgroups.pbcast.stable.max_bytes
VIEW_SYNC
avg_send_interval                  60000                   exo.jcr.cluster.jgroups.view_sync.avg_send_interval
pbcast.GMS
print_local_addr                   true                    exo.jcr.cluster.jgroups.pbcast.gms.print_local_addr
join_timeout                       3000                    exo.jcr.cluster.jgroups.pbcast.gms.join_timeout
shun                               false                   exo.jcr.cluster.jgroups.pbcast.gms.shun
view_bundling                      true                    exo.jcr.cluster.jgroups.pbcast.gms.view_bundling
FC
max_credits                        500000                  exo.jcr.cluster.jgroups.fc.max_credits
min_threshold                      0.20                    exo.jcr.cluster.jgroups.fc.min_threshold
FRAG2
frag_size                          60000                   exo.jcr.cluster.jgroups.frag2.frag_size

TCP configuration for JCR

See how to activate TCP default configuration in Activating TCP default configuration files.

JGroups name                       Default value           eXo variable
TCP
singleton_name                     exo-transport-tcp       exo.jcr.cluster.jgroups.tcp.singleton_name
bind_addr                          127.0.0.1               exo.jcr.cluster.jgroups.tcp.bind_addr
start_port                         7800                    exo.jcr.cluster.jgroups.tcp.start_port
loopback                           true                    exo.jcr.cluster.jgroups.tcp.loopback
recv_buf_size                      20000000                exo.jcr.cluster.jgroups.tcp.recv_buf_size
send_buf_size                      640000                  exo.jcr.cluster.jgroups.tcp.send_buf_size
discard_incompatible_packets       true                    exo.jcr.cluster.jgroups.tcp.discard_incompatible_packets
max_bundle_size                    64000                   exo.jcr.cluster.jgroups.tcp.max_bundle_size
max_bundle_timeout                 30                      exo.jcr.cluster.jgroups.tcp.max_bundle_timeout
use_incoming_packet_handler        true                    exo.jcr.cluster.jgroups.tcp.use_incoming_packet_handler
enable_bundling                    true                    exo.jcr.cluster.jgroups.tcp.enable_bundling
use_send_queues                    true                    exo.jcr.cluster.jgroups.tcp.use_send_queues
sock_conn_timeout                  300                     exo.jcr.cluster.jgroups.tcp.sock_conn_timeout
skip_suspected_members             true                    exo.jcr.cluster.jgroups.tcp.skip_suspected_members
use_concurrent_stack               true                    exo.jcr.cluster.jgroups.tcp.use_concurrent_stack
thread_pool.enabled                true                    exo.jcr.cluster.jgroups.tcp.thread_pool.enabled
thread_pool.min_threads            10                      exo.jcr.cluster.jgroups.tcp.thread_pool.min_threads
thread_pool.max_threads            100                     exo.jcr.cluster.jgroups.tcp.thread_pool.max_threads
thread_pool.keep_alive_time        60000                   exo.jcr.cluster.jgroups.tcp.thread_pool.keep_alive_time
thread_pool.queue_enabled          true                    exo.jcr.cluster.jgroups.tcp.thread_pool.queue_enabled
thread_pool.queue_max_size         1000                    exo.jcr.cluster.jgroups.tcp.thread_pool.queue_max_size
thread_pool.rejection_policy       Discard                 exo.jcr.cluster.jgroups.tcp.thread_pool.rejection_policy
oob_thread_pool.enabled            true                    exo.jcr.cluster.jgroups.tcp.oob_thread_pool.enabled
oob_thread_pool.min_threads        10                      exo.jcr.cluster.jgroups.tcp.oob_thread_pool.min_threads
oob_thread_pool.max_threads        100                     exo.jcr.cluster.jgroups.tcp.oob_thread_pool.max_threads
oob_thread_pool.keep_alive_time    60000                   exo.jcr.cluster.jgroups.tcp.oob_thread_pool.keep_alive_time
oob_thread_pool.queue_enabled      false                   exo.jcr.cluster.jgroups.tcp.oob_thread_pool.queue_enabled
oob_thread_pool.queue_max_size     1000                    exo.jcr.cluster.jgroups.tcp.oob_thread_pool.queue_max_size
oob_thread_pool.rejection_policy   Discard                 exo.jcr.cluster.jgroups.tcp.oob_thread_pool.rejection_policy
TCPPING
timeout                            3000                    exo.jcr.cluster.jgroups.tcpping.timeout
initial_hosts                      localhost[7800]         exo.jcr.cluster.jgroups.tcpping.initial_hosts
port_range                         0                       exo.jcr.cluster.jgroups.tcpping.port_range
num_initial_members                1                       exo.jcr.cluster.jgroups.tcpping.num_initial_members
MERGE2
max_interval                       100000                  exo.jcr.cluster.jgroups.merge2.max_interval
min_interval                       20000                   exo.jcr.cluster.jgroups.merge2.min_interval
FD
timeout                            10000                   exo.jcr.cluster.jgroups.fd.timeout
max_tries                          5                       exo.jcr.cluster.jgroups.fd.max_tries
shun                               true                    exo.jcr.cluster.jgroups.fd.shun
VERIFY_SUSPECT
timeout                            1500                    exo.jcr.cluster.jgroups.verify_suspect.timeout
pbcast.NAKACK
use_mcast_xmit                     false                   exo.jcr.cluster.jgroups.pbcast.nakack.use_mcast_xmit
gc_lag                             0                       exo.jcr.cluster.jgroups.pbcast.nakack.gc_lag
retransmit_timeout                 300,600,1200,2400,4800  exo.jcr.cluster.jgroups.pbcast.nakack.retransmit_timeout
discard_delivered_msgs             true                    exo.jcr.cluster.jgroups.pbcast.nakack.discard_delivered_msgs
UNICAST
timeout                            300,600,1200            exo.jcr.cluster.jgroups.unicast.timeout
pbcast.STABLE
stability_delay                    1000                    exo.jcr.cluster.jgroups.pbcast.stable.stability_delay
desired_avg_gossip                 50000                   exo.jcr.cluster.jgroups.pbcast.stable.desired_avg_gossip
max_bytes                          1m                      exo.jcr.cluster.jgroups.pbcast.stable.max_bytes
VIEW_SYNC
avg_send_interval                  60000                   exo.jcr.cluster.jgroups.view_sync.avg_send_interval
pbcast.GMS
print_local_addr                   true                    exo.jcr.cluster.jgroups.pbcast.gms.print_local_addr
join_timeout                       3000                    exo.jcr.cluster.jgroups.pbcast.gms.join_timeout
shun                               true                    exo.jcr.cluster.jgroups.pbcast.gms.shun
view_bundling                      true                    exo.jcr.cluster.jgroups.pbcast.gms.view_bundling
FC
max_credits                        2000000                 exo.jcr.cluster.jgroups.fc.max_credits
min_threshold                      0.10                    exo.jcr.cluster.jgroups.fc.min_threshold
FRAG2
frag_size                          60000                   exo.jcr.cluster.jgroups.frag2.frag_size

UDP configuration for IDM

JGroups name                       Default value           eXo variable
UDP
singleton_name                     idm-transport-udp       exo.idm.cluster.jgroups.udp.singleton_name
bind_addr                          127.0.0.1               exo.idm.cluster.jgroups.udp.bind_addr
bind_port                          26600                   exo.idm.cluster.jgroups.udp.bind_port
mcast_addr                         228.10.10.10            exo.idm.cluster.jgroups.udp.mcast_addr
mcast_port                         27600                   exo.idm.cluster.jgroups.udp.mcast_port
tos                                8                       exo.idm.cluster.jgroups.udp.tos
ucast_recv_buf_size                20m                     exo.idm.cluster.jgroups.udp.ucast_recv_buf_size
ucast_send_buf_size                640k                    exo.idm.cluster.jgroups.udp.ucast_send_buf_size
mcast_recv_buf_size                25m                     exo.idm.cluster.jgroups.udp.mcast_recv_buf_size
mcast_send_buf_size                640k                    exo.idm.cluster.jgroups.udp.mcast_send_buf_size
loopback                           true                    exo.idm.cluster.jgroups.udp.loopback
discard_incompatible_packets       true                    exo.idm.cluster.jgroups.udp.discard_incompatible_packets
max_bundle_size                    64000                   exo.idm.cluster.jgroups.udp.max_bundle_size
max_bundle_timeout                 30                      exo.idm.cluster.jgroups.udp.max_bundle_timeout
ip_ttl                             2                       exo.idm.cluster.jgroups.udp.ip_ttl
enable_bundling                    true                    exo.idm.cluster.jgroups.udp.enable_bundling
enable_diagnostics                 true                    exo.idm.cluster.jgroups.udp.enable_diagnostics
diagnostics_addr                   224.0.75.75             exo.idm.cluster.jgroups.udp.diagnostics_addr
diagnostics_port                   7500                    exo.idm.cluster.jgroups.udp.diagnostics_port
thread_naming_pattern              pl                      exo.idm.cluster.jgroups.udp.thread_naming_pattern
thread_pool.enabled                true                    exo.idm.cluster.jgroups.udp.thread_pool.enabled
thread_pool.min_threads            20                      exo.idm.cluster.jgroups.udp.thread_pool.min_threads
thread_pool.max_threads            300                     exo.idm.cluster.jgroups.udp.thread_pool.max_threads
thread_pool.keep_alive_time        5000                    exo.idm.cluster.jgroups.udp.thread_pool.keep_alive_time
thread_pool.queue_enabled          true                    exo.idm.cluster.jgroups.udp.thread_pool.queue_enabled
thread_pool.queue_max_size         1000                    exo.idm.cluster.jgroups.udp.thread_pool.queue_max_size
thread_pool.rejection_policy       Discard                 exo.idm.cluster.jgroups.udp.thread_pool.rejection_policy
oob_thread_pool.enabled            true                    exo.idm.cluster.jgroups.udp.oob_thread_pool.enabled
oob_thread_pool.min_threads        20                      exo.idm.cluster.jgroups.udp.oob_thread_pool.min_threads
oob_thread_pool.max_threads        300                     exo.idm.cluster.jgroups.udp.oob_thread_pool.max_threads
oob_thread_pool.keep_alive_time    1000                    exo.idm.cluster.jgroups.udp.oob_thread_pool.keep_alive_time
oob_thread_pool.queue_enabled      false                   exo.idm.cluster.jgroups.udp.oob_thread_pool.queue_enabled
oob_thread_pool.queue_max_size     100                     exo.idm.cluster.jgroups.udp.oob_thread_pool.queue_max_size
oob_thread_pool.rejection_policy   Discard                 exo.idm.cluster.jgroups.udp.oob_thread_pool.rejection_policy
PING
timeout                            2000                    exo.idm.cluster.jgroups.ping.timeout
num_initial_members                1                       exo.idm.cluster.jgroups.ping.num_initial_members
MERGE2
max_interval                       100000                  exo.idm.cluster.jgroups.merge2.max_interval
min_interval                       20000                   exo.idm.cluster.jgroups.merge2.min_interval
FD
timeout                            6000                    exo.idm.cluster.jgroups.fd.timeout
max_tries                          5                       exo.idm.cluster.jgroups.fd.max_tries
VERIFY_SUSPECT
timeout                            1500                    exo.idm.cluster.jgroups.verify_suspect.timeout
pbcast.NAKACK
use_mcast_xmit                     true                    exo.idm.cluster.jgroups.pbcast.nakack.use_mcast_xmit
retransmit_timeout                 300,600,1200,2400,4800  exo.idm.cluster.jgroups.pbcast.nakack.retransmit_timeout
discard_delivered_msgs             true                    exo.idm.cluster.jgroups.pbcast.nakack.discard_delivered_msgs
UNICAST2
timeout                            300,600,1200,2400,3600  exo.idm.cluster.jgroups.unicast2.timeout
stable_interval                    5000                    exo.idm.cluster.jgroups.unicast2.stable_interval
max_bytes                          1m                      exo.idm.cluster.jgroups.unicast2.max_bytes
pbcast.STABLE
stability_delay                    1000                    exo.idm.cluster.jgroups.pbcast.stable.stability_delay
desired_avg_gossip                 50000                   exo.idm.cluster.jgroups.pbcast.stable.desired_avg_gossip
max_bytes                          400000                  exo.idm.cluster.jgroups.pbcast.stable.max_bytes
pbcast.GMS
print_local_addr                   true                    exo.idm.cluster.jgroups.pbcast.gms.print_local_addr
join_timeout                       3000                    exo.idm.cluster.jgroups.pbcast.gms.join_timeout
view_bundling                      true                    exo.idm.cluster.jgroups.pbcast.gms.view_bundling
view_ack_collection_timeout        5000                    exo.idm.cluster.jgroups.pbcast.gms.view_ack_collection_timeout
resume_task_timeout                7500                    exo.idm.cluster.jgroups.pbcast.gms.resume_task_timeout
UFC
max_credits                        2000000                 exo.idm.cluster.jgroups.ufc.max_credits
ignore_synchronous_response        true                    exo.idm.cluster.jgroups.ufc.ignore_synchronous_response
MFC
max_credits                        2000000                 exo.idm.cluster.jgroups.mfc.max_credits
ignore_synchronous_response        true                    exo.idm.cluster.jgroups.mfc.ignore_synchronous_response
FRAG2
frag_size                          60000                   exo.idm.cluster.jgroups.frag2.frag_size
RSVP
timeout                            60000                   exo.idm.cluster.jgroups.rsvp.timeout
resend_interval                    500                     exo.idm.cluster.jgroups.rsvp.resend_interval
ack_on_delivery                    false                   exo.idm.cluster.jgroups.rsvp.ack_on_delivery
timeout                            60000                   exo.jcr.cluster.jgroups.rsvp.timeout
resend_interval                    500                     exo.jcr.cluster.jgroups.rsvp.resend_interval
ack_on_delivery                    false                   exo.jcr.cluster.jgroups.rsvp.ack_on_delivery

TCP configuration for IDM

See how to activate TCP default configuration in Activating TCP default configuration files.

JGroups name                       Default value           eXo variable
TCP
singleton_name                     idm-transport-tcp       exo.idm.cluster.jgroups.tcp.singleton_name
bind_addr                          127.0.0.1               exo.idm.cluster.jgroups.tcp.bind_addr
bind_port                          7900                    exo.idm.cluster.jgroups.tcp.bind_port
port_range                         30                      exo.idm.cluster.jgroups.tcp.port_range
loopback                           true                    exo.idm.cluster.jgroups.tcp.loopback
recv_buf_size                      20m                     exo.idm.cluster.jgroups.tcp.recv_buf_size
send_buf_size                      640k                    exo.idm.cluster.jgroups.tcp.send_buf_size
discard_incompatible_packets       true                    exo.idm.cluster.jgroups.tcp.discard_incompatible_packets
max_bundle_size                    64000                   exo.idm.cluster.jgroups.tcp.max_bundle_size
max_bundle_timeout                 30                      exo.idm.cluster.jgroups.tcp.max_bundle_timeout
enable_bundling                    true                    exo.idm.cluster.jgroups.tcp.enable_bundling
use_send_queues                    true                    exo.idm.cluster.jgroups.tcp.use_send_queues
enable_diagnostics                 false                   exo.idm.cluster.jgroups.tcp.enable_diagnostics
bundler_type                       old                     exo.idm.cluster.jgroups.tcp.bundler_type
thread_naming_pattern              pl                      exo.idm.cluster.jgroups.tcp.thread_naming_pattern
thread_pool.enabled                true                    exo.idm.cluster.jgroups.tcp.thread_pool.enabled
thread_pool.min_threads            5                       exo.idm.cluster.jgroups.tcp.thread_pool.min_threads
thread_pool.max_threads            100                     exo.idm.cluster.jgroups.tcp.thread_pool.max_threads
thread_pool.keep_alive_time        60000                   exo.idm.cluster.jgroups.tcp.thread_pool.keep_alive_time
thread_pool.queue_enabled          true                    exo.idm.cluster.jgroups.tcp.thread_pool.queue_enabled
thread_pool.queue_max_size         100                     exo.idm.cluster.jgroups.tcp.thread_pool.queue_max_size
thread_pool.rejection_policy       Discard                 exo.idm.cluster.jgroups.tcp.thread_pool.rejection_policy
oob_thread_pool.enabled            true                    exo.idm.cluster.jgroups.tcp.oob_thread_pool.enabled
oob_thread_pool.min_threads        5                       exo.idm.cluster.jgroups.tcp.oob_thread_pool.min_threads
oob_thread_pool.max_threads        100                     exo.idm.cluster.jgroups.tcp.oob_thread_pool.max_threads
oob_thread_pool.keep_alive_time    60000                   exo.idm.cluster.jgroups.tcp.oob_thread_pool.keep_alive_time
oob_thread_pool.queue_enabled      false                   exo.idm.cluster.jgroups.tcp.oob_thread_pool.queue_enabled
oob_thread_pool.queue_max_size     100                     exo.idm.cluster.jgroups.tcp.oob_thread_pool.queue_max_size
oob_thread_pool.rejection_policy   Discard                 exo.idm.cluster.jgroups.tcp.oob_thread_pool.rejection_policy
TCPPING
timeout                            3000                    exo.idm.cluster.jgroups.tcpping.timeout
initial_hosts                      localhost[7900]         exo.idm.cluster.jgroups.tcpping.initial_hosts
port_range                         0                       exo.idm.cluster.jgroups.tcpping.port_range
num_initial_members                1                       exo.idm.cluster.jgroups.tcpping.num_initial_members
ergonomics                         false                   exo.idm.cluster.jgroups.tcpping.ergonomics
MERGE2
max_interval                       30000                   exo.idm.cluster.jgroups.merge2.max_interval
min_interval                       10000                   exo.idm.cluster.jgroups.merge2.min_interval
FD
timeout                            3000                    exo.idm.cluster.jgroups.fd.timeout
max_tries                          3                       exo.idm.cluster.jgroups.fd.max_tries
VERIFY_SUSPECT
timeout                            1500                    exo.idm.cluster.jgroups.verify_suspect.timeout
pbcast.NAKACK
use_mcast_xmit                     false                   exo.idm.cluster.jgroups.pbcast.nakack.use_mcast_xmit
retransmit_timeout                 300,600,1200,2400,4800  exo.idm.cluster.jgroups.pbcast.nakack.retransmit_timeout
discard_delivered_msgs             false                   exo.idm.cluster.jgroups.pbcast.nakack.discard_delivered_msgs
UNICAST2
timeout                            300,600,1200            exo.idm.cluster.jgroups.unicast2.timeout
stable_interval                    5000                    exo.idm.cluster.jgroups.unicast2.stable_interval
max_bytes                          1m                      exo.idm.cluster.jgroups.unicast2.max_bytes
pbcast.STABLE
stability_delay                    500                     exo.idm.cluster.jgroups.pbcast.stable.stability_delay
desired_avg_gossip                 5000                    exo.idm.cluster.jgroups.pbcast.stable.desired_avg_gossip
max_bytes                          1m                      exo.idm.cluster.jgroups.pbcast.stable.max_bytes
pbcast.GMS
print_local_addr                   true                    exo.idm.cluster.jgroups.pbcast.gms.print_local_addr
join_timeout                       3000                    exo.idm.cluster.jgroups.pbcast.gms.join_timeout
view_bundling                      true                    exo.idm.cluster.jgroups.pbcast.gms.view_bundling
UFC
max_credits                        200k                    exo.idm.cluster.jgroups.ufc.max_credits
min_threshold                      0.20                    exo.idm.cluster.jgroups.ufc.min_threshold
MFC
max_credits                        200k                    exo.idm.cluster.jgroups.mfc.max_credits
min_threshold                      0.20                    exo.idm.cluster.jgroups.mfc.min_threshold
FRAG2
frag_size                          60000                   exo.idm.cluster.jgroups.frag2.frag_size
RSVP
timeout                            60000                   exo.idm.cluster.jgroups.rsvp.timeout
resend_interval                    500                     exo.idm.cluster.jgroups.rsvp.resend_interval
ack_on_delivery                    false                   exo.idm.cluster.jgroups.rsvp.ack_on_delivery

Using customized JGroups xml files

JGroups configuration, for both JCR and IDM, is externalized via exo.properties (see Configuration overview for this file). It is recommended that you use this file. See the previous section for the list of default values and externalized variables.

Only when the variables are not enough, or when you want to re-use JGroups configuration files from a previous version, should you follow this section to activate your own xml files.

  1. Put your xml file somewhere, typically standalone/configuration/gatein/jgroups/ in JBoss and gatein/conf/jgroups/ in Tomcat.

  2. Edit the following properties in exo.properties:

    exo.jcr.cluster.jgroups.config=${exo.conf.dir}/jgroups/jgroups-jcr.xml
    exo.jcr.cluster.jgroups.config-url=file:${exo.jcr.cluster.jgroups.config}
    exo.idm.cluster.jgroups.config=${exo.conf.dir}/jgroups/jgroups-idm.xml
    

Here, exo.conf.dir is standalone/configuration/gatein in JBoss and gatein/conf in Tomcat by default.

If you put your files somewhere else, note that you must use an absolute path after “file:”:

exo.jcr.cluster.jgroups.config=/path/to/your/jgroups-jcr-file
exo.jcr.cluster.jgroups.config-url=file:/path/to/your/jgroups-jcr-file
exo.idm.cluster.jgroups.config=/path/to/your/jgroups-idm-file

Setting up a load balancer

Setting up a basic load balancing with Apache

The following modules need to be activated in order to load balance across several cluster nodes:

  • mod_proxy_balancer
  • mod_slotmem_shm (mandatory for mod_proxy_balancer)
  • mod_lbmethod_byrequests if you choose the by-requests balancing algorithm (alternatively mod_lbmethod_bytraffic or mod_lbmethod_bybusyness)

Part of an Apache configuration that enables load balancing:

# Add an HTTP header (a cookie) to explicitly identify the node and keep sessions sticky
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

# Declare the http server pool
<Proxy "balancer://plf">
  BalancerMember "http://node1:8080" route=node1 acquire=2000 retry=5 keepalive=on ping=30 connectiontimeout=2
  BalancerMember "http://node2:8080" route=node2 acquire=2000 retry=5 keepalive=on ping=30 connectiontimeout=2
  ProxySet stickysession=ROUTEID
</Proxy>

# Declare the pool dedicated to the websocket tunnels
<Proxy "balancer://plf_ws">
  BalancerMember "ws://node1:8080" route=node1 acquire=2000 retry=0 keepalive=on ping=30 connectiontimeout=2 disablereuse=on flushpackets=on
  BalancerMember "ws://node2:8080" route=node2 acquire=2000 retry=0 keepalive=on ping=30 connectiontimeout=2 disablereuse=on flushpackets=on
  ProxySet stickysession=ROUTEID
</Proxy>

# Common options
ProxyRequests           Off
ProxyPreserveHost       On

# Declare the redirection for websocket urls, must be declared before the general ProxyPass definition
ProxyPass /cometd "balancer://plf_ws/cometd"

# Declare the redirection for the http requests
ProxyPass               /       "balancer://plf/"
ProxyPassReverse        /       "balancer://plf/"

Note

This configuration must be adapted to your specific needs before you go to production.

All the configuration details can be found on the Apache configuration page.

Improving the logs

Diagnosing a cluster problem can be difficult. The Apache logs can be customized to help you follow the load balancing behavior.

The BALANCER_WORKER_ROUTE variable adds to your logs the name of the node that received each request.

The BALANCER_ROUTE_CHANGED variable is set to 1 if the user was redirected to a node different from the one of his previous request. This indicates that the node was removed from the cluster pool or was not able to receive more requests. During normal processing, this flag should always have the value -.

Example of a log format with cluster diagnosis enabled:

LogFormat "%h %l %u %t \"%r\" %>s %b %{BALANCER_WORKER_ROUTE}e %{BALANCER_ROUTE_CHANGED}e" common_cluster

Note

More log options are detailed in the Apache documentation

Setting up basic load balancing with NGINX

Note

Load balancing support in the free version of NGINX is limited. The sticky algorithm is limited to IP hash, and the node configuration cannot be precisely tuned.

If you have an NGINX Plus license, the full load balancing documentation can be found here.

Basic NGINX load balancing configuration:

upstream plf {
  ip_hash;
  server node1:8080;
  server node2:8080;
}

server {
  listen 80;

  location / {
    proxy_pass http://plf;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

  # Websocket for Cometd
  location /cometd/cometd {
    proxy_pass http://plf;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}

FAQs of clustering

Q: How to migrate from local mode to cluster mode?

A: If you intend to migrate your production system from local (non-cluster) mode to cluster mode, follow these steps:

  1. Update the configuration to the cluster mode, as explained above, on your main server.

  2. Use the same configuration on all the other cluster nodes.

  3. Move the index and value storage to the shared file system.

  4. Start the cluster.

Q: Why does startup fail with the “Port value out of range” error?

A: On Linux, startup fails if you encounter the following error:

[INFO] Caused by: java.lang.IllegalArgumentException: Port value out of range: 65536

This problem happens under specific circumstances, when the JGroups networking library behind the clustering attempts to detect the IP address used to communicate with the other nodes.

You need to verify:

  • The host name resolves to a valid IP address served by one of the network devices, such as eth0 or eth1.
  • The host name is NOT defined as localhost or 127.0.0.1.
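For example, a sketch of a valid /etc/hosts entry for node1 (the host name and address are illustrative); the machine's host name must map to a real interface address, not to 127.0.0.1:

127.0.0.1       localhost
192.168.1.100   node1.your-domain.com node1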

Q: How to solve the “failed sending message to null” error?

A: If you encounter the following error when starting up in cluster mode on Linux:

Dec 15, 2010 6:11:31 PM org.jgroups.protocols.TP down
        SEVERE: failed sending message to null (44 bytes)
        java.lang.Exception: dest=/228.10.10.10:45588 (47 bytes)

Be aware that clustering on Linux only works with IPv4. Therefore, when using a cluster under Linux, add the following property to the JVM parameters:

-Djava.net.preferIPv4Stack=true
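In Tomcat, for example, this can be appended to CATALINA_OPTS in setenv-customize.sh, following the same pattern used earlier in this chapter:

CATALINA_OPTS="${CATALINA_OPTS} -Djava.net.preferIPv4Stack=true"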