Over the past several years, as cloud computing has matured and become more widely used, the number and variety of applications have exploded. As an ECM Architect, there are a couple of scenarios I commonly encounter where working in the cloud can be particularly useful:
- As part of the deployment of an enterprise system, it is often necessary to build a test environment where the planned architecture can be proven prior to provisioning the actual hardware for QA and production environments. The architecture is too complex to validate using a single workstation, but acquiring and setting up physical servers, or even provisioning virtual machines (e.g. VMware) within an organization’s infrastructure for this purpose, is sometimes inefficient or infeasible.
- In other situations, small organizations simply lack the IT resources to allow them to host a production system internally.
In both of these situations, cloud computing can provide an effective and inexpensive way to allocate hardware resources for the task. With that in mind, I would like to show how, for a recent project at Zia, we deployed an Alfresco high-availability cluster in an Amazon EC2 environment. In our case, we are using EC2 for the first scenario above, as a proving ground for our Alfresco configuration.
It is worth emphasizing that the configuration outlined below is not intended to be a production environment. For a production environment, we would likely use a large instance for the DB server, and possibly have separate servers for the database and NFS. We would want to build in redundancy for the DB and NFS, and we would do load testing and benchmarking to ensure that our environment performs as desired.
When setting up an Alfresco cluster, the following are the key considerations that differentiate it from a stand-alone installation:
- Shared Database – There’s nothing really special about the database, except that it’s shared, so it needs to be accessible from all app server cluster nodes. In our case, we are using MySQL.
- Shared Content Store – We need some kind of shared disk for a content store that can be accessed by all cluster nodes. A common solution is to establish a shared volume on a SAN using a clustered file system like OCFS2. In our case, since we’re operating in the cloud, we’re using NFS.
- Database L2 Cache Replication – Alfresco uses EHCache as its L2 database cache. Each app server node has its own cache, and when used in a cluster, these local caches need to be replicated across cluster nodes. EHCache has its own built-in replication mechanism, which uses UDP multicast to communicate changes amongst peers in the cluster. As of Alfresco 3.1, the Enterprise edition also allows the use of JGroups to provide the communication conduit for EHCache. JGroups supports a number of different protocols in addition to UDP multicast, and therefore provides greater flexibility to adapt to specific network environments. In our EC2 environment, UDP multicast is not an option, so we are using TCP for cache replication.
- Load Balancing – We need a load balancer to distribute requests to the app server nodes in the cluster. The load balancer must support sticky sessions so that all requests for a given session are routed to the same server. We’re using Amazon’s load balancer, which is an object that can be created and configured in the EC2 environment.
- Set a cluster name using the alfresco.cluster.name property in alfresco-global.properties.
- Rename ehcache-custom.xml.sample.cluster to ehcache-custom.xml to activate EHCache replication. This Spring context is located in the shared alfresco/extension directory, and contains replication config for a multitude of caches used by Alfresco. There should be no need to modify it.
- EC2 doesn’t support UDP multicast (the default for Alfresco cache replication), so we switched to TCP.
- The hostname of an EC2 instance resolves to an IP that is either inaccessible or doesn’t exist. JGroups tries to bind to this IP and fails. We worked around the issue by adding a hosts file entry on each app server, so that the hostname resolves to the instance’s internal IP.
- Alfresco App Server Node 1 – m1.large instance running Alfresco 3.3.2 Enterprise
- Alfresco App Server Node 2 – m1.large instance running Alfresco 3.3.2 Enterprise
- DB and NFS Server – m1.small instance running MySQL 5.1
Shared MySQL Database
Because MySQL is on a separate box from the app servers, it is necessary to grant remote login privileges to the alfresco user:
grant all on alfresco.* to 'alfresco'@'%' identified by 'alfresco' with grant option;
To further tighten security, permission can be granted for only certain hosts (the two app server nodes in this case) instead of granting access from all hosts (‘%’). In EC2, we are using a security group to accomplish the same thing – nothing outside the cluster can get to MySQL.
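As a rough illustration of the per-host alternative, the sketch below generates one grant statement per app server instead of a single '%' grant. The helper name build_grants is hypothetical, the host names node1 and node2 are taken from the cluster settings used in this post, and the statements drop "with grant option" on the assumption that the application account does not need to grant privileges to others. MySQL matches the host part against the client address as it sees it, so internal IPs may be needed in place of names.

```python
import re

# Hypothetical helper, not part of Alfresco or MySQL.
# Accept only plain host names or IPs so nothing unexpected ends up in the SQL.
_HOST_RE = re.compile(r"^[A-Za-z0-9.\-]+$")


def build_grants(hosts, db="alfresco", user="alfresco", password="alfresco"):
    """Return one MySQL 5.1-style grant statement per allowed client host."""
    statements = []
    for host in hosts:
        if not _HOST_RE.match(host):
            raise ValueError(f"unexpected characters in host name: {host!r}")
        statements.append(
            f"grant all on {db}.* to '{user}'@'{host}' identified by '{password}';"
        )
    return statements


# Example: print the statements for the two app server nodes.
# for stmt in build_grants(["node1", "node2"]):
#     print(stmt)
```

The printed statements can be pasted into a mysql session on the database server in place of the '%' grant.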
The shared disk for the content store in this configuration is provided by the database server via NFS, and is mounted as /data on all three machines. For a production environment, separating the shared disk from the database server might yield better performance. See the alfresco-global.properties listing below for the settings that define the content store location for Alfresco.
EC2 Load Balancer
Configuration of the EC2 load balancer is pretty straightforward. In the EC2 console, create a new load balancer, and add the app server nodes to the list of instances that the load balancer serves. In the port configuration, configure it to use sticky sessions based on application-generated cookies, with the cookie name being JSESSIONID.
L2 Cache Replication Using JGroups
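In its default configuration, cluster cache replication using JGroups can be enabled in two steps, both of which must be performed on all app server nodes: set a cluster name using the alfresco.cluster.name property in alfresco-global.properties, and rename ehcache-custom.xml.sample.cluster to ehcache-custom.xml.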
This will result in JGroups using UDP multicast, connecting to 230.0.0.1 port 4446. The admin guide provides details on alfresco-global.properties settings that can be used to specify a different multicast address and port.
To use TCP, we’re using the following settings in alfresco-global.properties.
# use TCP instead of UDP
alfresco.jgroups.defaultProtocol=TCP

# define the hosts that are part of the cluster
alfresco.tcp.initial_hosts=node1,node2
The alfresco.jgroups.defaultProtocol=TCP setting causes JGroups to use TCP instead of UDP. The alfresco.tcp.initial_hosts setting defines the hosts that are part of the cluster.
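With TCP replication, each node has to be able to open a connection to the other hosts listed in alfresco.tcp.initial_hosts, which in EC2 also depends on the security group rules. The following is a minimal sketch of a reachability check. The function names are hypothetical, and port 7800 is an assumption about the JGroups TCP port; confirm the actual port in your alfresco-jgroups-TCP.xml before relying on it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(peers, port, timeout=3.0):
    """Map each peer host name to whether its replication port is reachable."""
    return {peer: can_connect(peer, port, timeout) for peer in peers}


# Example (run on an app server node while Alfresco is up on the other node;
# 7800 is an assumed port, see the note above):
# print(check_peers(["node1", "node2"], 7800))
```

A False result for a peer that is running usually points at the security group or at host name resolution rather than at Alfresco itself.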
Important note: On each app server node, the hostname must resolve to a routable IP. For example, if you’re logged into a server with a hostname of “myhost” and you run “ping myhost” and it shows replies from 127.0.0.1, this will not work. Make sure the hosts file is configured so that myhost resolves to the private (10.x.x.x) IP of the host.
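The same check can be scripted instead of eyeballing ping output. This is only a sketch: the function names are hypothetical, and it only rules out loopback and unspecified addresses. It does not prove that the address is reachable from the other node.

```python
import ipaddress
import socket


def is_cluster_usable(ip_text):
    """Return True if the address is neither loopback nor unspecified."""
    ip = ipaddress.ip_address(ip_text)
    return not (ip.is_loopback or ip.is_unspecified)


def check_local_hostname():
    """Resolve this machine's host name and report whether the result looks usable."""
    name = socket.gethostname()
    ip_text = socket.gethostbyname(name)
    return name, ip_text, is_cluster_usable(ip_text)


# Example (run on each app server node):
# name, ip_text, ok = check_local_hostname()
# print(f"{name} resolves to {ip_text}:", "ok" if ok else "fix the hosts file")
```

If the check reports a 127.x.x.x address, add or correct the hosts file entry so that the host name maps to the instance’s private IP.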
If your environment requires more advanced tweaks to the JGroups configuration, you may need to modify alfresco-jgroups-UDP.xml or alfresco-jgroups-TCP.xml, depending on which protocol you are using.
Refer to the admin guide for instructions on using the built-in EHCache replication (UDP multicast without JGroups).
EC2-Specific Network Issues
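There are a couple of things to be aware of when setting up cache replication in an EC2 environment: UDP multicast is not available, and an instance’s hostname does not resolve to a usable IP by default. Both are addressed by the TCP and hosts file settings described above.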
Here’s a complete listing of the settings from alfresco-global.properties (same settings on both Alfresco app server nodes) that are pertinent for cluster configuration:
# cluster name
alfresco.cluster.name=alf_cluster

# use TCP instead of UDP
alfresco.jgroups.defaultProtocol=TCP

# define the hosts that are part of the cluster
alfresco.tcp.initial_hosts=node1,node2

# local alf_data directory for Lucene indexes
dir.root=/opt/alfresco/alf_data

# central content store directory (/data is NFS share on DB server)
dir.contentstore=/data/alfresco/alf_data/contentstore
dir.contentstore.deleted=/data/alfresco/alf_data/contentstore.deleted
dir.auditcontentstore=/data/alfresco/alf_data/audit.contentstore

# database connection properties
db.name=alfresco
db.username=alfresco
db.password=alfresco
db.host=dbserver
db.port=3306
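Because the same settings have to be present on both nodes, a small script can help catch a missing or mismatched property before starting the cluster. The sketch below is an illustration only. The parser handles plain key=value lines and # comments, not the full Java properties syntax. The REQUIRED list is an assumption drawn from the listing above, and all function names are hypothetical.

```python
# Keys assumed to matter for this cluster setup (taken from the listing above).
REQUIRED = (
    "alfresco.cluster.name",
    "alfresco.jgroups.defaultProtocol",
    "alfresco.tcp.initial_hosts",
    "dir.contentstore",
    "db.host",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_cluster_settings(props):
    """Return the required keys that are absent or empty, in REQUIRED order."""
    return [key for key in REQUIRED if not props.get(key)]


def differing_keys(node_a, node_b):
    """Return the sorted keys whose values differ between two nodes' settings."""
    keys = set(node_a) | set(node_b)
    return sorted(k for k in keys if node_a.get(k) != node_b.get(k))


# Example (the file names are placeholders for copies fetched from each node):
# a = parse_properties(open("node1-alfresco-global.properties").read())
# b = parse_properties(open("node2-alfresco-global.properties").read())
# print(missing_cluster_settings(a), differing_keys(a, b))
```

An empty result from both functions suggests the two nodes agree on the cluster-related settings. It does not validate that the values themselves are correct.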