Background

On a recent project I spent some time investigating and implementing a content migration solution from a legacy content system into Alfresco. After researching numerous alternatives, I found that combining two popular technologies with a small extension produced a high-performance migration solution. The project had a relatively straightforward requirement:

Inject content into Alfresco from a source content management system. Take into account performance impacts on the existing and target system, as well as the overall cost of the migration. The source content management system has millions of content documents to migrate. Migration may be batched based on business requirements.

This article is focused on the ingestion of content into Alfresco.

Technologies

Various technology options were available to perform the content import. I spent some time scouring the web looking for migration tools and technologies as well as considering direct access to the source and target system. It quickly became obvious (partly due to Peter Monks’ blog post on one approach) that there were a handful of options available:

Direct API calls to Alfresco, using CMIS, REST, SOAP, etc.
Alfresco’s JLAN Server
Alfresco Content Import using ACP files
Open Migrate (from Technology Services Group)
Bulk Filesystem Import (from Peter Monks)

After carefully considering the options and the client’s needs, I narrowed the list to three: direct API calls, Open Migrate or Bulk Filesystem Import. Direct API calls would require custom migration code, which seemed like a fairly high cost for the value added. We were also concerned about any kind of remote API call. That left us with the two tools, so it is worth explaining how they work:

Open Migrate is a configurable, extensible framework for content migrations. The basic flow is to retrieve content from the source system, map it to the target representation and deliver it to the target system. The mapping step is configurable and lets you map and modify content as appropriate during a migration. The tool has the potential to be a complete end-to-end solution for many content migrations. The out-of-the-box implementation for an Alfresco target calls Alfresco’s remote APIs and transfers content over the wire during import. On the surface this isn’t a huge deal, but for a large-volume migration it could be a performance barrier.

The Bulk Filesystem Import tool is a highly specialized tool designed to import a set of folders and files on a local filesystem into Alfresco. The tool runs in process (it is deployed with Alfresco) and therefore carries no over-the-wire performance cost. You simply point the tool at a folder hierarchy, and the folders and content within it are replicated in Alfresco. An optional accompanying properties file can specify each content item’s type, aspects and metadata property values. The format the tool requires for metadata is a simple properties file which, while perhaps somewhat limited, would meet our needs based on an analysis of the content in the source system. Peter claims a ~20x performance improvement over a CIFS approach to migration. This got my attention and helped validate my concern about remote API calls, specifically around sending file contents over the wire.
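
To make the metadata format concrete, here is a minimal sketch of building the text of one such properties file in Java. The key names type and aspects and the cm:title property follow the conventions I understand the tool to use, but treat them, along with the class and method names, as assumptions for illustration and check them against the tool’s own documentation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

public class ShadowMetadataSketch {

    // Builds the text of a metadata properties file for a single content item.
    // The keys "type", "aspects" and "cm:title" are assumed conventions, not
    // something this sketch verifies against the import tool.
    static String buildMetadata(String type, String aspects, String title) throws IOException {
        Properties props = new Properties();
        props.setProperty("type", type);
        props.setProperty("aspects", aspects);
        props.setProperty("cm:title", title);

        // Properties.store() escapes ':' with a backslash and adds a timestamp
        // comment line; Properties.load() reverses the escaping on read.
        StringWriter out = new StringWriter();
        props.store(out, null);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(buildMetadata("cm:content", "cm:versionable,cm:titled", "Quarterly report"));
    }
}
```

Because java.util.Properties escapes the colon in prefixed names when writing, the file on disk will not look exactly like the in-memory keys, but any reader that uses Properties.load() gets the original names back.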

Decision Time

Open Migrate was the first choice from a cost perspective. We hoped it would take little to no effort to configure the tool and execute a simple migration. Using the Bulk Filesystem Import tool was a reasonable alternative but would require translating content from the source system into the format understood by the import tool. The source system’s content mapped reasonably easily to Alfresco, so this didn’t seem like too large a hurdle. I was therefore intrigued by the idea of combining a good end-to-end framework with the high performance of Bulk Filesystem Import, and I explored how I could produce a Bulk Filesystem Import folder structure as a target in Open Migrate.

Implementation

I configured Open Migrate with a “simple migration target”, a container that allows extension through listeners.

    <bean id="MigrationTarget"
        class="com.tsgrp.migration.target.SimpleMigrationTarget"
        scope="prototype">
        <property name="eventListeners">
            <list>
                <ref bean="AlfrescoBulkFileSystemImportWriter" />
            </list>
        </property>
    </bean>

Then I configured a target listener to perform the actual writing to disk. Note that targetDir is a property specified in the Open Migrate properties configuration. It is the top-level output directory of the Open Migrate process, and it is where you’ll ultimately point the Bulk Filesystem Import tool.

    <bean id="AlfrescoBulkFileSystemImportWriter"
        class="com.ziaconsulting.migration.event.target.AlfrescoBulkFileSystemImportWriter"
        scope="prototype">
        <property name="targetDir" value="${targetDir}" />
    </bean>

The implementation of the listener is fairly straightforward. Each target migration node in Open Migrate has already been populated with the desired attributes (metadata properties). The files need to be laid out in the directories, and the metadata properties need to be written in the format prescribed by the Bulk Filesystem Import tool (e.g. filename.metadata.properties). Note that because of issue 19 in the Bulk Filesystem Import tool, dates are handled specially: they are written in ISO 8601 format. In this migration I’ve only handled single-valued dates, as noted in the code, and String properties. The properties are written as follows:

    private void createNodeProperties(MigrationNode node) {
        Properties props = new Properties();

        for (String attr : node.getAttributeNames(false)) {
            if (attr.startsWith("migration_info_node_")) {
                // Skip this attribute, it's an open-migrate migration detail,
                // not represented in Alfresco.
                continue;
            }
            NodeAttribute nodeAttr = node.getAttribute(attr);

            String value = EMPTY_STRING;

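            // Check for null before dereferencing; a missing attribute falls
            // through to the else branch and is written as an empty string.
            if (nodeAttr != null
                    && nodeAttr.getDataType().getJavaTypeName().equals(Date.class.getName())) {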
                // TODO Doesn't handle multi-valued date properties
                if (nodeAttr.getFirst() != null) {
                    // Only store date fields which have a value
                    Date date = (Date) nodeAttr.getFirst();
                    props.setProperty(attr, ISO8601DateFormat.format(date));
                }
            } else {
                if (nodeAttr != null && nodeAttr.getFirst() != null) {
                    value = node.getAttribute(attr).valuesToString(DELIM);
                }

                props.setProperty(attr, value);
            }
        }

        if (props.size() == 0) {
            // If a node has no properties, don't write the file.
            return;
        }

        // Helper method to get the file path based on targetDir and the node's folder.
        String targetFullFilePath = PathHelper.getContentNodeFilePath(getTargetDir(), node) + ".metadata.properties";
        logger.debug("Target will create properties file " + targetFullFilePath);

        // Get the file object
        File targetFile = new File(targetFullFilePath);

        try {
            // createNewFile() returns false when the file already exists; in
            // that case the existing file is left untouched.
            if (targetFile.createNewFile()) {
                // try-with-resources (Java 7+) closes the writer even when
                // store() fails.
                try (FileWriter writer = new FileWriter(targetFile)) {
                    props.store(writer, null);
                }
            }
        } catch (IOException e) {
            MigrationException.throwException(ExceptionType.TARGET_NODE_EXCEPTION, "I/O Exception on File Folder Migration Target", e);
        }
    }
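
The date handling in the listing relies on an ISO8601DateFormat helper that is not shown here. For readers without that class on hand, the following is a rough stand-in using java.time (Java 8+). The class name, the millisecond pattern and the choice of UTC are my assumptions for illustration, not necessarily what the helper or the import tool expects.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class IsoDateSketch {

    // ISO 8601 timestamp with milliseconds and a zone designator.
    // A zero offset is rendered as "Z" by the XXX pattern letters.
    private static final DateTimeFormatter ISO_8601 =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Formats a legacy java.util.Date as an ISO 8601 string in UTC.
    static String toIso8601(Date date) {
        return ISO_8601.format(date.toInstant().atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        System.out.println(toIso8601(new Date(0L))); // 1970-01-01T00:00:00.000Z
    }
}
```

Writing every date in UTC keeps the output independent of the migration machine’s time zone, which seemed the safer default for a sketch.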

As for laying out the binary files, the exercise is largely left to the reader. In our case, the files were accessible on disk and we performed a copy from the source system to the target location for importing. For each target migration node the code writes out the metadata properties and the associated binary content file.
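
As a starting point for that exercise, here is a minimal sketch of copying one binary file into the import directory and working out where its metadata file belongs. It assumes the source file is readable on the local disk, as it was for us. The class and method names are hypothetical and are not part of Open Migrate or the import tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ContentLayoutSketch {

    // Copies a source file into targetDir/relativeFolder, creating folders as
    // needed, and returns the path of the copied file.
    static Path copyContent(Path sourceFile, Path targetDir, String relativeFolder) throws IOException {
        Path folder = targetDir.resolve(relativeFolder);
        Files.createDirectories(folder);
        Path target = folder.resolve(sourceFile.getFileName());
        Files.copy(sourceFile, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // The metadata file sits next to the content file and shares its name,
    // with ".metadata.properties" appended.
    static Path metadataPathFor(Path contentFile) {
        return contentFile.resolveSibling(contentFile.getFileName() + ".metadata.properties");
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("report", ".pdf");
        Path targetDir = Files.createTempDirectory("bulk-import");
        Path copied = copyContent(source, targetDir, "finance/2010");
        System.out.println(copied);
        System.out.println(metadataPathFor(copied));
    }
}
```

For millions of documents a copy doubles the disk usage during the migration, so a move or a link may be preferable where the source files can be spared.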

Running the Bulk Filesystem Import utility is as simple as pointing to the target directory and watching your documents import (quickly!) into Alfresco.

13 Comments

Peter Monks on January 7, 2011 at 1:14 pm

Great post – I’m glad you found the bulk import tool useful!

I believe the TSG folks (the developers of OpenMigrate) have also been looking into integrating OpenMigrate with the bulk import tool. Might be worth contacting those guys and figuring out a way to combine your efforts?

From memory Todd Pierzina is the OpenMigrate architect over at TSG.
mikemahon on January 7, 2011 at 1:26 pm

Thanks Peter! I appreciate the quick feedback. I’ll have Ryan contact Todd.
Ryan McVeigh on January 8, 2011 at 2:25 pm

Thanks Peter! I have spoken with Todd previously and he’s aware we went down this road.
sibe on January 18, 2011 at 3:49 am

Hello Ryan

Have you succeeded in using OpenMigrate with Alfresco?
Ryan McVeigh on January 21, 2011 at 3:02 pm

I have, yes. In particular I’ve integrated using the method described above. Open Migrate also has a direct connection to the Alfresco web services which I have not done more than play with.
sibe on February 18, 2011 at 2:44 am

Re Ryan,
I use OpenMigrate and I want to migrate content from Alfresco to Alfresco, so I am writing the MigrationSourceAlfresco and the MigrationQueuePopulatorAlfresco classes. Can you help me by describing these classes, please?
Ryan McVeigh on February 21, 2011 at 1:15 pm

Hi Sibe,

Open Migrate is able to migrate directly from Alfresco to Alfresco out of the box without going through this process. Are you specifically interested in the use of the Bulk Filesystem Import Tool in conjunction with Open Migrate?

Have you considered using Alfresco’s import/export functionality and building an Alfresco Content Package (ACP) to do this?
sibe on February 28, 2011 at 6:40 am

Hi Ryan,
I don’t know how to use the Bulk Filesystem Import Tool.

but I wan’t use Alfresco Content Package.

When I run OpenMigrate I get the following error:
Error while executing callback: java.lang.IllegalArgumentException: A uuid or a path must be supplied to resolve to a NodeRef

Can you post a QueuePopulator class example?

Thanks
Ryan McVeigh on February 28, 2011 at 11:34 pm

I’m not sure I’m following. The QueuePopulator I have written is not for use with an Alfresco source repository. If you’re doing Alfresco to Alfresco, I still think using Alfresco ACPs is the way to go. Is that not your use case?
sibe on March 1, 2011 at 2:42 am

I agree with you that we should use ACP, but we really want to use OpenMigrate to migrate from Alfresco to Alfresco, because we have a lot of content and with ACP it takes a very long time.
Ryan McVeigh on March 1, 2011 at 10:00 am

Got it. Since I haven’t configured OM to retrieve from Alfresco, I’m not sure how you configure the Queue Populator. You may have more success asking the folks at TSG directly.
sibe on March 1, 2011 at 10:09 am

Thanks Ryan,
However, can you explain how you use the Bulk Filesystem Import AMP to migrate content to Alfresco?

Thanks
sibe on March 9, 2011 at 8:25 am

Hello Ryan,
The new release of oma is available. It’s a very good tool.
See https://tsgrp.com
