Archive by Author

Zia Hosting Alfresco Mobile Webinar

Continue your Alfresco DevCon experience and Level Up again with this webinar on Alfresco Mobile.

When: November 01 2011
Where: Online Webinar 1pm ET, 10am PT, 4pm GMT

Zia is talking about the iOS Mobile Application at Alfresco DevCon this year. Our session covers a technical review of the application’s features and functionality as well as a walkthrough of the open source code repository details. Join us for this webinar to level up once again. We’ll take the presentation to the next level, covering an in-depth walkthrough of the code and specific feature implementation details we weren’t able to cover in the DevCon presentation.

Register here http://www.alfresco.com/about/events/2011/11/alfresco-mobile/

If you are at DevCon this week, stop by the Zia booth to learn more about the application and the webinar.

 

Alfresco CMIS Notes

While working on a recent project this spring, I learned a few new things about Alfresco’s CMIS implementation in Alfresco 3.4. Generally speaking the Alfresco CMIS wiki page and the mapping page (which explains their mapping of the CMIS domain model) cover Alfresco’s implementation very thoroughly. I ran into two situations which I spent a little time on and hopefully I can save you some time if you’re working with CMIS against Alfresco.

The first is represented by ALF-7538. Until Alfresco Enterprise 3.4.2 it was not possible to retrieve associations that the content model defines on an aspect. Consider a content model in which an aspect defines an association. If that aspect is applied to your content item, you cannot reach the association via the CMIS API. The alternative is to model the association directly on the type or sub-type of the content node. As of Alfresco 3.4.2 this is resolved, but note that the association must be between two derivations of cm:content.

The second is represented by ALF-7827. The search operation in the CMIS API does not support TEXT, ALL and d:content as expected. There’s a thread on the Alfresco Forum between me and Andy Hind, who is responsible for the CMIS search implementation. My use case was simply to mimic the search results found in Alfresco Share via a CMIS API call. In short, I found that a query like:

select * from cmis:document where CONTAINS('test')

would not return content as expected. Ultimately my goal was to search all indexed fields for the given content type. The thread discusses a way to do this and there’s a fix already implemented for Alfresco 3.4.3.

In both cases I was able to work around these issues. The Alfresco engineering team provided assistance in either the JIRA issue that was filed, or the Alfresco forum to help me along.

Importing Versioned Content into Alfresco

Background

I have recently blogged about integrating Open Migrate and the Alfresco Bulk Filesystem Import Tool.  As part of that exercise I also spent time with a colleague to implement support for importing versioned content into Alfresco.  Once again this was already supported in Open Migrate’s direct to Alfresco implementation, but was not already present in the Bulk FileSystem Import Tool.  Background on why I’ve chosen to combine these two tools is in my previous post.  The implementation was ultimately fairly straightforward.  Along the way I learned more about the Alfresco API, some of its nuances and how to implement this feature in a backwards compatible manner that we could contribute back to the community.

Design Decisions

The initial design goal was simply to import versioned content from disk into Alfresco. While that sounds good in theory, it does present a problem: how should we represent the content on disk?  One option would have been to let the user provide a separate directory structure for versioned content and configure the tool with a separate versioning importer.  However, it was my preference to use the same directory structure for both versioned and non-versioned content.

The user is responsible for providing versioned content files if required.  The current file format is simply a directory, optionally with subdirectories, each containing content files and, optionally, a companion file with the extension metadata.properties for each content file to supply its metadata.

To support versioned content, the user may optionally supply files with a new extension: any file name ending with the pattern v[0-9]* is treated as a version.  This applies to both content and metadata properties files.  Versions are imported into Alfresco as follows:

Find all files in a given series, for example:

  • Head Revision
    • manual.pdf
    • manual.pdf.metadata.properties
  • First Version
    • manual.pdf.v1
    • manual.pdf.metadata.properties.v1
  • Second Version
    • manual.pdf.v2
    • manual.pdf.metadata.properties.v2

Note that the head revision of the file is not appended with a version identifier. This allows for backwards compatibility with any existing file structures used with the tool.  A file without any versions will simply be created just like the tool has worked to date.  Files with version extensions (and their associated metadata pieces) will be used to create the document and associated subsequent versions until the final head revision has been created.
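The grouping described above can be sketched with a regular expression. This is only an illustration, not the code in the branch: VersionSeriesSketch and its methods are hypothetical names, and the suffix pattern (".v" followed by one or more digits) is an assumption about how the extension is matched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the Bulk Filesystem Import Tool. */
final class VersionSeriesSketch {
    // Assumed form of a version suffix: ".v" followed by one or more digits.
    private static final Pattern VERSION_SUFFIX = Pattern.compile("^(.+)\\.v([0-9]+)$");

    /** Returns the head revision's file name for a versioned or unversioned name. */
    static String headName(String fileName) {
        Matcher m = VERSION_SUFFIX.matcher(fileName);
        return m.matches() ? m.group(1) : fileName;
    }

    /**
     * Groups file names into series keyed by head name. Within a series the
     * versioned files come first in alphabetical order and the head revision,
     * if present, comes last.
     */
    static Map<String, List<String>> groupBySeries(List<String> fileNames) {
        Map<String, List<String>> series = new TreeMap<>();
        List<String> heads = new ArrayList<>();
        for (String name : fileNames) {
            String head = headName(name);
            if (!series.containsKey(head)) {
                series.put(head, new ArrayList<String>());
            }
            if (head.equals(name)) {
                heads.add(name);
            } else {
                series.get(head).add(name);
            }
        }
        for (List<String> versions : series.values()) {
            Collections.sort(versions);
        }
        for (String head : heads) {
            series.get(head).add(head);
        }
        return series;
    }
}
```

With the file names from the list above, the content files and the metadata files end up in two separate series, each ordered v1, v2, head.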

The sorting is a simple alphabetical sort on the version extension.  This allows for gaps in the version history as represented on disk (e.g. v1, v2, v4).  Versions are created in Alfresco in the order in which they are found, with no gaps even if there are gaps on disk.  This also implies that the user may number versions v1, v2, etc. or v01, v02, etc.  Be cautious though: versions named v1, v2, v10 will be imported in the order v1, v10, v2, so the extensions should be zero-padded as v01, v02, v10.
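The ordering pitfall is easy to reproduce with a plain string sort. The sketch below is illustrative only; VersionSortSketch and its pad helper are hypothetical and not part of the tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of alphabetical ordering on version suffixes. */
final class VersionSortSketch {
    /** Sorts version suffixes the way a plain alphabetical sort would. */
    static List<String> alphabetical(List<String> suffixes) {
        List<String> sorted = new ArrayList<>(suffixes);
        Collections.sort(sorted);
        return sorted;
    }

    /** Zero-pads the numeric part of a suffix such as "v2" to the given width. */
    static String pad(String suffix, int width) {
        int number = Integer.parseInt(suffix.substring(1));
        return String.format("v%0" + width + "d", number);
    }
}
```

Sorting v1, v2, v10 alphabetically yields v1, v10, v2, while the padded forms v01, v02, v10 sort in the intended order.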

Implementation Details

The current implementation is checked into a branch in the Google Code project here:  http://alfresco-bulk-filesystem-import.googlecode.com/svn/branches/versioning/.  The existing bulk-filesystem-import-web-scripts-context.xml file checked into the branch includes the relevant configuration, which I’ve listed here.

The existing importer class is redefined to point to the versioning importer, and the versioning importer is defined. In this instance the async importer is used, however the synchronous implementation works as well.

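[The XML bean definitions for the importers did not survive in this copy of the post; see bulk-filesystem-import-web-scripts-context.xml in the branch above.]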

Note that the importers refer to versioning-metadata-loaders which are defined as follows. This bean refers to a new metadata loader for versioned properties:

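[The XML bean definitions for versioning-metadata-loaders and the versioned properties metadata loader did not survive in this copy of the post; see bulk-filesystem-import-web-scripts-context.xml in the branch above.]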

As you can see, there are two new classes which make up the bulk of the implementation: VersioningImporter and VersioningPropertiesFileMetadataLoader. These two classes drive the import operation. There are also some modifications to the status implementation to include the number of versions created. In a future version, the functionality could be merged directly into the existing abstract importer and properties metadata loader.

If you need to include version history or modify other version properties, just include them in the versioned metadata properties file for that version (e.g. manual.pdf.metadata.properties.v1). For example:

  cm:versionLabel=Fixed typo.

I’ve found this implementation useful and have used it successfully to import versioned content. If you need to import multiple versions, this approach could be used. Likewise, if you read my previous post, you can see how the Open Migrate writer for the Bulk Filesystem Import Tool could be extended to write out versioned content.

I mentioned in the post background that I learned more about the Alfresco API and its nuances.  What that boiled down to was the behavior in calling create version.  This may be obvious to most but since it was new to me, here’s what I found.  It is necessary to call the createVersion API after creating a document to create the first version. This wasn’t obvious to me and I ended up initially writing code that created a document and set properties, then created a version with the second version’s properties, inadvertently overwriting the first version’s properties. The control flow should be:

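// Version 1: create the node, set the first version's properties, then snapshot it
NodeRef doc = fileFolderService.create(parentNodeRef, name, typeQName).getNodeRef();
versionService.createVersion(doc, versionProperties);
// Version 2...n: apply the next version's content and properties, then snapshot again
versionService.createVersion(doc, versionProperties);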
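The ordering matters because each snapshot captures whatever is on the node at that moment. Here is a minimal sketch of that control flow using an in-memory stand-in. VersionedNodeSketch is hypothetical and is not an Alfresco class; its methods only mimic the snapshot behaviour described above, and the property name used with it is just an example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a versioned node; not the Alfresco API. */
final class VersionedNodeSketch {
    private final Map<String, String> liveProperties = new LinkedHashMap<>();
    private final List<Map<String, String>> history = new ArrayList<>();

    /** Overwrites the node's live properties, as setting properties on a node would. */
    void setProperties(Map<String, String> props) {
        liveProperties.putAll(props);
    }

    /** Snapshots the current live properties as a new version. */
    void createVersion() {
        history.add(new LinkedHashMap<>(liveProperties));
    }

    List<Map<String, String>> getHistory() {
        return history;
    }

    /** Applies each version's properties in order, snapshotting after every step. */
    static VersionedNodeSketch importAll(List<Map<String, String>> versionsOldestFirst) {
        VersionedNodeSketch node = new VersionedNodeSketch();
        for (Map<String, String> props : versionsOldestFirst) {
            node.setProperties(props);
            node.createVersion();
        }
        return node;
    }
}
```

Skipping the snapshot after the first set of properties would leave the history with only the later values, which is the mistake described above.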

Next Steps

I’m working together with Peter Monks (@pmonks) to incorporate this into the Bulk Filesystem Import Tool. I’ve checked the code into a branch in the Google Code repository for the tool. The work isn’t yet ready for multi-threaded execution of the tool, but Peter is refactoring for that and we’ll merge the overall implementation into the base classes I’ve extended. We’re also adding support for metadata-only versions (e.g. on disk you can represent a version with just a metadata.properties file, without a supporting binary).

Alfresco Content Migration

Background

On a recent project I have spent some time investigating and implementing a content migration solution from a legacy content system into Alfresco.  After researching numerous alternatives, I found that a combination of two popular technologies with a small extension produced a high performance migration solution.  The project had a relatively straightforward requirement:

Inject content into Alfresco from a source content management system.  Take into account performance impacts on the existing and target system, as well as the overall cost of the migration.  The source content management system has millions of content documents to migrate.  Migration may be batched based on business requirements.

This article is focused on the ingestion of content into Alfresco.

Technologies

Various technology options were available to perform the content import.  I spent some time scouring the web looking for migration tools and technologies as well as considering direct access to the source and target system.  It quickly became obvious (partly due to Peter Monks’ blog post on one approach) that there were a handful of options available:

  1. Direct API calls to Alfresco, using CMIS, REST, SOAP, etc.
  2. Alfresco’s JLAN Server.
  3. Alfresco Content Import using ACP Files.
  4. Open Migrate (from Technology Services Group)
  5. Bulk Filesystem Import (from Peter Monks)

After carefully considering the options and understanding the client’s needs, I narrowed the list to three options:  Direct API calls, Open Migrate or Bulk Filesystem Import.  Direct API calls would require custom migration code, which seemed like a fairly high cost for the value added.  We were also concerned about the performance of any kind of remote API calls.  That left us with the two tools.  It is worth explaining how these two tools work:

Open Migrate is a configurable, extensible framework for content migrations.  The basic flow is to retrieve content from the source system, map it to the target representation and deliver the content to the target system.  The mapping activity is configurable and allows you to map and modify content as appropriate during a migration process.  The tool has the potential to be a complete end-to-end solution for many content migrations.  The out of the box implementation for an Alfresco target calls the Alfresco remote APIs and transfers content over the wire during import.  On the surface this isn’t a huge deal, but for a large volume migration this could be a performance barrier.

The Bulk Filesystem Import tool is a highly specialized tool designed to import content in a set of folders and files on a local filesystem into Alfresco.  The tool runs in process (it is deployed with Alfresco) and therefore does not have any over the wire performance implications.  The tool is simply pointed to a given folder hierarchy and content and folders within the filesystem are replicated in Alfresco.  Properties specified in any optionally accompanying properties files can specify the content’s type, aspects and metadata property values.  The format required by the tool for metadata properties is a simple properties file and while perhaps somewhat limited, would meet our needs based on an analysis of the content in the source system.  Peter claims a ~20x performance improvement over a CIFS approach to migration.  This got my attention and helped validate my concern about remote API calls, specifically around sending file contents over the wire.

Decision Time

Open Migrate was the first choice from a cost perspective.  We hoped to have little to no effort to configure the tool and execute a simple migration.  Using the Bulk Filesystem Import tool was a reasonable alternative but would require a translation of content from the source system to the format understood by the import tool.  The source system’s content mapped reasonably easily to Alfresco so this didn’t seem like too large of a hurdle.  Thus, I was intrigued by the idea of combining the possibility of a good end-to-end framework with the high performance of Bulk Filesystem Import.  I explored how I could develop a Bulk Filesystem Import folder structure as a target in Open Migrate.

Implementation

I configured Open Migrate with a “simple migration target” which is a container that allows extension by writing listeners.

    <bean id="MigrationTarget"
        class="com.tsgrp.migration.target.SimpleMigrationTarget"
        scope="prototype">
        <property name="eventListeners">
            <list>
                <ref bean="AlfrescoBulkFileSystemImportWriter" />
            </list>
        </property>
    </bean>

Then I configured a target listener to perform the actual writing to disk. Note, targetDir is a property specified in the Open Migrate properties configuration and is the top level output directory from the Open Migrate process – where you’ll ultimately point the Bulk Filesystem Import tool.

    <bean id="AlfrescoBulkFileSystemImportWriter"
        class="com.ziaconsulting.migration.event.target.AlfrescoBulkFileSystemImportWriter"
        scope="prototype">
        <property name="targetDir" value="${targetDir}" />
    </bean>

The implementation of the listener is fairly straightforward. Each target migration node in Open Migrate has already been properly populated with the desired attributes (metadata properties). The files need to be laid into the directories and the metadata properties need to be written to a format prescribed by the Bulk Filesystem Import tool (e.g. filename.metadata.properties).  It should be noted that due to issue 19 in the Bulk Filesystem Import tool, dates are handled specially.  Also see ISO-8601.  In this migration I’ve only handled single-valued dates as noted below, and String properties.  The properties writing is accomplished as follows:

    private void createNodeProperties(MigrationNode node) {
        Properties props = new Properties();

        for (String attr : node.getAttributeNames(false)) {
            if (attr.startsWith("migration_info_node_")) {
                // Skip this attribute, it's an open-migrate migration detail,
                // not represented in Alfresco.
                continue;
            }
            NodeAttribute nodeAttr = node.getAttribute(attr);

            // Guard against a missing attribute before inspecting its type.
            boolean isDate = nodeAttr != null
                    && nodeAttr.getDataType().getJavaTypeName().equals(Date.class.getName());

            if (isDate) {
                // TODO Doesn't handle multi-valued date properties
                if (nodeAttr.getFirst() != null) {
                    // Only store date fields which have a value
                    Date date = (Date) nodeAttr.getFirst();
                    props.setProperty(attr, ISO8601DateFormat.format(date));
                }
            } else {
                // Non-date attributes are written as delimited strings, empty when unset.
                String value = EMPTY_STRING;
                if (nodeAttr != null && nodeAttr.getFirst() != null) {
                    value = nodeAttr.valuesToString(DELIM);
                }

                props.setProperty(attr, value);
            }
        }

        if (props.size() == 0) {
            // If a node has no properties, don't write the file.
            return;
        }

        // Helper method to get the file path based on targetDir and the node's folder.
        String targetFullFilePath = PathHelper.getContentNodeFilePath(getTargetDir(), node) + ".metadata.properties";
        logger.debug("Target will create properties file " + targetFullFilePath);

        // Get the file object
        File targetFile = new File(targetFullFilePath);

        try {
            // createNewFile() returns false if the file already exists; it is then left untouched.
            if (targetFile.createNewFile()) {
                FileWriter writer = new FileWriter(targetFile);
                try {
                    props.store(writer, null);
                } finally {
                    // Always release the file handle, even if store() fails.
                    writer.close();
                }
            }
        } catch (IOException e) {
            MigrationException.throwException(ExceptionType.TARGET_NODE_EXCEPTION, "I/O Exception on File Folder Migration Target", e);
        }
    }
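One detail worth knowing when inspecting the generated files: java.util.Properties escapes colons when storing, so namespaced keys and ISO 8601 timestamps look different on disk than in memory. The sketch below shows this with JDK classes only. MetadataPropertiesSketch is a hypothetical name, the SimpleDateFormat pattern is a JDK stand-in for the ISO8601DateFormat call above (its exact output may differ), and cm:created is used purely as an example key.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Properties;
import java.util.TimeZone;

/** Hypothetical illustration; not part of Open Migrate or the import tool. */
final class MetadataPropertiesSketch {
    /** Formats a date as ISO 8601 in UTC using only the JDK. */
    static String toIso8601Utc(Date date) {
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    /** Serializes properties to text exactly as Properties.store would write them. */
    static String store(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses text produced by store() back into a Properties object. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

A key such as cm:created is written as cm\:created, and the colons in the timestamp are escaped the same way. Any reader that uses Properties.load restores the original key and value.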

As for laying out the binary files, the exercise is largely left to the reader. In our case, the files were accessible on disk and we performed a copy from the source system to the target location for importing. For each target migration node the code writes out the metadata properties and the associated binary content file.
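For readers who want a starting point, a copy step along these lines can be sketched with java.nio.file. This is an assumption about how such a copy might look, not the code used in the migration; BinaryCopySketch and copyInto are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for laying binaries into the import directory. */
final class BinaryCopySketch {
    /** Copies a source binary to targetDir/relativePath, creating parent folders as needed. */
    static Path copyInto(Path source, Path targetDir, String relativePath) {
        Path target = targetDir.resolve(relativePath);
        try {
            Files.createDirectories(target.getParent());
            // Overwrite any file left behind by a previous run of the same batch.
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here targetDir plays the role of the top level output directory that the Bulk Filesystem Import tool is later pointed at.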

Running the Bulk Filesystem Import utility is as simple as pointing to the target directory and watching your documents import (quickly!) into Alfresco.