Background
I recently blogged about integrating Open Migrate with the Alfresco Bulk Filesystem Import Tool. As part of that exercise, a colleague and I implemented support for importing versioned content into Alfresco. As before, this was already supported in Open Migrate’s direct-to-Alfresco implementation but not in the Bulk Filesystem Import Tool. My previous post explains why I chose to combine these two tools. The implementation was ultimately fairly straightforward. Along the way I learned more about the Alfresco API and some of its nuances, and how to implement this feature in a backwards-compatible manner that we could contribute back to the community.
Design Decisions
The initial design goal was simply to import versioned content from disk into Alfresco. That sounds good in theory, but it raises a question: how should we represent the versions on disk? One option would have been to let the user provide a separate directory structure for versioned content and configure the tool with a separate versioning importer. However, my preference was to use the same directory structure for both versioned and non-versioned content.
Users supply versioned content files only if they need them. The current on-disk format is simply a directory, optionally with subdirectories, each containing content files. Each content file may be accompanied by a file with the extension metadata.properties that supplies its metadata.
To support versioned content, the user may optionally supply files with a new extension: any file whose name ends with the pattern v[0-9]* (for example, .v1) is treated as a version. This applies to both content files and metadata properties files. Versions are imported into Alfresco as follows:
Find all files in a given series, for example:
- Head Revision
  - manual.pdf
  - manual.pdf.metadata.properties
- First Version
  - manual.pdf.v1
  - manual.pdf.metadata.properties.v1
- Second Version
  - manual.pdf.v2
  - manual.pdf.metadata.properties.v2
Note that the head revision of the file carries no version identifier. This keeps the format backwards compatible with any existing file structures used with the tool: a file without any versions is created exactly as the tool has always created it. Files with version extensions (and their associated metadata files) are used to create the document and its subsequent versions, in order, until the final head revision has been created.
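To make the naming convention concrete, here is a minimal sketch of how file names could be grouped into a series. It is not the code from the branch: the class name, the method names and the stricter regular expression (at least one digit after the "v") are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Bulk Filesystem Import Tool.
class VersionSeries {
    // Assumption: a version suffix is ".v" followed by one or more digits.
    private static final Pattern VERSION_SUFFIX = Pattern.compile("^(.*)\\.(v[0-9]+)$");
    private static final String METADATA_SUFFIX = ".metadata.properties";

    /** Returns the name of the head content file that a given file belongs to. */
    public static String seriesKey(String fileName) {
        Matcher m = VERSION_SUFFIX.matcher(fileName);
        // Strip the version suffix first, if there is one.
        String base = m.matches() ? m.group(1) : fileName;
        // Then strip the metadata suffix, so metadata files join their content file.
        return base.endsWith(METADATA_SUFFIX)
                ? base.substring(0, base.length() - METADATA_SUFFIX.length())
                : base;
    }

    /** Groups file names by the head content file they belong to. */
    public static Map<String, List<String>> groupBySeries(Collection<String> fileNames) {
        Map<String, List<String>> series = new TreeMap<>();
        for (String name : fileNames) {
            series.computeIfAbsent(seriesKey(name), k -> new ArrayList<>()).add(name);
        }
        return series;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("manual.pdf");
        files.add("manual.pdf.metadata.properties");
        files.add("manual.pdf.v1");
        files.add("manual.pdf.metadata.properties.v1");
        files.add("notes.txt");
        // Every manual.pdf file lands in one series; notes.txt is a series of one.
        System.out.println(groupBySeries(files));
    }
}
```

A file with no versioned siblings ends up in a series containing only itself, which matches the backwards-compatible behaviour described above.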
The sorting is a simple alphabetical sort on the version extension. This allows for gaps in the version history as represented on disk (e.g. v1, v2, v4): versions are created in Alfresco in the order in which they are found, with no gaps, even if there are gaps on disk. It also means the user can number versions v1, v2, etc. or v01, v02, etc. Be cautious though: versions named v1, v2 and v10 will be imported in the order v1, v10, v2, so once you reach ten or more versions the extensions should be zero-padded as v01, v02, v10.
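The ordering pitfall is easy to reproduce with plain Java strings. The sketch below is only an illustration: the class and method names are hypothetical, and the numeric comparison is an alternative I am showing for contrast, not something the versioning branch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the Bulk Filesystem Import Tool.
class VersionOrdering {
    /** Sorts version extensions as the post describes: plain alphabetical order. */
    public static List<String> alphabetical(List<String> extensions) {
        List<String> sorted = new ArrayList<>(extensions);
        Collections.sort(sorted);
        return sorted;
    }

    /** For contrast only: compares the digits after the leading "v" as integers. */
    public static List<String> numeric(List<String> extensions) {
        List<String> sorted = new ArrayList<>(extensions);
        sorted.sort(Comparator.comparingInt((String e) -> Integer.parseInt(e.substring(1))));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(alphabetical(Arrays.asList("v2", "v10", "v1")));   // [v1, v10, v2]
        System.out.println(alphabetical(Arrays.asList("v02", "v10", "v01"))); // [v01, v02, v10]
        System.out.println(numeric(Arrays.asList("v2", "v10", "v1")));        // [v1, v2, v10]
    }
}
```

Zero-padding the extensions makes the alphabetical and numeric orders agree, which is why it is the safer naming scheme here.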
Implementation Details
The current implementation is checked into a branch in the Google Code project here: https://alfresco-bulk-filesystem-import.googlecode.com/svn/branches/versioning/. The existing bulk-filesystem-import-web-scripts-context.xml file checked into the branch includes the relevant configuration, which I’ve listed here.
The existing importer class is redefined to point to the versioning importer, and the versioning importer is defined. In this instance the async importer is used, however the synchronous implementation works as well.
Note that the importers refer to versioning-metadata-loaders which are defined as follows. This bean refers to a new metadata loader for versioned properties:
As you can see, two new classes make up the bulk of the implementation: VersioningImporter and VersioningPropertiesFileMetadataLoader. These two classes drive the import operation. There are also some modifications to the status implementation so that it reports the number of versions created. In a future version, the functionality could be merged directly into the existing abstract importer and properties metadata loader.
If you need to include version history or modify other version properties, just include them in the versioned metadata.properties file for that version (for example, manual.pdf.metadata.properties.v1). For example:
cm:versionLabel=Fixed typo.
I’ve found this implementation useful and have used it successfully to import versioned content. If you need to import multiple versions of your documents, this approach is an option. Likewise, if you read my previous post, you can see how the Open Migrate writer for the Bulk Filesystem Import Tool could be extended to write out versioned content.
I mentioned in the background that I learned more about the Alfresco API and its nuances. What that boiled down to was the behavior of createVersion. This may be obvious to most, but since it was new to me, here’s what I found: it is necessary to call the createVersion API immediately after creating a document in order to create the first version. I initially wrote code that created a document and set its properties, then created a version with the second version’s properties, inadvertently overwriting the first version’s properties. The control flow should be:
// Version 1
NodeRef doc = fileFolderService.create(parentNodeRef, name, typeQName).getNodeRef();
versionService.createVersion(doc, versionProperties);

// Version 2...n
versionService.createVersion(doc, versionProperties);
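The difference between the two flows can be seen in a toy model that does not touch Alfresco at all. Everything in this sketch is a hypothetical stand-in: ToyNode is not an Alfresco class, and it assumes, for illustration only, that creating a version simply snapshots whatever properties the node holds at that moment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these types exist in Alfresco.
class VersionFlowSketch {
    /** Stand-in for a repository node: current properties plus a version history. */
    static class ToyNode {
        final Map<String, String> properties = new HashMap<>();
        final List<Map<String, String>> history = new ArrayList<>();

        /** Assumed behaviour: snapshot the node's current properties as a new version. */
        void createVersion() {
            history.add(new HashMap<>(properties));
        }
    }

    /** Intended flow: apply each version's properties, then version immediately. */
    public static ToyNode importAll(List<Map<String, String>> versionsOldestFirst) {
        ToyNode node = new ToyNode();
        for (Map<String, String> props : versionsOldestFirst) {
            node.properties.putAll(props);
            node.createVersion();
        }
        return node;
    }

    /** The mistake: no version is created after the first set of properties. */
    public static ToyNode importSkippingFirstVersion(List<Map<String, String>> versionsOldestFirst) {
        ToyNode node = new ToyNode();
        for (int i = 0; i < versionsOldestFirst.size(); i++) {
            node.properties.putAll(versionsOldestFirst.get(i));
            if (i > 0) {
                node.createVersion();
            }
        }
        return node;
    }

    public static void main(String[] args) {
        List<Map<String, String>> versions = Arrays.asList(
                Collections.singletonMap("cm:title", "Draft"),
                Collections.singletonMap("cm:title", "Final"));
        System.out.println(importAll(versions).history.size());                  // 2
        System.out.println(importSkippingFirstVersion(versions).history.size()); // 1
    }
}
```

In the second method the "Draft" properties never reach the history, which mirrors the overwrite I ran into.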
Next Steps
I’m working with Peter Monks (@pmonks) to incorporate this into the Bulk Filesystem Import Tool. I’ve checked the code into a branch in the Google Code repository for the tool here. The work isn’t yet ready for multi-threaded execution of the tool, but Peter is refactoring for that, and we’ll merge the overall implementation into the base classes I’ve extended. We’re also adding support for metadata-only versions (e.g. on disk you can represent a version with just a metadata.properties file, without a supporting binary).
I’m testing your versioning branch of Bulk Filesystem Import Tool, with Alfresco Community 3.4.d. If I try to import a file with more than one version, the import fails with this error:
2011-04-26 16:34:14,700 ERROR [org.alfresco.extension.bulkfilesystemimport.impl.AsynchronousSingleThreadedBulkFilesystemImporter] Background BulkFilesystemImporter thread threw unexpected exception.
org.alfresco.service.cmr.version.ReservedVersionNameException: The version property name versionLabel clashes with a reserved verison property name.
at org.alfresco.repo.version.common.VersionUtil.checkVersionPropertyNames(VersionUtil.java:80)
at org.alfresco.repo.version.Version2ServiceImpl.createVersion(Version2ServiceImpl.java:224)
….
It seems that createVersion() doesn’t support the “versionLabel” property.
How can I fix this issue?
Hi Giuseppe,
I haven’t seen this, but it has been a while since I looked at this code. Another option worth trying is the branch where I’ve committed this code into the BFSIT project on Google Code. Peter Monks has made several revisions there, and I think that is closer to what will be the final product. Would you be willing to give this a try using that code? Take a look here:
https://code.google.com/p/alfresco-bulk-filesystem-import/source/browse/#svn%2Fbranches%2Fversioning
-Ryan
FYI in my latest work on the versioning branch of the bulk filesystem import tool I’ve run into the same issue. It appears that the version label property was added to a list of “reserved” properties on the Versionable aspect in 3.4.x, which prevents user code from explicitly setting it when a version is created.
I have no idea why that would be necessary (and obviously it affects the bulk import case, where you may want absolute control over the version labels in the version history), but will try to find out the reasoning and let you know.