A Metadata Blunder: Why does this organisation have to copy and paste 20,000 records manually…

… and what lesson can you learn from it?

 

An organisation wanted to streamline its operations by reducing the number of software systems used to manage its collections of images, manuscripts, research papers, and audio and video recordings. It decided to migrate all data and digital files into a Digital Asset Management (DAM) system that DIW had installed two years ago.

DIW was invited to propose a solution to batch migrate these records automatically, particularly the text metadata. In other words, the organisation wanted to avoid copying and pasting 20,000 records from one system to another. After all, who wants to do that?

Unfortunately, after evaluating the data structure and the digital files, which were kept in piles of CDs, I had to break the sad news: the 20,000 records would have to be copied and pasted manually.

What could they have done right at the beginning, when they first started using software to manage their digital collections? The points below apply not only to a dedicated software system but also to Excel, if that is what you use for metadata entry.

Take these three records as an example.

ABC.1.1 1 Jan 98 manuscript A letter from Mr. President
ABC.1.2 2/3/2011 photograph An opening ceremony
ABC.1.3 Mar 3, 2012 video An interview footage

 

1. Are the fields distinguishable?

Can you tell how many different fields there are in each record? As a human being, you can easily identify the four fields in each record:

Reference number, Date, Media Type, Title

However, computer software is not as clever as the human brain (yet). It needs some help to know that there are four fields in each record. So, let’s give it a hand by inserting a separator between the fields.

Commas and semicolons are not good separators because they may already exist in the data. Choose a character that is very unlikely to appear as data in any of the fields. My personal favourite is the pipe character “|”.

ABC.1.1 | 1 Jan 98 | manuscript | A letter from Mr. President
ABC.1.2 | 2/3/2011 | photograph | An opening ceremony
ABC.1.3 | Mar 3, 2012 | video | An interview footage
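
To illustrate why the separator helps, here is a minimal Python sketch of how a migration script might split such lines into fields. The function name `parse_record` and the field names are my own choices for illustration, not part of any particular DAM system.

```python
# Hypothetical field names, in the order they appear in each record.
FIELD_NAMES = ["reference_no", "date", "media_type", "title"]


def parse_record(line, separator="|"):
    """Split one separator-delimited line into a dict of named fields."""
    parts = [part.strip() for part in line.split(separator)]
    if len(parts) != len(FIELD_NAMES):
        # A wrong field count usually means the separator also occurs in the data.
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}: {line!r}")
    return dict(zip(FIELD_NAMES, parts))


lines = [
    "ABC.1.1 | 1 Jan 98 | manuscript | A letter from Mr. President",
    "ABC.1.2 | 2/3/2011 | photograph | An opening ceremony",
    "ABC.1.3 | Mar 3, 2012 | video | An interview footage",
]

for line in lines:
    print(parse_record(line))
```

Note that the comma inside “Mar 3, 2012” survives intact, which it would not if the comma were the separator.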

However, the ideal solution is to create a dedicated field in the software system for each piece of data. If you are using Excel, that means putting each field into its own cell.

Reference no.   Date          Media Type   Title
ABC.1.1         1 Jan 98      manuscript   A letter from Mr. President
ABC.1.2         2/3/2011      photograph   An opening ceremony
ABC.1.3         Mar 3, 2012   video        An interview footage

 

2. Which file is associated with which record?

Each of these records is associated with a digital file.

ABC_01_001.pdf
ABC_01_002.jpg
ABC_01_003.avi

As a human being, you can see that the file ABC_01_001.pdf is associated with the first record, ABC_01_002.jpg with the second record, and so on. Again, the dumb computer cannot do that! We need to help it by telling the software explicitly which file is associated with which record. A filename field must be created for each record.

Filename         Reference no.   Date          Media Type   Title
ABC_01_001.pdf   ABC.1.1         1 Jan 98      manuscript   A letter from Mr. President
ABC_01_002.jpg   ABC.1.2         2/3/2011      photograph   An opening ceremony
ABC_01_003.avi   ABC.1.3         Mar 3, 2012   video        An interview footage

Oh yes, the filenames in the field must match the filenames on the CDs exactly, including the file extension! Do not expect the software to know that ABC-1-1 is the same as ABC_01_001.pdf. Although a case mismatch is generally not an issue, it is still good practice to match the filenames exactly.
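
A quick way to catch mismatches before a migration is to compare the filename field against the actual list of files. The following Python sketch assumes you already have both lists in hand; the function name `find_unmatched` and the sample lists are made up for illustration.

```python
def find_unmatched(record_filenames, files_on_disc):
    """Return the record filenames that have no exact match among the files on disc."""
    available = set(files_on_disc)
    # Exact, case-sensitive comparison: "ABC-1-3" does not match "ABC_01_003.avi".
    return [name for name in record_filenames if name not in available]


# Filenames as entered in the metadata (the third one is deliberately wrong).
record_filenames = ["ABC_01_001.pdf", "ABC_01_002.jpg", "ABC-1-3"]

# Filenames as they actually appear on the CD.
files_on_disc = ["ABC_01_001.pdf", "ABC_01_002.jpg", "ABC_01_003.avi"]

print(find_unmatched(record_filenames, files_on_disc))
```

Any name this reports needs to be corrected in the metadata before the software can link the record to its file.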

 

3. What date is it?

Look at the dates. Do you think computer software can tell that 1 Jan 98 is earlier than Mar 3, 2012? Well, maybe such smart software really exists somewhere, but many systems cannot compare dates written in different formats.

The keyword here is consistency. Standardise the date format for all the records.

Filename         Reference no.   Date         Media Type   Title
ABC_01_001.pdf   ABC.1.1         1 Jan 1998   manuscript   A letter from Mr. President
ABC_01_002.jpg   ABC.1.2         2 Mar 2011   photograph   An opening ceremony
ABC_01_003.avi   ABC.1.3         3 Mar 2012   video        An interview footage
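
If the inconsistent dates already exist, a small script can sometimes clean them up, provided you know every format that was used. This Python sketch handles only the three formats from the example and assumes that 2/3/2011 means day/month/year and that month names are in English; the names `KNOWN_FORMATS`, `parse_date` and `standardise` are my own.

```python
from datetime import datetime

# Assumed formats: "1 Jan 98", "2/3/2011" (day/month/year), "Mar 3, 2012".
KNOWN_FORMATS = ["%d %b %y", "%d/%m/%Y", "%b %d, %Y"]


def parse_date(text):
    """Try each known format in turn and return a date object."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def standardise(text):
    """Rewrite a date in the single format 'D Mon YYYY', e.g. '1 Jan 1998'."""
    d = parse_date(text)
    return f"{d.day} {d:%b %Y}"


for raw in ["1 Jan 98", "2/3/2011", "Mar 3, 2012"]:
    print(raw, "->", standardise(raw))

# Once parsed, the dates can be compared reliably.
print(parse_date("1 Jan 98") < parse_date("Mar 3, 2012"))
```

A date in any other format raises an error instead of being guessed at, which is what you want: an ambiguous date such as 2/3/2011 is better flagged for a human than silently misread.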

 
