Exporting files from disk images of diskettes from the 1980s and 1990s
In 2017 and 2018, PACKED vzw received requests from three organizations, Liberaal Archief, ADVN, and HeK (House for Electronic Arts), to capture data from diskettes. The carriers date from the late 1980s to the mid-1990s. For each diskette, a disk image was created. To make the files usable and readable, the disk images were processed further: the file system was identified, and the files were extracted from the disk images and identified.
Author
Nastasia Vanderperren (PACKED vzw)
Problem statement
In a previous phase, we created disk images from the 218 diskettes of Liberaal Archief, ADVN, and HeK. Disk images differ from regular file copies in that all information from a single carrier is stored in one digital file. A disk image preserves not only the files from the carrier but also all system information, whereas a regular copy only transfers the files and folders to another location.
To read or use the files and folders in a disk image, you must connect (mount) the disk image on your computer. This can be risky because some operating systems write (invisible) files to connected storage media. Sometimes a disk image cannot be mounted at all because of its file system. A file system is the software-defined organization of a storage medium (e.g., a hard disk or an external carrier) that the operating system uses to present the data on the medium as files and to make them usable in applications. Some file systems can only be used on one operating system, while others are accessible on several.
To ensure that the collection-managing institutions have access to the files on the disk images, the files were exported and identified. Following the Resurrection Lab project, PACKED vzw designed a workflow for identifying data on disk images. The disk images were a suitable case to test this workflow.
Status
Between April and July 2018, we processed 218 disk images. We were able to extract the files from 149 of them, which resulted in 3166 files. The identification tools were able to identify the file formats of 1155 files. Of the 2011 files whose format the tools could not identify, we could infer the format of 1007 files from their metadata. 1004 files remained unidentified.
Method
We followed the article Digital Archaeology and/or Forensics: Working with Floppy Disks from the 1980s by John Durno. He created disk images from diskettes in all kinds of formats (e.g., PC/MS-DOS, classic Macintosh, AppleDOS, Kaypro CP/M, and Atari) and exported the files from the disk images. He used a series of command line tools and emulators. An emulator is software that simulates a computer system inside another environment.
Determine the file system
Before we could use command line tools to export files from the disk images, we needed to know which file system the disk images had. The choice of tool depends on the file system. This information is also necessary if you want to open the files in an emulation environment. Based on the file system, the appropriate emulation environment can be chosen.
For diskettes used in MS-DOS/Windows and classic Macintosh, the most commonly used file systems are FAT12 and HFS. FAT is a file system developed for MS-DOS and Windows, with FAT12 specifically for diskettes. It is widely supported: almost all modern operating systems (Windows, macOS, and Linux) can read it. HFS is an obsolete file system developed by Apple and used for diskettes and hard drives. HFS disk images can only be read on Mac (both classic Macintosh and modern OS X/macOS).
To determine the file system, we used disktype. This is a command line tool for determining the file systems of a disk or disk image. It runs in UNIX environments such as Linux or macOS, or on Windows via Cygwin [1] or WSL. With the command disktype image.img > disktype.txt, we wrote the information about the disk image named image.img to the text file disktype.txt.
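With more than two hundred images, the disktype step lends itself to a loop. The following is a minimal sketch, not the script used in the project: the function names, the one-report-per-image naming, and the assumption that disktype names the file system with phrases such as "FAT12 file system", "HFS file system", or "MFS file system" are all illustrative.

```shell
#!/bin/sh
# Sketch: run disktype on every *.img file in a directory and keep one
# report per image. Function names and report naming are hypothetical.

# classify_report FILE: print a coarse file system label for one report.
# Assumption: disktype describes the volume as "<name> file system".
classify_report() {
    if grep -q 'FAT12 file system' "$1"; then
        echo 'FAT12'
    elif grep -q 'MFS file system' "$1"; then
        echo 'MFS'
    elif grep -q 'HFS file system' "$1"; then
        echo 'HFS'
    else
        echo 'unknown'
    fi
}

# identify_images DIR: write DIR/<name>.disktype.txt for every DIR/<name>.img
# and print "<image><TAB><label>" for each image.
identify_images() {
    for image in "$1"/*.img; do
        [ -e "$image" ] || continue   # no match: skip the literal pattern
        report="${image%.img}.disktype.txt"
        disktype "$image" > "$report"
        printf '%s\t%s\n' "$image" "$(classify_report "$report")"
    done
}
```

Calling identify_images with the folder that holds the images prints one line per image, which gives a quick overview of which export procedure each image needs. Images labelled unknown still have to be inspected by hand.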
Copy files from the disk images
It is safer not to mount the disk images directly on the computer. Operating systems such as Windows and macOS tend to write extra (invisible) files to mounted media. [2] Therefore, we used various tools to export files from the disk images.
For each disk image, we created an index file listing all files from the original diskette with their size and last modification date. This file was named 'index.txt'. Then we copied all the files from the disk image to a folder called 'content'. The procedure differs per type of disk image.

Exporting files from FAT12 disk images
There are several tools you can use to copy files from FAT12 formatted disk images. As in John Durno's article, we used mtools. This is a set of command line tools that allows Mac or Linux computers to work with MS-DOS volumes.
We used the following commands for the disk image named image.img:
- mdir -/ -a -i image.img > index.txt: this command lists all files from the disk image, including hidden files, and writes them to the file index.txt.
- mcopy -psmi image.img ::* content: this command copies all files and folders from the disk image to the folder called ‘content’. The original properties and last modification date of the files are preserved.

Another tool we tested is BitCurator Disk Image Access. BitCurator is a specialized version of Ubuntu consisting of a collection of forensic tools to help archivists and librarians preserve data on external carriers. BitCurator Disk Image Access is software that allows you to export all files from a disk image. You can even see and export all deleted files on the disk. The deleted files on the disk images were mainly temporary (tmp) files. Some disk images also appeared to come from diskettes that first contained computer games, which were later overwritten. We chose not to export the deleted files because they did not seem relevant or useful. Temporary files, for example, can no longer be opened.

Unlike mcopy, BitCurator Disk Image Access also allows you to export hidden folders and files. These are usually system files that are not very useful. Nevertheless, if you want to export the entire diskette, Disk Image Access is a better tool. We preferred mtools because we could more easily automate the process using a script.
In this way, we succeeded in copying the files from 54 out of 55 FAT12 disk images. The disk image from which we could not copy files could not be opened with either mtools or Disk Image Access.
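The automation mentioned above could look roughly like the sketch below. It is not the script used in the project: the function names and the layout (one output folder per image, next to the image) are assumptions. Only the mdir and mcopy invocations are taken from the text.

```shell
#!/bin/sh
# Sketch: apply the mdir/mcopy steps to every FAT12 image in a directory.
# Function names and folder layout are hypothetical.

# export_fat12 IMAGE OUTDIR: write OUTDIR/index.txt and fill OUTDIR/content.
export_fat12() {
    mkdir -p "$2/content" || return 1
    # Recursive listing that includes hidden files.
    mdir -/ -a -i "$1" > "$2/index.txt" || return 1
    # '::*' is quoted so that the shell passes it to mcopy unexpanded.
    mcopy -psmi "$1" '::*' "$2/content" || return 1
}

# export_all_fat12 DIR: process every DIR/<name>.img into DIR/<name>/ and
# report failures instead of stopping at the first unreadable image.
export_all_fat12() {
    for image in "$1"/*.img; do
        [ -e "$image" ] || continue
        if export_fat12 "$image" "${image%.img}"; then
            echo "ok: $image"
        else
            echo "failed: $image"
        fi
    done
}
```

Reporting failures per image, rather than aborting, makes it easy to set aside the images that need a second attempt with another tool.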
Exporting files from HFS disk images
hfsutils
We also used several tools to export the files from the 160 HFS formatted disk images. The first tool we tested was hfsutils. This is a set of commands that allow you to manipulate HFS volumes on UNIX systems. The commands are similar to those of mtools.
We used the following commands for a disk image named image.img:
- hmount image.img > hmount.txt: this command ensures that the volume of the disk image is virtually mounted. The metadata is written to the file ‘hmount.txt’. The metadata consists of the volume name, the date the volume was created [3], the last modification date, and the number of free bytes.
- hls -ialR image.img > index.txt: this lists all folders and their files, including those that are hidden, in the disk image. The listing is done in ‘long format’. This means that in addition to the files, the file type and creator code are also displayed. In classic Macintosh environments, files were not identified by file extensions, but by the file type (a code for the file type) and the creator code (a code for the application with which the file was created). This is a useful mechanism for identifying files on the disk image. The list is written to the file ‘index.txt’.
- hcopy -m :* image.img content/: the files on the volume are copied along with their resource fork to the folder ‘content’. Resource fork is a mechanism on old and current Mac environments that stores structured data from files, such as icons, the shape of the windows, and the menu of the file. Resource forks can only be used on Mac systems.
- humount: the disk image is virtually ejected.
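The four hfsutils steps can be wrapped in one function so that the volume is always unmounted again, even when listing or copying fails. This is a hedged sketch rather than the procedure exactly as run in the project: the function name and output layout are hypothetical, and hls and hcopy are called here on the currently mounted volume (hls without a path, hcopy with a quoted ':*' source), which is an assumption about how hfsutils resolves paths.

```shell
#!/bin/sh
# Sketch: mount, index, copy, and unmount one HFS image with hfsutils.
# Function name and folder layout are hypothetical.

# export_hfs IMAGE OUTDIR: write OUTDIR/hmount.txt and OUTDIR/index.txt,
# and copy the files in the volume's root folder to OUTDIR/content.
export_hfs() {
    mkdir -p "$2/content" || return 1
    hmount "$1" > "$2/hmount.txt" || return 1
    # From here on the volume is mounted, so always reach humount.
    rc=0
    hls -ialR > "$2/index.txt" || rc=1
    # ':*' is quoted so that hcopy, not the shell, expands the pattern.
    hcopy -m ':*' "$2/content/" || rc=1
    humount
    return "$rc"
}
```

A non-zero return value signals that the index or the copy is incomplete, which is the cue to try the image with another tool.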


With hmount we could mount 102 disk images and create the index file. With hcopy we could copy the files from 83 disk images. We found that the index file sometimes contains more files than hcopy exported. With hcopy, we could not copy files that are in subfolders. Sometimes the tool could not copy all files. This may be due to errors on the original diskette. It always concerned disk images from diskettes that could not be captured without errors. A drawback of the hcopy command is that it changes the original metadata of the files, such as creation date and last modification date, to the date and time when the files are copied.
hdiutil
We then tested hdiutil. hdiutil is a command line tool on macOS that can manipulate disk images in all kinds of formats (.dmg, .img, .iso). Initially, we wanted to test whether it could mount disk images that hfsutils could not load and export. Once the volume was mounted with hdiutil, we could copy the files with rsync. rsync [4] is a command line tool for copying files from one location to another. The software always checks via checksums whether the copying was successful and can be run in archive mode, which preserves the original metadata of the files. tree, a command line tool for listing files in a tree structure, was used to create an index of the files.
We used the following commands on a disk image named image.img:
- hdiutil attach -readonly image.img: this command mounts the disk image as a read-only medium. The volume of the disk image appears in the Finder sidebar. hdiutil reports the location where the disk is mounted [5] and the volume name.
- rsync -ra /Volumes/image/ content/: all folders and files on the volume named image are copied to the folder ‘content’. rsync is run in archive mode.
- hdiutil detach /dev/disk2: after everything is copied, the disk is ejected.
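The same three steps, together with the tree index, can be combined into one function for use on macOS. This is only a sketch under stated assumptions: the function name and output layout are hypothetical, and it uses hdiutil's -mountpoint and -nobrowse options (not mentioned in the text) so that the script does not have to parse the /dev/diskN and /Volumes paths that hdiutil reports.

```shell
#!/bin/sh
# Sketch (macOS only): mount an image read-only, index it with tree,
# copy it with rsync, and detach it again. Names are hypothetical.

# export_with_hdiutil IMAGE OUTDIR: write OUTDIR/index.txt and OUTDIR/content.
export_with_hdiutil() {
    mountpoint=$(mktemp -d) || return 1
    mkdir -p "$2/content" || return 1
    # Mount at a known temporary folder instead of /Volumes/<name>.
    hdiutil attach -readonly -nobrowse -mountpoint "$mountpoint" "$1" \
        > /dev/null || return 1
    rc=0
    tree -a "$mountpoint" > "$2/index.txt" || rc=1   # -a lists hidden files too
    rsync -a "$mountpoint/" "$2/content/" || rc=1    # archive mode keeps metadata
    hdiutil detach "$mountpoint" > /dev/null || rc=1
    rmdir "$mountpoint" 2> /dev/null || true         # tidy up the empty folder
    return "$rc"
}
```

Mounting at a temporary folder also avoids name clashes when several diskettes carry the same volume name.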


With hdiutil and rsync, we could copy files from about five more disk images. We noticed several advantages over hfsutils: the two tools allowed us to export files in subfolders, they preserved the original creation, last-opened, and modification dates, and they could also export hidden files. We therefore decided to use these two tools for all disk images. However, hdiutil could not mount some disk images that hfsutils could mount. For those disk images, we had to continue using hfsutils. Because of the rich metadata provided by hmount and hls, we continued to use those two commands for all disk images.
HFSExplorer
There were still about seventy disk images from which we could not copy the files. Therefore, we tested a third tool: HFSExplorer. HFSExplorer is similar to Disk Image Access. You can open a disk image and export the files. With this tool, the original modification date is also preserved, and hidden files could be exported.

With HFSExplorer, we could export the files from two more disk images. To preserve the original metadata of the files, we also used the tool for the disk images that hdiutil could not export, but hfsutils could.
Mini vMac
To extract the files from the other disk images, we tested one last tool: Mini vMac. Mini vMac is an emulator that allows you to open classic Macintosh environments from System 1 (released in 1984) to System 7.5.5 (released in 1996). In that emulator, we could load our disk images as if we were inserting a diskette into a real computer. With the application ExportFl, we could then export the files on the disk image from the emulation environment to the real computer. In this way, we could export the files from about two more disk images.


With the above four methods, we were able to export the files from a total of 94 disk images. In some cases, all four tools had problems reading a disk image, and only part of the files could be exported. The other 66 disk images were presumably created from diskettes that were too damaged and are therefore no longer functional.
No single tool could read all 94 still-functional disk images. HFSExplorer and hdiutil combined with rsync could both export hidden files and preserve the original metadata. Those files and metadata are missing for the disk images for which we had to use Mini vMac and hfsutils. With hfsutils, we could create the best index files because it loads the files into a virtual HFS environment. We thus obtained not only a list of all the files on the disk image but also the type code of each file and the creator code of the application with which it was created. This is useful metadata for identifying the files.
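Because no single tool could read every image, the command line tools have to be tried one after another per image. The sketch below shows one way to do that and to record which tool succeeded. It is illustrative only: try_exporters and the wrapper names passed to it are hypothetical, and graphical tools such as HFSExplorer and Mini vMac still have to be operated by hand.

```shell
#!/bin/sh
# Sketch: try a list of export commands in order of preference.

# try_exporters IMAGE OUTDIR TOOL...: run each TOOL (a command or shell
# function that takes IMAGE and OUTDIR) until one succeeds, then print
# its name. Prints "none" and returns 1 if every tool fails.
try_exporters() {
    te_image=$1
    te_outdir=$2
    shift 2
    for tool in "$@"; do
        if "$tool" "$te_image" "$te_outdir" > /dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo 'none'
    return 1
}
```

For example, try_exporters image.img out export_a export_b (with export_a and export_b standing for hypothetical wrappers around two of the tools) prints the name of the first wrapper that succeeded. Logging that name per image documents which files lack hidden files or original dates because of the tool used.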
Exporting files from MFS disk images
We did not find any software to manipulate MFS disk images. To extract files from the disk images, we tried using the emulator Mini vMac. That emulator could mount two of the three disk images, but we could only export the files from one disk image. For the other disk image, Mini vMac crashed every time. Presumably, the two non-functional disk images were created from diskettes that were too damaged.
Identifying the files
When the files had been extracted from the disk images, we started identifying the file formats. We used two tools for this: Siegfried and DROID. Both tools use the PRONOM database for identification. There was little difference between the results of the two tools. The difference was in nuance: sometimes DROID was more certain about the file format of a number of files, while Siegfried was still unsure, and vice versa.
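For the Siegfried part of this step, a run over an exported folder and a count of the unidentified files can be sketched as follows. The function names are hypothetical, and the count relies on the assumption that Siegfried's CSV report writes the literal value UNKNOWN in the identifier column for files it cannot match.

```shell
#!/bin/sh
# Sketch: identify one exported folder with Siegfried and count the misses.

# identify_folder DIR REPORT: run Siegfried on DIR and save the CSV report.
identify_folder() {
    sf -csv "$1" > "$2"
}

# count_unknown REPORT: count the rows whose identifier is UNKNOWN.
# Assumption: no file name in the report contains the text ",UNKNOWN,".
count_unknown() {
    grep -c ',UNKNOWN,' "$1"
}
```

Comparing the result of count_unknown with the total number of rows gives the share of files that still need manual identification.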
The tools were certain about the file format in only about one out of three cases. A possible explanation is that files in the HFS file system did not have file extensions, or that the files were corrupted by damage to the diskettes. Because in classic Macintosh environments files were identified by their type code and creator code, this metadata gives us an indication of their file format. We used the codes to identify the files with the online list Signatures of Macintosh Files. In the unidentified files from Liberaal Archief, we mainly see the following codes recurring:
- ALB3/ALD3: stands for Aldus PageMaker 3.0;
- ALB4/ALD4: stands for Aldus PageMaker 4.0;
- MACA/WORD: stands for MacWrite 4.6/5;
- MWII/MW2D: stands for MacWrite II.
If we look up these file formats in the PRONOM database, we see that the database does not know some files (MacWrite and MacWrite II) or that it only knows the extension for the files. Since file extensions are not preserved in classic Macintosh environments, it is impossible for DROID or Siegfried to identify the files. You can teach the tools to recognize these formats by creating a custom signature file. We have not (yet) tested this for this case. It is also possible to register a new file format in the PRONOM database. We have not tested this either.
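The manual lookup by type and creator code described above can be captured in a small table, so that the pairs found in an index file are translated consistently. This is a minimal sketch: the function name is hypothetical, the table only contains the four pairs listed above, written exactly as they appear in that list, and any other pair is reported as unknown.

```shell
#!/bin/sh
# Sketch: translate a code pair from the list above into an application name.

# lookup_code PAIR: print the application for PAIR, or "unknown".
lookup_code() {
    case $1 in
        ALB3/ALD3) echo 'Aldus PageMaker 3.0' ;;
        ALB4/ALD4) echo 'Aldus PageMaker 4.0' ;;
        MACA/WORD) echo 'MacWrite 4.6/5' ;;
        MWII/MW2D) echo 'MacWrite II' ;;
        *) echo 'unknown' ;;
    esac
}
```

Extending the case statement with further pairs from the Signatures of Macintosh Files list would cover more of the files that the identification tools left unidentified.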
Conclusion
By using various tools, we succeeded in determining the file system of 213 disk images. From 149 of the 218 disk images from diskettes from the 1980s and 1990s, we could extract and identify the files. The 69 disk images for which this was not possible presumably come from diskettes that were too damaged, making the disk images corrupt and no longer accessible. Nevertheless, the chances of success also depend on the tools used. Especially for the HFS disk images, we noticed that one tool could read disk images that another could not. None of the four tools used could read all functional disk images.
| Actions | Number succeeded | Number not succeeded | Total |
|---|---|---|---|
| Determine file system | 213 disk images | 5 disk images | 218 disk images |
| Extract files | 149 disk images | 69 disk images | 218 disk images |
| Identify files | 1155 files (with manual identification: 2162) | 2011 files (with manual identification: 1004) | 3166 files (from 149 disk images) |
File identification succeeded in only a minority of cases. Especially for the HFS disk images, the identification software was unable to analyze the files. This is because files in HFS do not have an extension. The metadata by which HFS identifies files (the creator code and the type code) did allow us to identify the files ourselves. The disk images mainly contained text files of letters and publications written and published by the archive creators. In the case of HeK, these were two artworks that would ultimately have been lost if they had not been extracted from the diskettes.
All files extracted from the disk images, as well as the disk images themselves, were subjected to a virus scan. Diskettes were a popular medium for spreading viruses. These viruses can still be dangerous, so virus control is necessary. No viruses were found.
Literature
DURNO, John, Digital Archaeology and/or Forensics: Working with Floppy Disks from the 1980s, http://journal.code4lib.org/articles/11986
References
1. Cygwin is a collection of free tools intended to run Unix programs on most versions of Microsoft Windows, https://en.wikipedia.org/wiki/Cygwin.
2. Since OS X 10.6, Macs can only read HFS volumes, not write to them. On Macs with a modern operating system, there is therefore no risk of invisible files being written into the disk images.
3. The date the volume was created is the date the diskette was first used/formatted.
4. For more information, see https://en.wikipedia.org/wiki/Rsync.
5. If you have no external drives connected to your computer, this is usually /dev/disk2.
Contact details
Nastasia Vanderperren: nastasia@packed.be
Persistent URI:
https://id.kbde.be/0196fd78-8703-714f-bba2-ce9aa3aa4274
License
- CC-BY-SA
This page was last modified on 4 November 2025