
Capturing data from 3.5-inch and 5.25-inch diskettes

In 2017 and 2018, PACKED vzw received requests from three organizations, Liberaal Archief, ADVN, and HeK (House for Electronic Arts), to capture data from diskettes.

The collection received from Liberaal Archief consisted of 165 Macintosh-formatted 3.5-inch diskettes and 30 MS-DOS-formatted 5.25-inch diskettes. ADVN provided 14 5.25-inch diskettes that had been used on a Windows/MS-DOS computer. HeK's request concerned the artworks Raoul A. Pictor cherche son style (1993) by Hervé Graumann and Über Sehen (1993) by Studer / van den Berg: nine high-density diskettes in total, some created for Mac and others for Windows.

Most of the diskettes we received from the three institutions were used in the period from the late 1980s to the mid-1990s. The institutions did not have the correct readers to retrieve the data from the diskettes. PACKED vzw developed a workflow to capture the data from diskettes and store it on a contemporary data carrier.

Author

Nastasia Vanderperren (PACKED vzw)

Problem definition

Diskettes are data carriers on which data is stored magnetically, with a capacity ranging from 80 kB (first generation) to 2.88 MB (last generation). They were ubiquitous from the 1980s until the rise of the CD-R and the USB stick in the late 1990s and early 2000s.

Diskettes exist in different types and variants that are not compatible. Many types require their own reader, which cannot write to or read other types of diskettes.1 Diskettes can differ in, among other things:

  • size: The first diskettes, invented in the late 1960s by IBM, had a diameter of 8 inches. For the home computer market, the 5.25-inch floppy disk was introduced in the mid-1970s. From 1988, the 3.5-inch floppy became the most popular medium for data storage. In addition, there were also diskettes of 2, 2.5, 3, 3.25 and 4 inches, but these never fully broke through.
  • the number of tracks and sectors: Data on diskettes is organized in tracks and sectors. Tracks are concentric rings around the center of the disk, separated by gaps that are not written to. Sectors are blocks of a fixed size (in bytes); each receives an identification number so that the operating system can locate the data on the disk. Diskettes can differ in the number of tracks per side2, the number of sectors per track, the number of tracks per inch, and the number of bytes per sector.
  • the number of writable sides: there are single-sided (one side) and double-sided (both sides) diskettes. A floppy reader that can only read single-sided diskettes cannot necessarily read double-sided diskettes, and vice versa.
  • density: this concerns the efficiency with which data can be stored on a magnetic carrier. The higher the density, the more data can be stored on a disk. A higher density was achieved through improvements in the encoding for data storage, the magnetic strength for writing data, and the material used. There are single density (SD or 1D), double density (DD or 2D), quad density (QD or 4D), high density (HD), extra-high density (ED), and triple density (TD) diskettes.
  • logical format: the logical format is the scheme that determines how the data is written on the carrier. The most common formats are FM (for single density DOS-formatted diskettes), MFM (for double density DOS-formatted diskettes and high density diskettes), and GCR, which exists in both an Apple and a Commodore variant. In addition, there are also separate formats for Atari and Amiga, among others.

As a result of all these differences, for example, a 3.5-inch floppy drive cannot read every 3.5-inch disk.
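To make the role of tracks, sectors and sides concrete, the sketch below computes the capacity of a few common MS-DOS layouts. The geometries listed are the widely documented PC values, not figures taken from the disks in this project, and the names `Geometry` and `PC_FORMATS` are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Physical layout of a diskette with a fixed number of sectors per track."""
    tracks_per_side: int
    sectors_per_track: int
    sides: int
    bytes_per_sector: int = 512

    def capacity_bytes(self) -> int:
        # Capacity is simply the product of the four layout parameters.
        return (self.tracks_per_side * self.sectors_per_track
                * self.sides * self.bytes_per_sector)


# Common MS-DOS layouts (assumption: standard PC geometries).
PC_FORMATS = {
    "5.25-inch DS DD": Geometry(40, 9, 2),
    "3.5-inch DS DD": Geometry(80, 9, 2),
    "5.25-inch DS HD": Geometry(80, 15, 2),
    "3.5-inch DS HD": Geometry(80, 18, 2),
}

if __name__ == "__main__":
    for name, geometry in PC_FORMATS.items():
        size = geometry.capacity_bytes()
        print(f"{name}: {size} bytes ({size // 1024} KiB)")
```

Two disks of the same physical size can therefore hold very different amounts of data. Formats with a variable number of sectors per track, such as Apple GCR, do not fit this simple product.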

The many variants make it challenging to capture data from diskettes. Current USB floppy drives can usually read only high-density 1.44 MB disks, the most popular format from the mid-1990s onwards. Moreover, diskettes are fragile carriers. They are sensitive to dust, condensation, and temperature fluctuations, and they should not be kept near magnets or magnetic devices. Damage can make data retrieval difficult or impossible.


Figures 1 and 2: a 5.25-inch and a 3.5-inch floppy disk.

Status

In total, we created a disk image for 218 diskettes during the period April-July 2018. These were 174 3.5-inch diskettes and 44 5.25-inch diskettes. In the next phase, the files from the disk images were also extracted and identified.

Method

For capturing the data, we decided to create disk images. Disk images are bit-by-bit copies of the diskettes. Not only the files on the carrier are preserved, but also all system information. In this way, the information on the carrier is copied as completely as possible, and the copy remains as close as possible to the original. Afterwards, you can extract and identify the files from the disk image. Disk images can be made with software that performs checksum checks on both the source (the original disk content) and the disk image (the copy of the original disk).3 This ensures that no errors have occurred in making the disk image, and that the disk image is an identical copy of the original.

The copied carriers were recorded in a spreadsheet with the following columns:

  1. UI (unique identifier): To create the unique identifier, we started from the code assigned by the archive institution to the donation or archive to which the diskettes belonged. This usually consisted of a year followed by a three-digit number. For each carrier, we then added a consecutive three-digit numbering starting at 1 (001). For example, the unique identifier 2008_199_001 refers to the first carrier processed from archive number 2008/199.
  2. Institution: the name of the archive institution, e.g., Liberaal Archief.
  3. Carrier type: the type of disk, e.g., 3.5-inch floppy DS DD.4
  4. Carrier format: the logical format of the diskette. We used the name that the KryoFlux software5 gives to the format, as there are different variants of the same logical format. For example, Apple DOS 400k/800k was used instead of GCR, because Commodore and Apple each have their own GCR type, and these are incompatible with each other.
  5. Information on the carrier: all information on the label on the disk.
  6. Functional? If the disk image could be opened and files extracted, the disk was considered functional.
  7. Copied without errors? This field indicates whether a disk image could be made without the disk image software indicating errors during reading.
  8. MD5 checksum: An MD5 checksum was created for each disk image.
  9. Notes: Relevant information about the carrier was recorded in this column, e.g., it was a blank disk, not all files could be extracted, or the error messages encountered when opening the disk image.
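The identifier scheme and the column layout above could be generated as follows. This is a minimal sketch: the function name `make_uid`, the CSV output and the placeholder values in the example row are our own assumptions, not part of the original workflow, which used a spreadsheet.

```python
import csv
import io

COLUMNS = [
    "UI", "Institution", "Carrier type", "Carrier format",
    "Information on the carrier", "Functional?",
    "Copied without errors?", "MD5 checksum", "Notes",
]


def make_uid(archive_number: str, sequence: int) -> str:
    """Build a unique identifier such as 2008_199_001 from archive number 2008/199."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits (1-999)")
    return f"{archive_number.replace('/', '_')}_{sequence:03d}"


def to_csv(rows: list[dict]) -> str:
    """Serialize the registered carriers as CSV text with the columns above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical example row; the values are placeholders.
    row = dict.fromkeys(COLUMNS, "")
    row["UI"] = make_uid("2008/199", 1)
    row["Institution"] = "Liberaal Archief"
    row["Carrier type"] = "3.5-inch floppy DS DD"
    print(to_csv([row]))
```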

To prevent our computer from writing files to the external carriers, we used a write blocker. Both 3.5-inch and 5.25-inch disks have a write blocker (a write-protect mechanism) on the carrier itself that makes the disk read-only. On a 3.5-inch disk, this is the slider in the lower left corner; on a 5.25-inch disk, it is the notch at the top right, which you need to cover with tape.

Figure 3: write blocker on a 3.5-inch floppy disk.


Figures 4 and 5: the write blocker on a 5.25-inch floppy disk.

As part of the Resurrection Lab project, we collected reading equipment for diskettes. For 3.5-inch disks, we have three USB-connected drives as well as older disk drives that must be connected with a floppy data cable to a floppy disk controller. For 5.25-inch disks, we have one drive that must be connected with a floppy data cable.

Creating disk images of 3.5-inch diskettes

When testing the USB drives, we found that none of the three could read the 3.5-inch double-density disks from Liberaal Archief. We tested the drives on both a 2015 MacBook Pro running Mac OS X El Capitan and a 2017 Dell XPS 13-MLK with Windows 10 and Ubuntu 16.04. The drives themselves were not faulty, because they could read 3.5-inch high-density disks. For these high-density disks, we could create disk images using forensic software such as FTK Imager6 or Guymager7. The advantage of this forensic software is that it automatically generates a checksum and saves metadata about the capture process in a text file.

Because it was not possible to use the USB floppy drives for the 3.5-inch disks from Liberaal Archief, we decided to use KryoFlux8. KryoFlux consists of a floppy disk controller9 and software, developed by the Software Preservation Society10, that can read and create disk images from all kinds of 8-inch, 5.25-inch, and 3.5-inch disks. KryoFlux has some advantages over a USB floppy drive:11

  • it can read different disk formats and help users identify the correct logical format. KryoFlux supports 29 formats and can read and capture disks used on Amiga, Atari, older Commodore models, CP/M, Macintosh, and Windows/MS-DOS, among others.
  • when the logical format is not known or found, KryoFlux can create a stream file. KryoFlux then reads the magnetic flux and stores it in a stream file. You can later convert this stream file into a disk image, avoiding the need to reread the disk multiple times if the logical format is unknown.
  • it has a higher success rate when creating disk images from damaged disks.
  • it features a built-in write blocker to prevent the operating system from writing files to the external carrier.12

Figure 6: the KryoFlux floppy controller board.

Because the documentation for KryoFlux is limited, we followed the instructions from the Archivist’s Guide to KryoFlux. This is an unofficial KryoFlux manual written for and by archivists.

To read the disks with KryoFlux, we had to connect a disk drive via a floppy data cable to the KryoFlux board13. The board could then be connected with a USB connection to a modern laptop. We used the 2015 MacBook Pro for this. The board did not work with all disk drives we have. Only after testing with various drives did we arrive at a setup with which the KryoFlux software operated the drive.

Figure 7: floppy data cable (grey ribbon) and molex connectors to power the disk drive.

Figure 8: connecting the disk drive to the KryoFlux board via the floppy data cable.

Figure 9: connecting the KryoFlux board to a laptop via USB.

Figure 10: finally, the power supply is connected to the disk drive.

Figure 11: connecting the power and the floppy data cable to the disk drive.

Figure 12: once the software is started, capturing can begin.

Once the equipment is connected and a diskette is placed in the disk drive, the KryoFlux software can be used to create a disk image. This disk image can then be used to read or export files from the diskette. The software has both a graphical user interface and a command line interface. To create a disk image, you first select the logical format of the diskette; if you do not know the logical format, you can create a stream file, which can then be used to create a disk image.

The 3.5-inch diskettes from Liberaal Archief were Macintosh formatted and double density. From this, we could deduce that these were probably diskettes with Apple GCR as the logical format.14 HeK had two high-density diskettes used on a Macintosh computer. That meant that MFM was probably the logical format.15 The seven high-density diskettes from HeK used on a Windows computer most likely also had MFM as the logical format. To avoid having to read a disk multiple times due to incorrect assumptions, we always first created a stream file, which could then be converted to a disk image using the KryoFlux software. Afterwards, the stream files were deleted.
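The reasoning used to guess the logical format can be written down as a small lookup. The sketch below only encodes the rules of thumb mentioned in this article; the function name and the string values are illustrative assumptions, and the result is a list of candidates to try, not a definitive identification.

```python
from typing import Optional


def candidate_encodings(size_inch: float,
                        density: Optional[str],
                        platform: Optional[str]) -> list[str]:
    """Return the logical formats worth trying first for a diskette.

    density: "DD", "HD" or None when it is not stated on the carrier.
    platform: "macintosh", "ms-dos" or None when unknown.
    """
    if platform == "macintosh" and size_inch == 3.5:
        if density == "DD":
            return ["Apple GCR"]  # 400k/800k Macintosh disks
        if density == "HD":
            return ["MFM"]
    if platform == "ms-dos" and density in {"DD", "HD"}:
        return ["MFM"]
    # Not enough information on the carrier: keep every plausible option open.
    return ["FM", "MFM", "Apple GCR"]


if __name__ == "__main__":
    print(candidate_encodings(3.5, "DD", "macintosh"))
    print(candidate_encodings(5.25, None, None))
```

Because a stream file was always captured first, a wrong guess only costs a new conversion, not a new read of the disk.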

Figure 13: the KryoFlux software creates a disk image from the diskette. The red color indicates that certain parts of the diskette could not be read due to sector errors. The green color indicates that the track was read correctly.

Figure 14: the orange color indicates that the diskette has been modified, e.g., when a user has deleted or changed files on the diskette.

Figure 15: the software could not read or identify any tracks. The gray color indicates that either the disk has not been formatted yet, or an incorrect logical format was chosen.

Our assumptions were correct. The double density disks were GCR-formatted; the high-density disks were encoded in MFM.

Because KryoFlux, unlike forensic disk image software, does not create a checksum, we created a checksum for each disk image ourselves. We used the command line tool md5 to automate the process.16
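We used the md5 command line tool for this step. As an illustration only, the same batch step could be sketched in Python with the standard hashlib module. The `.img` extension and the function names are assumptions; they are not taken from our setup.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path, pattern: str = "*.img") -> dict[str, str]:
    """Map every disk image in the folder to its MD5 checksum."""
    return {image.name: md5_of(image) for image in sorted(folder.glob(pattern))}


def changed_images(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List images that are missing or whose checksum no longer matches."""
    return [
        name for name, expected in manifest.items()
        if not (folder / name).is_file() or md5_of(folder / name) != expected
    ]


if __name__ == "__main__":
    for name, checksum in build_manifest(Path(".")).items():
        print(f"{checksum}  {name}")
```

Storing the checksums next to the images makes it possible to rerun `changed_images` later and detect silent corruption of the copies.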

In this way, we succeeded in creating a disk image for all 3.5-inch disks. However, only a minority of the disks were copied without errors: 44 of the 174 disks (25%). 106 of the 174 disks (61%) were functional.17

Creating disk images of 5.25-inch diskettes

Unlike for 3.5-inch disks, there are no USB-connected readers for 5.25-inch disks. The only option was to find a floppy controller to which an old disk drive could be connected. Therefore, here too, we used KryoFlux to connect a 5.25-inch disk drive to the 2015 MacBook Pro. After creating the disk images, we generated a checksum for each image using the command line tool md5.

Figure 16: connecting power and floppy data cable to the disk drive.

Figure 17: setup for capturing data from 5.25-inch disks.

The 5.25-inch disks from ADVN were double density and, except for one, double sided. They were used on a Windows/MS-DOS computer, so they probably had MFM as logical format. The disks from Liberaal Archief were harder to identify because there was not always information on the carrier. Most of the disks were labelled double density, but for some, the density was unknown. In addition, we did not know whether they were used on an Apple or MS-DOS computer. As a result, the disks could have FM, MFM, or Apple GCR as logical format.

From the results of the KryoFlux software, we found that all disks were MFM encoded. However, we encountered a peculiarity: for each odd track, the software indicated that the track was unformatted. According to the Archivist’s Guide to KryoFlux, this means that the disks consist of 40 tracks instead of the usual 80. By changing the profile in the KryoFlux settings, we could capture the disks correctly.
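The pattern described here can be checked mechanically. The sketch below assumes that you have noted, for each track position the drive stepped to, whether the software could read it, and that tracks are numbered from 0. The function name and the input format are hypothetical; they are not an export format of the KryoFlux software.

```python
def looks_like_40_track(track_readable: list[bool]) -> bool:
    """Return True when every even track was read and every odd track was not.

    On a 40-track disk read at 80-track spacing, every second head
    position carries no formatted track, which produces this alternating
    pattern.
    """
    even = track_readable[0::2]
    odd = track_readable[1::2]
    return bool(even) and bool(odd) and all(even) and not any(odd)


if __name__ == "__main__":
    alternating = [True, False] * 40  # readable, unformatted, readable, ...
    print(looks_like_40_track(alternating))
```

A disk with scattered sector errors does not match this pattern, so the check separates a wrong track setting from ordinary read damage.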

Figure 18: every other track, the software cannot identify the format.

Figure 19: adding a new profile to the KryoFlux software.

Figure 20: the 40-track disk was successfully captured.

Finally, there were three disks from Liberaal Archief for which KryoFlux indicated the formatting was incorrect. Even after testing all other logical formats built into the KryoFlux software, we could not achieve a good result. After some research on the KryoFlux forum, we found that these could be high-density disks written at a rotation speed of 360 RPM. That also meant they had 80 tracks instead of 40 tracks.18 After setting this in the software, we succeeded in creating an image from one of the three disks. For the other two, the logical format still seemed incorrect. Strangely enough, the disk image we created from one of those two was functional, and we managed to extract the files from it. So the 360 RPM setting is probably correct for these disks, but other parameters may still need to be adjusted.

Figure 21: for three disks, the logical format is not correct.

Figure 22: at a rotation speed of 360 RPM, we can create a disk image of one disk.

Figure 23: the logical format of the disk appears to be incorrect. The yellow box with an ‘X’ indicates a mismatch.

In this way, we managed to create a functional disk image for 43 of the 44 5.25-inch disks (98%). In 29 of the 44 cases (66%), capturing was error-free. Sometimes the equipment stopped working and we had to disconnect and reconnect it to get it working again.

Conclusion

In total, we processed 218 diskettes, of which 174 were 3.5-inch disks and 44 were 5.25-inch disks. Of those 218, we were able to create a good disk image from 216. With the USB floppy drive, we could only capture the nine high-density disks. For the Macintosh double-density 3.5-inch disks, we had to use KryoFlux. Because KryoFlux supports different logical formats, it was a very useful tool for capturing different types of disks. With it, we could create disk images of all Macintosh-formatted 3.5-inch disks. We were also able to create a disk image from 42 5.25-inch disks. For two, we did not manage to find the correct formatting.

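| Size      | Institution      | Functional | Copied without errors | Non-functional | Not copied without errors |
|-----------|------------------|------------|-----------------------|----------------|---------------------------|
| 3.5 inch  | ADVN             | 0          | 0                     | 0              | 0                         |
| 3.5 inch  | HeK              | 9          | 9                     | 0              | 0                         |
| 3.5 inch  | Liberaal Archief | 97         | 35                    | 68             | 130                       |
| 3.5 inch  | Subtotal         | 106        | 44                    | 68             | 130                       |
| 5.25 inch | ADVN             | 14         | 7                     | 0              | 7                         |
| 5.25 inch | HeK              | 0          | 0                     | 0              | 0                         |
| 5.25 inch | Liberaal Archief | 29         | 22                    | 1              | 8                         |
| 5.25 inch | Subtotal         | 43         | 29                    | 1              | 15                        |
| Totals    | ADVN             | 14         | 7                     | 0              | 7                         |
| Totals    | HeK              | 9          | 9                     | 0              | 0                         |
| Totals    | Liberaal Archief | 126        | 57                    | 69             | 138                       |
| Totals    | Total            | 149        | 73                    | 69             | 145                       |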

In 149 cases (68%), the disk image was functional, but for only 73 disks (33%) was the disk image created without errors. For the other disks, sector errors occurred when reading the carrier. The difference in success rate between the 3.5-inch and 5.25-inch disks is striking. Of the 3.5-inch disks, only 61% are functional and 25% were captured without errors, while for the 5.25-inch disks this is 98% and 66%.
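The success rates follow directly from the counts in the table. As a small worked example, the sketch below recomputes them per disk size and overall; the dictionary layout and the names are our own.

```python
# Counts taken from the subtotal rows of the table above.
RESULTS = {
    "3.5 inch": {"functional": 106, "error_free": 44, "total": 174},
    "5.25 inch": {"functional": 43, "error_free": 29, "total": 44},
}


def percentage(part: int, total: int) -> int:
    """Share of `part` in `total`, rounded to a whole percentage."""
    return round(100 * part / total)


def overall(results: dict) -> dict:
    """Sum the counts over all disk sizes."""
    keys = ("functional", "error_free", "total")
    return {key: sum(row[key] for row in results.values()) for key in keys}


if __name__ == "__main__":
    rows = dict(RESULTS, overall=overall(RESULTS))
    for label, row in rows.items():
        print(
            f"{label}: {percentage(row['functional'], row['total'])}% functional, "
            f"{percentage(row['error_free'], row['total'])}% without errors"
        )
```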

Creating disk images completed the first phase of capturing. The disks are now preserved as digital files, but they cannot yet be accessed on every computer. Disk images with a Macintosh file system cannot be opened natively on a Linux or Windows computer. Also, if the disks contain an obsolete file system, they cannot be opened with a modern operating system.
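Before trying to open a disk image, it can help to check which file system it probably contains. The sketch below looks for a few well-known signatures: the two-byte signature at offset 1024 used by the Macintosh HFS and MFS file systems, and the boot sector signature at offset 510 found on most FAT-formatted disks. It is an illustrative check only, not the identification method used in this project; the function names are assumptions, and older DOS disks may lack the FAT signature.

```python
from pathlib import Path


def sniff_filesystem(image: bytes) -> str:
    """Guess the file system of a sector image from its signature bytes."""
    if len(image) >= 1026:
        signature = image[1024:1026]
        if signature == b"BD":        # HFS master directory block (0x4244)
            return "HFS"
        if signature == b"\xd2\xd7":  # MFS master directory block (0xD2D7)
            return "MFS"
    if len(image) >= 512 and image[510:512] == b"\x55\xaa":
        return "FAT (probable)"       # boot sector signature
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read only the first bytes of a disk image and guess its file system."""
    with path.open("rb") as handle:
        return sniff_filesystem(handle.read(1026))
```

An image reported as HFS or MFS will need additional tooling on Linux or Windows, which is the situation described above.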

Literature

References

  1. A non-exhaustive list of floppy disk types: https://en.wikipedia.org/wiki/List_of_floppy_disk_formats
  2. The most common number of tracks is 40 or 80.
  3. This can be done with software such as Guymager and IsoBuster.
  4. DS stands for double sided, DD for double density.
  5. The KryoFlux software is used to capture data from diskettes.
  6. For more information, see http://marketing.accessdata.com/ftkimager4.2.0 and https://accessdata.com/product-download/ftk-imager-version-4.2.0
  7. For more information, see http://guymager.sourceforge.net/
  8. For more information, see https://kryoflux.com/
  9. Hardware that enables a computer to communicate with a floppy disk drive.
  10. For more information, see http://softpres.org/
  11. Archivist’s Guide to KryoFlux: https://github.com/archivistsguidetokryoflux/archivists-guide-to-kryoflux
  12. Some operating systems tend to write extra (invisible) files to external drives, such as thumbs.db on Windows or .DS_Store and AppleDouble files (files beginning with ‘._’) on Mac.
  13. For more information, see https://kryoflux.com/?page=kf_features
  14. In KryoFlux, this profile is called Apple DOS 400k/800k sector image.
  15. For more information, see https://en.wikipedia.org/wiki/Disk_density and https://en.wikipedia.org/wiki/List_of_floppy_disk_formats
  16. For more information, see https://www.fourmilab.ch/md5/
  17. By ‘functional’ we mean that the disk could be read and files extracted. That does not necessarily mean the capture was error-free.
  18. For more information, see https://en.wikipedia.org/wiki/List_of_floppy_disk_formats#Logical_formats

Contact details

Nastasia Vanderperren: nastasia.vanderperren@meemoo.be

This page was last modified on 4 November 2025.
