
Background

I maintain a product built around an Analog Devices ADSP-BF547 Blackfin processor running ADI's 2011R1-RC3 distribution of uClinux, which uses BusyBox. The processor mounts an SD card, to which it periodically writes reports as encrypted TAR files. Note that uClinux targets processors with no MMU.

Here are some details about the uImage partitions and what is mounted:

root:~> cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00100000 00020000 "uboot"
mtd1: 00800000 00020000 "uImage0"
mtd2: 00800000 00020000 "uImage1"
NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAH4)
Creating 7 MTD partitions on "MT29F2G08ABAEAH4":
0x000000000000-0x000000100000 : "uboot"
0x000000100000-0x000000900000 : "uImage0"
0x000000900000-0x000001100000 : "uImage1"
0x000001100000-0x000003000000 : "firmware"
0x000003000000-0x000003400000 : "factory_conf"
0x000003400000-0x000003800000 : "var_conf"
0x000003800000-0x000010000000 : "nand_temp"
yaffs: dev is 32505859 name is "mtdblock3" ro
yaffs: passed flags ""
yaffs: dev is 32505860 name is "mtdblock4" ro
yaffs: passed flags ""
yaffs: dev is 32505861 name is "mtdblock5" rw
yaffs: passed flags ""
yaffs: dev is 32505862 name is "mtdblock6" rw
yaffs: passed flags ""
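
The `yaffs: dev is ...` numbers above can be decoded to confirm which block devices yaffs attached to. This is a minimal host-side sketch, assuming the printed value is the kernel's internal dev_t (major number in the upper bits, minor number in the lower 20 bits). The helper name `split_kdev` is made up for illustration.

```python
MINORBITS = 20  # assumed in-kernel dev_t layout: (major << 20) | minor


def split_kdev(dev):
    """Split an in-kernel dev_t value into (major, minor)."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)


# Values copied from the "yaffs: dev is ..." lines in the boot log.
for dev in (32505859, 32505860, 32505861, 32505862):
    major, minor = split_kdev(dev)
    print(dev, "-> major", major, "minor", minor)
```

Block major 31 is the mtdblock driver, so under this assumption the four yaffs mounts are mtdblock3 through mtdblock6, and none of them is mtdblock1 (uImage0).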

root:~> mount
rootfs on / type rootfs (rw)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
mdev on /dev type tmpfs (rw,nosuid,relatime,mode=0755,size=10M)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
var on /var type ramfs (rw,relatime)
tmp on /tmp type tmpfs (rw,nosuid,nodev,relatime)
/dev/mtdblock3 on /mnt/firmware type yaffs2 (ro,noatime)
/dev/mtdblock4 on /mnt/factory_conf type yaffs2 (ro,noatime)
/dev/mtdblock5 on /mnt/var_conf type yaffs2 (rw,noatime)
/dev/mtdblock6 on /mnt/nand_temp type yaffs2 (rw,noatime)
/dev/mmcblk0p1 on /mnt/sd type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=cp437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

The Problem

I've gotten a few units back from the field that are failing to boot. U-Boot reports that there is no valid uImage. I read out the memory from U-Boot at the beginning of where the uImage should be and discovered that it has been mostly erased. What has been written appears to be the same contents that get written to the SD card. However, this area does not resemble a FAT filesystem.

I am basing this assumption on:

  1. The file name visible in the ASCII column of the dump follows the same naming convention we use on the SD card
  2. The string "reports" is in the dump, which is the directory we write to on the SD card
  3. The only ASCII information a report contains is the string REPORTDATA

Here is the beginning of the dump that shows these features:

00000000: 03 00 00 00 01 00 00 00 ff ff 72 65 70 6f 72 74    ..........report
00000010: 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    s...............
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
*
000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00000100: 00 00 00 00 00 00 00 00 00 00 ff ff ed 41 00 00    .............A..
00000110: 00 00 00 00 00 00 00 00 06 cb b9 5e 06 cb b9 5e    ...........^...^
00000120: 06 cb b9 5e ff ff ff ff ff ff ff ff ff ff ff ff    ...^............
00000130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00000140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
*
000001a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000001b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000001c0: ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00    ................
000001d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000001e0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff    ................
000001f0: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00    ................
00000200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00000210: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
*
000007e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00000800: 03 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00    ................
00000810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00000820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
*
000008e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
000008f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00000900: 00 00 00 00 00 00 00 00 00 00 ff ff ed 41 00 00    .............A..
00000910: 00 00 00 00 00 00 00 00 02 cb b9 5e 06 cb b9 5e    ...........^...^
00000920: 06 cb b9 5e ff ff ff ff ff ff ff ff ff ff ff ff    ...^............
00000930: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00000940: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
*
000009a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000009b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000009c0: ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00    ................
000009d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000009e0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff    ................
000009f0: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00    ................
00000a00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00000a10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
*
00000fe0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00000ff0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00001000: 01 00 00 00 01 01 00 00 ff ff 30 39 39 39 39 5f    ..........09999_
00001010: 41 42 43 44 30 30 31 36 30 5f 30 34 33 39 36 32    ABCD00160_043962
00001020: 5f 31 33 32 31 2e 74 61 72 00 00 00 00 00 00 00    _1321.tar.......
00001030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00001040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
*
000010e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
000010f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00001100: 00 00 00 00 00 00 00 00 00 00 ff ff 80 81 00 00    ................
00001110: 00 00 00 00 00 00 00 00 08 cb b9 5e 08 cb b9 5e    ...........^...^
00001120: 08 cb b9 5e 00 00 00 00 ff ff ff ff ff ff ff ff    ...^............
00001130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00001140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
*
000011a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000011b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000011c0: ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00    ................
000011d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000011e0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff    ................
000011f0: 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00    ................
00001200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00001210: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
*
000017e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
000017f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
00001800: 52 45 50 4f 52 54 44 41 54 41 01 f8 7b 00 00 34    REPORTDATA..{..4
00001810: 00 46 b0 9e 27 97 7d 88 33 da 21 93 2b c8 51 c2    .F..'.}.3.!.+.Q.
00001820: 7e 00 02 80 fb 30 bc 14 9a 86 73 c5 c8 f7 37 5f    ~....0....s...7_
00001830: 79 28 45 c5 53 03 c8 73 91 b9 fe 21 1d f7 41 ac    y(E.S..s...!..A.
00001840: 8c 85 7d f0 d6 c8 c2 9f 07 2d 65 81 d4 2e a5 a0    ..}......-e.....
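
One way to test whether the dump resembles a known on-flash format is to decode the start of a 2 KiB page as a structured header. The sketch below assumes a layout like the YAFFS2 object header (`yaffs_obj_hdr`): a 32-bit type, a 32-bit parent object ID, a 16-bit unused checksum, a 256-byte name, two padding bytes, then mode, uid, gid, atime, mtime, ctime and file size as little-endian 32-bit words. That layout is an assumption that has not been checked against the yaffs sources shipped with 2011R1-RC3, and the function name `parse_obj_header` is hypothetical.

```python
import stat
import struct

# Object type codes as assumed from the YAFFS2 sources.
OBJ_TYPES = {0: "unknown", 1: "file", 2: "symlink",
             3: "directory", 4: "hardlink", 5: "special"}


def parse_obj_header(page):
    """Decode the leading fields of a page, assuming a yaffs_obj_hdr-like layout."""
    obj_type, parent_id = struct.unpack_from("<II", page, 0)
    name = page[10:10 + 256].split(b"\x00", 1)[0].decode("ascii", "replace")
    mode, uid, gid, atime, mtime, ctime, size = struct.unpack_from("<7I", page, 0x10C)
    return {"type": OBJ_TYPES.get(obj_type, "invalid"), "parent_id": parent_id,
            "name": name, "mode": mode, "uid": uid, "gid": gid,
            "atime": atime, "mtime": mtime, "ctime": ctime, "size": size}


# Bytes copied from the dump at 0x1000..0x1127; the elided region is all zeros.
page = bytearray(0x128)
page[0x000:0x030] = bytes.fromhex(
    "01 00 00 00 01 01 00 00 ff ff 30 39 39 39 39 5f"
    "41 42 43 44 30 30 31 36 30 5f 30 34 33 39 36 32"
    "5f 31 33 32 31 2e 74 61 72 00 00 00 00 00 00 00")
page[0x100:0x128] = bytes.fromhex(
    "00 00 00 00 00 00 00 00 00 00 ff ff 80 81 00 00"
    "00 00 00 00 00 00 00 00 08 cb b9 5e 08 cb b9 5e"
    "08 cb b9 5e 00 00 00 00")

hdr = parse_obj_header(bytes(page))
print(hdr["type"], hdr["name"], oct(hdr["mode"]), "parent", hdr["parent_id"])
```

Under this assumption, the page at 0x1000 decodes as a regular file with mode 0o100600 whose name is the report file name, and the page at 0x0000 decodes the same way as a directory named "reports". If the layout assumption holds, the data would look like YAFFS2 metadata rather than FAT or tmpfs.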

When we build these reports, we do so using temporary files in RAM (/tmp/). Before copying a report to the SD card and giving it an appropriate name, we check that the SD card is mounted. Finally, we use a system call to the mv command to move the file.
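
The "is the SD card mounted" check matters because a bare existence test on /mnt/sd cannot tell which device is behind the path. The following is a sketch of a stricter check, not the application's actual code: it parses text in the /proc/mounts format (device, mount point, filesystem type, options, two numeric fields) and accepts the mount only if the expected device and filesystem type are on top. The function names and the `SAMPLE` text are hypothetical. `SAMPLE` was written by hand from the mount output above, not captured from a unit.

```python
def mounted_device(mounts_text, mount_point):
    """Return (device, fstype) of the last entry mounted on mount_point, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            found = (fields[0], fields[2])  # a later entry shadows an earlier one
    return found


def sd_card_ready(mounts_text, mount_point="/mnt/sd",
                  device="/dev/mmcblk0p1", fstype="vfat"):
    """True only if the expected device and filesystem are mounted on mount_point."""
    return mounted_device(mounts_text, mount_point) == (device, fstype)


# Hand-written example in /proc/mounts format (options abbreviated).
SAMPLE = """\
rootfs / rootfs rw 0 0
/dev/mtdblock6 /mnt/nand_temp yaffs2 rw,noatime 0 0
/dev/mmcblk0p1 /mnt/sd vfat rw,relatime 0 0
"""

print(sd_card_ready(SAMPLE))
```

On the target, `mounts_text` would come from reading /proc/mounts immediately before the move. A check like this would reject the case where some other device, such as an mtdblock partition, is mounted on /mnt/sd.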

I attempted to unmount the SD card right after we check whether it is present and I've attempted to eject it. I've also attempted to do the same immediately prior to running the mv command. In either case an error message is printed that makes sense, e.g.:

mv: can't rename '/tmp/reportZe1MAL': No such file or directory
ERROR: Failed to move /tmp/reportZe1MAL to /mnt/sd/reports/09999_ABCD00160_044082_1007.tar: exit status was 256

Questions

  1. Does the filesystem in the corrupted area resemble anything, e.g. tmpfs?
  2. How could a write to /mnt/sd, which is mmcblk0p1, end up in NAND mtdblock1 which isn't even mounted?

I have not yet attempted to create an out-of-memory scenario, but it is on my list of things to try.

Edit: here is a clue

I was able to look at some log data within the encrypted tar file that was written to NAND. From this log data I can tell that the device thinks that the SD card is mounted but the directory /mnt/sd/reports does not exist just prior to generating the tar file. I am seeing this consistently across multiple units.

Nick
  • 1) I don't think so, filesystems and files generally have a signature byte, this has none 2) Good question, I don't think I have an explanation for that. Which begs the question: is this *really* what is happening or is there an invalid assumption somewhere? – gergelykalman Sep 18 '20 at 17:09
  • @gergelykalman thanks for your response. The application thinks it is writing to the path "/mnt/sd/reports" (it prints this to the log), but it's as if the SD card became unmounted and somehow a NAND partition was mounted in its place. In my recent edit I mentioned that at some point the application spontaneously thinks that the report directory doesn't exist (it uses POSIX stat() for this) and creates the directory. – Nick Sep 18 '20 at 19:12
  • I read what you wrote. The point was that I find it a lot more likely that you mistakenly think that this got written to the bad partition as opposed to the actual partition being unmounted then another mounted in its place. – gergelykalman Sep 18 '20 at 19:34
  • @gergelykalman I am not sure I understand your previous comment. But I think you're suggesting that we've actually mistakenly mounted the location in NAND where the uImage resides prior to the write. That's a good suggestion, I'll start looking into it. – Nick Sep 21 '20 at 14:35
  • Possibly that, or that you made a mistake somewhere and you think the wrong thing got written to the wrong place when in fact it did not. Filesystems and SD cards can be tricky. Even though I don't think you made a mistake like that, it is always good to check assumptions when dealing with really crazy and implausible situations. – gergelykalman Sep 24 '20 at 09:35
