
Author Topic: Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS


abend

  • Entrant
  • Posts: 7
Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS
« on: October 29, 2009, 05:29:51 PM »
UPDATE -

There were 2 problems -

 1) On cold boot my dc7800 BIOS presents only the drives configured in the boot order, and on warm boot presents _all_ drives to Chameleon. Very confusing, but nothing to do with Chameleon.

 2) Chameleon hangs when it sees my 1.5TB USB Time Machine drive. After adding many verbose() calls and a zillion reboots, it turns out Chameleon is looping in ReadBTreeEntry() when probing for the presence of OS X, i.e. looking for SystemVersion.plist, which BTW doesn't exist on this volume. As long as you're not trying to _boot_ from that disk there are 2 workarounds -

 add 'Scan Single Drive=yes' to your com.apple.Boot.plist

       - or -
 add

    if (strncmp(dirSpec, "hd(x,y)", 7) == 0)
      return -1;

to the beginning of GetDirEntry(), where x,y are the disk and partition number of the drive that is causing the hang. The latter workaround isn't recommended unless you're sure your drive numbers are stable across reboots.


Alas, it's beyond my skill to probe further into why GetDirEntry() doesn't converge, but I'd be happy to assist one of the programmers if they'd like to take this any further.


abend



ORIGINAL POST . . .

First, I'd like to give a huge cheer to the Chameleon team, what an incredible job you've done! I have been happily running 1.0.11 for over a year now on my HP dc7800 with 2 AHCI-attached SATA drives (GPT). My system also has 2 Seagate USB HDDs which are used as Time Machine and Vista Backup targets. I wasn't able to move to 2.0 due to the legacy BIOS issue Rivig recently solved. Now I'm really excited to be on 2.0 RC3!

There is a problem with warm boots, though, that wasn't present in 1.0.11. Cold boots are fine: depending on which partition is marked active, the system boots happily to either Chameleon/OS X (disk0s2) or to the Vista boot loader (disk0s3), which then boots Chameleon/OS X as its default.

On a warm boot, however, the system will only boot if the USB drives are powered down. If they are powered up then boot1h executes and chains to Chameleon per usual, but Chameleon hangs after the first 3 or 4 ticks of the 'ActivityIndicator'. The Vista partition will still boot happily, but Chameleon hangs if invoked from there as well.

I hacked on the boot2 code a bit and it appears that on a cold boot disk 80, then 81, then 80 are accessed until Chameleon is loaded. On a warm boot the sequence is 80, 81, 82 (!), hang. Help, please.

Thanks,

abend
« Last Edit: December 05, 2009, 06:38:50 AM by abend »

easternguy

  • Entrant
  • Posts: 5
Re: Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS
« Reply #1 on: April 12, 2010, 05:39:57 PM »
abend writes:
Quote
2) Chameleon hangs when it sees my 1.5TB USB TimeMachine drive. After adding many verbose() calls and a zillion reboots, it turns out Chameleon is looping in ReadBTreeEntry() when probing for the presence of OS X - ie looking for SystemVersion.plist, which BTW doesn't exist on this volume. As long as you're not trying to _boot_ from that disk there are 2 workarounds

Unfortunately, I needed to boot from that disk :)

The symptom is that boot0 loads normally, the screen clears (video mode change), and you see the progress spinner move a couple of times; then it freezes.

Quote
Alas, it's beyond my skill to probe further into why GetDirEntry() doesn't converge, but I'd be happy to assist one of the programmers if they'd like to take this any further.

I found and fixed the problem.  You're correct that ReadBTreeEntry() hangs and never returns. 

It gets into a situation where what it's searching for appears to be before the very first btree entry, which should never happen, so it loops forever.  (At least I think that's the explanation.)

But the reason it gets into that infinite loop is that it's getting bad data in its call to ReadExtent().

For normal, clean directories, the btree data lives in the catalog file.  When the directory gets a bit bigger, or has been used a lot, the btree data can overflow into overflow extents.  This is where things go boom.

This might only occur on large drives, where the extent happens to live beyond block 2^32, although I'm not sure.  (Tracking down all possibilities gets painful when each attempt involves a drive move, compile, copy, unmount, drive move, and boot; 50 times last night was enough :)

Anyhow the problem is in hfs.c, in ReadBTreeEntry(), line 736 (for RC4).  This:

Code:
        ReadExtent(extent, extentSize, extentFile,
                   curNode * nodeSize, nodeSize, nodeBuf, 1);

Should be:

Code:
        ReadExtent(extent, extentSize, extentFile,
                   (long long) curNode * nodeSize, nodeSize, nodeBuf, 1);

Just cast the first operand so the multiplication is done in 64 bits, as expected by ReadExtent.  (I'm not sure if it's the multiplication that was overflowing in my case, or the fact that subsequent arguments were getting misaligned by passing a long where a long long was expected.)
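To make the overflow concrete, here is a minimal sketch, not code from hfs.c. The helper names and the sample values are made up for illustration, and unsigned 32-bit types are used so the wraparound is well defined in C (the original code uses signed longs, where overflow is undefined).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: node offset computed the buggy way. Both operands are
 * 32-bit, so the product wraps modulo 2^32 before it is widened. */
int64_t node_offset_32(uint32_t cur_node, uint32_t node_size)
{
    uint32_t product = cur_node * node_size;   /* wraps past 4 GB */
    return (int64_t)product;
}

/* Hypothetical helper: same computation with the cast applied first, so the
 * multiplication itself is carried out in 64 bits. */
int64_t node_offset_64(uint32_t cur_node, uint32_t node_size)
{
    return (int64_t)cur_node * node_size;
}

/* Print both results side by side for one node number and node size. */
void show_offsets(uint32_t cur_node, uint32_t node_size)
{
    int64_t narrow = node_offset_32(cur_node, node_size);
    int64_t wide = node_offset_64(cur_node, node_size);

    /* The wrapped value can never exceed the true product. */
    assert(wide >= narrow);
    printf("node %lu: 32-bit offset %lld, 64-bit offset %lld\n",
           (unsigned long)cur_node, (long long)narrow, (long long)wide);
}
```

Calling show_offsets(600000, 8192) shows the two results diverging once the true offset (4915200000 bytes) passes 2^32, while small node numbers give the same answer either way. That would fit a bug that only shows up on big or well-used volumes.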

I suspect hfs.c, GetCatalogEntry, line 606, requires a similar fix.  From:

Code:
    ReadExtent(extent, extentSize, kHFSCatalogFileID,
              curNode * nodeSize, nodeSize, nodeBuf, 1);

To:

Code:
    ReadExtent(extent, extentSize, kHFSCatalogFileID,
               (long long) curNode * nodeSize, nodeSize, nodeBuf, 1);

Fixing the first one at least took me to the boot menu; I fixed the second one before moving to the original machine and trying a full boot, so I'm not sure if the second fix is required; it very likely is, and I'd recommend it.

I wouldn't be surprised if this were a source of grief for a lot of mysterious Hackintosh hangs, especially as drives get bigger.

My girlfriend's hack had been working great for months; I installed the Dev tools, which stopped it from booting; after removing a few CHUD drivers it added, things worked again.  Then I installed fink, and things refused to boot!  Turned out it was just the root directory getting into overflow extent mode, which could happen to anyone at any time, unexpectedly, causing the above infinite loop in chameleon.

Whew!

Attached is a boot2 with these two patches (it also includes a similar fix to allow wake-from-sleep with 4 GB of memory).  No guarantees, but hopefully it will help some.

Enjoy!

-d

Azimutz

  • VoodooLabs
  • Posts: 420
  • Paranoid Android
Re: Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS
« Reply #2 on: April 14, 2010, 03:39:34 PM »
Hi.. There are a lot of changes/improvements on the trunk version of Chameleon on the repo, maybe these problems are addressed. For instance, @easternguy the first bug you mention: http://forge.voodooprojects.org/p/chameleon/source/tree/HEAD/trunk/i386/libsaio/hfs.c#L717
Maybe you guys should take a look :)

Cheers

Oops: wrong ReadExtent() :P... sorry.
Nice going anyway :)
« Last Edit: April 15, 2010, 04:17:03 PM by Azimutz »

zef

  • Administrator
  • Posts: 265
Re: Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS
« Reply #3 on: April 15, 2010, 04:06:38 PM »
Quote
I found and fixed the problem.  You're correct that ReadBTreeEntry() hangs and never returns.

It gets into a situation where what it's searching for appears to be before the very first btree entry, which should never happen, so infinite loop.  (At least I think that's the explanation.)

But the reason it gets into that infinite loop is that it's getting bad data in its call to ReadExtent().

For normal, clean directories, the btree data lives in the catalog file.  When the directory gets a bit bigger, or has been used a lot, the btree data can overflow into overflow extents.  This is where things go boom.

This might only occur on large drives, where the extent happens to live beyond block 2^32, although I'm not sure.  (Tracking down all possibilities when it involves a drive move, compile, copy, unmount, drive move, boot, for each attempt, can get painful; 50 times last night was enough :)

Just cast the multiplication so it's a 64-bit number, as expected by ReadExtent.  (I'm not sure if it's the multiplication that was overflowing in my case, or the fact that subsequent arguments were getting misaligned passing a long where a long-long was expected.)

Enjoy!

Hi easternguy,

Thx for fixing this annoying bug! :)

Just committed your proposed changes into the repo:

http://forge.voodooprojects.org/p/chameleon/source/commit/139/

Bye,
zef

abend

  • Entrant
  • Posts: 7
Re: Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS
« Reply #4 on: April 18, 2010, 08:56:38 PM »
Thanks so much for this, easternguy. Sadly, the fix doesn't work for my system. I tried patching/recompiling the RC4 source as well as your compiled version but it still hangs as you described. I will look deeper into this again, but it won't be for a couple of weeks as I only get time on the computer on sufferance these days - my son is hogging it with some lame excuse about finishing up his term in organic chem.

abend

PS what compiler magic did you use to get your version to be half the size of the distribution?

easternguy

  • Entrant
  • Posts: 5
Re: Warm boot hangs w. USB HDD present - PARTIAL WORKAROUNDS
« Reply #5 on: April 19, 2010, 12:30:44 AM »
Abend: I didn't test the version I compiled up and posted here, as I had a few other fixes specific to my system (the CPU-ID code was causing a repeated reset).

The version I posted here was built straight from the sources, with my two patches, and a "make".  Nothing exotic.

But I haven't tried it myself due to the other personal patches I had to apply.  It was compiled on 10.6.2, perhaps that makes some difference between my version and the distro (although I'd be surprised if the difference were that drastic).

It's odd that yours hangs in a similar way but the patch doesn't help.  All I can suggest is copious printf() (or verbose()) statements to show the flow through the code, and carefully watching the output as it scrolls past (the Pause key can help with that, as can some well-placed sleep(1); statements to slow things down).

It's a fairly convoluted flow at times (recursing and such), but if you log every entry to the various functions, and every exit, at some point you'll spot a function from which a return never occurs.  If you see any 64-bit (long long / uint64_t) or "offset"-type variables in that function, there's a reasonable chance you're seeing a similar 32-bit overflow/casting problem.
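One way to do that entry/exit logging is sketched below. These macros and the two example functions are hypothetical, not part of Chameleon; in boot2 you would swap printf() for verbose() and drop the macros into the real functions.

```c
#include <assert.h>
#include <stdio.h>

/* Number of traced functions entered but not yet exited. */
int trace_depth = 0;

/* Print "> name" on entry and "< name" on exit, indented by call depth. */
#define TRACE_ENTER() \
    printf("%*s> %s\n", 2 * trace_depth++, "", __func__)

#define TRACE_EXIT() \
    do { \
        assert(trace_depth > 0); \
        printf("%*s< %s\n", 2 * --trace_depth, "", __func__); \
    } while (0)

/* Stand-in for an inner routine; returns 1 for even keys. */
int probe_leaf(int key)
{
    TRACE_ENTER();
    int found = (key % 2 == 0);
    TRACE_EXIT();
    return found;
}

/* Stand-in for an outer routine that calls the inner one. */
int probe_volume(int key)
{
    TRACE_ENTER();
    int found = probe_leaf(key);
    TRACE_EXIT();
    return found;
}
```

Every function that returns prints a matching "<" line. The last ">" line without a matching "<" names the function where the hang is, and the indentation shows how deep in the call chain it sits.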

Good luck!


easternguy

  • Entrant
  • Posts: 5