Maybe I didn't explain it properly. Please, I beg you to read the whole thing again: there is a "problem" in the code, even if normal pure GPT and hybrid setups are working for everyone. Fixing it, or deciding whether it has to be fixed at all, is not my decision, but I want to make sure we are talking about the same thing. That's the least I can do for a project that is allowing me to boot osx. I'll try to explain everything extensively, because I realize talking about assembly can be really confusing, and also because I'm starting to forget the details and want to document them.

Notice my only interest here is making sure we understand each other, while maybe shedding some light on an obscure possible bug and on the even more subtle things that show up when trying to patch it.
Regardless of the kind of setup you are using, be it hybrid, only GPT or hybrid/pure_mbr_because_of_parallels, the bug is the same. I'm not discussing how to make a hybrid configuration work, or the difference between boot0 and boot0hfs.
According to my understanding of the source and my testing, there is something that is bypassing an important part of the code. Things can work without it, but it looks like a bug (a design flaw more than a bug), because the code is there and can't be used to make other configurations work (and they should be possible, as the code has already been written).
My first post says:
Due to some misplaced and missing instructions, the second pass in find_boot will never be run.
In find_boot: .continue, a 'jz' instruction is being used to enter checkGPT, and 'ret' to exit. So that 'ret' returns to the beginning, just after "call find_boot" in start_reloc:, giving an error instead of executing the second mbr pass.
The relevant code I'm talking about is (line 359):
.continue:
add si, part_size ; advance SI to next partition entry
loop .loop ; loop through all partition entries
;
; Scanned all partitions but not found any with active flag enabled
; Anyway if we found a protective MBR before we still have a chance
; for a possible GPT Header at LBA 1
;
dec bl
jz checkGPT ; found Protective MBR before
;
; Switching to Pass 2
; try to find a boot1h aware HFS+ MBR partition
;
dec bh
mov si, kMBRPartTable ; set SI to first entry of MBR Partition table
jz .start_scan ; scan again
After scanning all the mbr partitions and not finding anything, you get to "dec bl".
bl=1 means you detected a protective mbr (a partition with type 0xee). If that's the case, dec bl will set the zero flag, so the jz instruction takes you into checkGPT.
There's the problem. checkGPT exits with a ret (line 476), and that returns to line 251:
call find_boot ; will not return on success
error:
LogString(boot_error_str) <- line 251
Now the question is, should it do that? I think it shouldn't. But to be completely honest I don't think it's a bug, more like a design oversight.
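To make the jump-versus-call point concrete, here is a minimal sketch in Python (not the real boot0 code) that models only the return stack. The function name `run_find_boot` and the step labels are made up for illustration, and it assumes nothing bootable is found anywhere. The only behavior taken from the source above is that `call find_boot` pushes a return address, `jz checkGPT` pushes nothing, and both `ret` instructions pop whatever is on top.

```python
def run_find_boot(protective_mbr):
    """Return the steps executed, ending with the label where 'ret' lands."""
    stack = []   # model of the return stack
    steps = []

    # start_reloc: 'call find_boot' pushes the address of the next
    # instruction, which sits at the 'error:' label.
    stack.append("error")
    steps.append("pass1")

    if protective_mbr:
        # 'jz checkGPT' is a jump, not a call: nothing is pushed.
        steps.append("checkGPT")
        # The 'ret' at the end of checkGPT pops the only address there is.
        steps.append(stack.pop())
    else:
        # No protective MBR: 'jz .start_scan' starts the second scan.
        steps.append("pass2")
        # The 'ret' at .exit pops the same address.
        steps.append(stack.pop())
    return steps


print(run_find_boot(protective_mbr=True))    # ['pass1', 'checkGPT', 'error']
print(run_find_boot(protective_mbr=False))   # ['pass1', 'pass2', 'error']
```

With a protective mbr the trace never contains pass2, which is the behavior I'm describing.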
The way it is now, boot0 is less powerful than it could be. If the hard disk has a GPT partition table and you didn't find any active partitions in the mbr then you check the GPT. If the GPT doesn't have anything, well, you can't boot.
In the case of boot0hfs, first you check for a hfs partition. If nothing is found and the disk has a GPT partition table, it is checked. If nothing is found, you are out of luck.
I however think the code should switch to Pass2 regardless of the presence of a GPT table. The patch is trivial in principle, but because there is almost no free space left in boot0, it becomes less obvious, and that's why I had to write that long first post. It has no side effects, it just adds more power.
For instance, there is 1 setup that can't work without this.
Let's pretend we have a hybrid mbr.
You want to boot native osx, native windows and virtualized windows. Parallels lets you make a virtual machine that uses physical partitions. The virtual machine will see the real disk, with the real mbr, and real windows partitions. But everything else becomes zeroes (including the GPT).
Suddenly, you have a pure mbr setup with no GPT and empty HFS partitions (this is important, it's what makes this scenario special).
The difference between boot0 and boot0hfs is just that the former checks for active partitions and then (if there is no gpt) for hfs partitions, while the latter checks for hfs partitions and then (if there is no gpt) for active partitions. The restriction "if there is no gpt" is the thing my patch removes. Once I understood that difference, I just "put the pieces together" so they called each other and things worked fine in my system.
Using boot0 (active partition: hfs):
-When booting native osx, it loads the active partition's (hfs) boot sector. Chameleon window shows up and you select osx.
-When booting native windows, it loads the active partition's (hfs) boot sector. Chameleon window shows up and you select windows. It then loads windows partition's boot sector.
-When booting virtualized windows, it tries to load the active partition's (hfs) boot sector. Nothing is found, because GPT (and hfs and everything but windows) is now zeroes. Because it is a protective mbr, checkGPT is called. Again, nothing is found, because GPT is zeroes. Chameleon doesn't load. In this case pass2 won't execute, but it doesn't matter: pass2 would look for an hfs partition, and that one is zeroes too, so it wouldn't fix anything anyway.
So in this situation I prefer to use boot0hfs (active partition: windows):
-When booting native osx, it loads the hfs partition. Chameleon window shows up and you select osx.
-When booting native windows, it loads the hfs partition. Chameleon window shows up and you select windows. It then loads windows partition's boot sector.
-When booting virtualized windows, it tries to load the hfs partition. Nothing is found, because GPT (and hfs and everything but windows) is now zeroes. Because it is a protective mbr, checkGPT is called. Again, nothing is found, because GPT is zeroes. In this case, without my patch (or without removing the 0xee mbr entry) nothing boots, because pass2 is never executed. The patched code will run pass2, and the active windows partition will be found. Its boot sector is executed and windows loads. This is exactly how I found out.
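The two lists above can be checked with a small model. This is only a sketch in Python of the behavior as I describe it, not a translation of boot0.s: `Part`, `scan`, `find_boot`, `disk` and the `boot_sector_ok` flag are hypothetical names, "patched" simply means "run pass2 even when checkGPT fails", and skipping the 0xEE entry during the scan is an assumption.

```python
from dataclasses import dataclass

PART_TYPE_PMBR = 0xEE  # protective MBR entry
PART_TYPE_HFS = 0xAF   # HFS+
PART_TYPE_NTFS = 0x07  # Windows


@dataclass
class Part:
    name: str
    type: int
    active: bool
    boot_sector_ok: bool  # simplification: boot sector is readable and valid


def scan(parts, want_hfs):
    """One pass over the MBR entries; returns the name of the partition booted."""
    for p in parts:
        if p.type == PART_TYPE_PMBR:
            continue  # assumption: the 0xEE entry is only noted, never booted
        wanted = (p.type == PART_TYPE_HFS) if want_hfs else p.active
        if wanted and p.boot_sector_ok:
            return p.name
    return None


def find_boot(parts, gpt_ok, hfs_first, patched):
    protective = any(p.type == PART_TYPE_PMBR for p in parts)
    hit = scan(parts, want_hfs=hfs_first)        # pass 1
    if hit:
        return hit
    if protective:
        if gpt_ok:
            return "gpt"                         # checkGPT found something
        if not patched:
            return None                          # ret -> boot error
    return scan(parts, want_hfs=not hfs_first)   # pass 2


def disk(active, zeroed):
    """Hybrid MBR; 'zeroed' models what the Parallels VM sees."""
    return [
        Part("pmbr", PART_TYPE_PMBR, False, False),
        Part("hfs", PART_TYPE_HFS, active == "hfs", not zeroed),
        Part("windows", PART_TYPE_NTFS, active == "windows", True),
    ]


vm = disk(active="windows", zeroed=True)
print(find_boot(vm, gpt_ok=False, hfs_first=True, patched=False))  # None
print(find_boot(vm, gpt_ok=False, hfs_first=True, patched=True))   # windows
```

In the same model, boot0 with the hfs partition active finds nothing in the VM with or without the patch, which matches the third item of the first list.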
There are more ways of doing it, after all it's just the game of chaining bootloaders. I could use boot0 and set the windows partition as active. Then make it so windows shows you a dialog that asks if you want windows or osx. If you select osx, it loads chameleon, that asks you the same (and if you select windows, you go back to the previous step). That would work, but looks like a hack (which I guess is not that bad if you consider this is a hack-intosh, heh). Of course you could configure chameleon so it doesn't show a menu and set a short timeout in the windows boot dialog or something like that to avoid having to look at 2 booting screens. Notice if the mbr didn't have the hfs partition, this wouldn't work as windows' bootmanager wouldn't be able to bootchain chameleon. So yeah, it can work without the patch, although not in the "standard" and "official" setup.
In my opinion the cleanest, easiest and most robust way of doing the parallels virtualized physical windows is the patched boot0, assembled with HFSFIRST (boot0hfs). You lose nothing and "unlock" a feature that was already there, anyway. So this was not about loading this or that operating system, but about executing Pass2, why I think it should be done (at first glance it looked like a bug), how to do it, things I found when doing it, and how I found it through Parallels. I hope it's clear now, I just wanted to share these thoughts, and I also did it because of the challenge to understand what was going on and to make it work.
For future reference, to avoid forgetting it and because it may help, I'm going to post an explanation of the chunks of the code that handle this. You don't have to read it.
boot0.s:
find_boot (line 268):
Line 277:
xor bx, bx ; BL will be set to 1 later in case of
; Protective MBR has been found
inc bh ; BH = 1. Giving a chance for a second pass
; to boot an inactive but boot1h aware HFS+ partition
; by scanning the MBR partition entries again.
Depending on bh, you go once or twice through all the mbr partition entries.
bh is always 1, so the idea is doing 2 passes. This is hardcoded. On the other hand, setting bh to 0 wouldn't make sense without extra code, because the only pass executed would be pass2, and things would be checked in the inverse order from the one you would expect (because pass2 is the oh-god-panic-nothing-was-found pass). Also, that comment is only true if assembled without HFSFIRST. Otherwise BH = 1 means "giving a chance for a second pass to boot an active partition by scanning the MBR partition entries again".
Line 308:
mov bl, 1 ; Assume we can deal with GPT but try to scan
; later if not found any other bootable partitions.
Then it goes through all the mbr partitions. If one with type 0xee is found, a protective mbr has just been found and it sets bl=1.
For each entry in the mbr:
Pass1:
-If you assemble with -DHFSFIRST (getting boot0hfs):
Line 317:
cmp BYTE [si + part.type], kPartTypeHFS ; In pass 1 we're going to find a HFS+ partition
; equipped with boot1h in its boot record
; regardless if it's active or not.
If partition's type is hfs, try to load its boot sector (it makes sense, because the flag is called HFS FIRST, after all)
-If you assemble without -DHFSFIRST (getting boot0):
Line 323:
cmp BYTE [si + part.bootid], kPartActive ; In pass 1 we are walking on the standard path
; by trying to hop on the active partition.
If partition is active, try to boot its boot sector
After Pass1
Line 368:
dec bl
jz checkGPT ; found Protective MBR before
If nothing interesting was found in the mbr and bl=1, execute checkGPT
<checkGPT does stuff not relevant to this explanation>
Line 476:
ret ; no more GUID partitions. Giving up.
If nothing is found, checkGPT returns to line 249
Line 248
call find_boot ; will not return on success
error:
LogString(boot_error_str)
An error is shown and it ends.
And this is bad because executing Pass2 in this case would be a nice thing to have (why not?).
However, if checkGPT is not executed because bl=0:
Line 368:
dec bl
jz checkGPT ; found Protective MBR before
The jz is not followed
Line 371:
;
; Switching to Pass 2
; try to find a boot1h aware HFS+ MBR partition
;
dec bh
mov si, kMBRPartTable ; set SI to first entry of MBR Partition table
jz .start_scan ; scan again
And this is executed. If bh=1, and it is 1 because it's the first time we are here, we start the second pass. So we go back to the beginning, and start going through all the mbr partitions. Again, if one with type 0xee is found, a protective mbr has just been found and it sets bl=1.
Pass2 (it does the inverse of Pass1):
-If you assemble with -DHFSFIRST (getting boot0hfs):
Line 334:
cmp BYTE [si + part.bootid], kPartActive ; In pass 2 we are walking on the standard path
If partition is active, try to boot its boot sector
-If you assemble without -DHFSFIRST (getting boot0):
Line 340:
cmp BYTE [si + part.type], kPartTypeHFS ; In pass 2 we're going to find a HFS+ partition
; equipped with boot1h in its boot record
; regardless if it's active or not.
If partition's type is hfs, try to load its boot sector
After Pass2
Line 368:
dec bl
jz checkGPT ; found Protective MBR before
;
; Switching to Pass 2
; try to find a boot1h aware HFS+ MBR partition
;
dec bh
mov si, kMBRPartTable ; set SI to first entry of MBR Partition table
jz .start_scan ; scan again
.exit:
ret ; Giving up.
If nothing interesting is found, we are here again. If it's a protective mbr, bl=1 and checkGPT will be called again. Of course nothing will be found, otherwise we wouldn't be here.
Because this time bh=0, Pass2 will be ignored, and ret brings you to
Line 476:
ret ; no more GUID partitions. Giving up.
checkGPT returns to line 249
Line 248
call find_boot ; will not return on success
error:
LogString(boot_error_str)
Where an error is shown and it ends.
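For reference, the register bookkeeping in the walkthrough above can be replayed with a short Python sketch. It is only a model under stated assumptions: nothing bootable is ever found, `dec8` and `trace` are hypothetical helper names, and the rule "bh != 0 selects the pass-1 test, bh == 0 selects the pass-2 test" is inferred from my remark that setting bh to 0 would run only pass2. The 8-bit wraparound of `dec` (0 becomes 0xFF with the zero flag clear) is standard x86 behavior.

```python
def dec8(value):
    """8-bit 'dec': returns (new value, zero flag)."""
    result = (value - 1) & 0xFF
    return result, result == 0


def trace(protective_mbr, bh=1):
    """Replay find_boot's bl/bh bookkeeping when nothing bootable is found."""
    bl = 0                      # xor bx, bx
    steps = []
    while True:
        # Inferred: bh != 0 means the pass-1 test, bh == 0 the pass-2 test.
        steps.append("pass1" if bh else "pass2")
        if protective_mbr:
            bl = 1              # mov bl, 1 when a 0xEE entry is seen
        bl, zero = dec8(bl)     # dec bl
        if zero:                # jz checkGPT
            steps.append("checkGPT")
            steps.append("ret -> error")
            return steps
        bh, zero = dec8(bh)     # dec bh
        if not zero:            # jz .start_scan not taken
            steps.append(".exit: ret -> error")
            return steps


print(trace(protective_mbr=True))    # ['pass1', 'checkGPT', 'ret -> error']
print(trace(protective_mbr=False))   # ['pass1', 'pass2', '.exit: ret -> error']
print(trace(protective_mbr=False, bh=0))  # ['pass2', '.exit: ret -> error']
```

The last call shows why starting with bh=0 would not make sense without extra code: only the pass-2 test runs.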