devel / comp.arch / Re: RISC-V vs. Aarch64

Subject  Author
* RISC-V vs. Aarch64  Anton Ertl
+* Re: RISC-V vs. Aarch64  MitchAlsup
|+* Re: RISC-V vs. Aarch64  Anton Ertl
||`* Re: RISC-V vs. Aarch64  MitchAlsup
|| +- Re: RISC-V vs. Aarch64  BGB
|| `- Re: RISC-V vs. Aarch64  Anton Ertl
|+* Re: RISC-V vs. Aarch64  Ivan Godard
||+- Re: RISC-V vs. Aarch64  robf...@gmail.com
||+- Re: RISC-V vs. Aarch64  MitchAlsup
||`* Re: RISC-V vs. Aarch64  Quadibloc
|| `* Re: RISC-V vs. Aarch64  Quadibloc
||  `- Re: RISC-V vs. Aarch64  Quadibloc
|+* Re: RISC-V vs. Aarch64  Marcus
||+- Re: RISC-V vs. Aarch64  BGB
||`* Re: RISC-V vs. Aarch64  MitchAlsup
|| +- Re: RISC-V vs. Aarch64  BGB
|| `- Re: RISC-V vs. Aarch64  Ivan Godard
|`- Re: RISC-V vs. Aarch64  MitchAlsup
`* Re: RISC-V vs. Aarch64  BGB
 +* Re: RISC-V vs. Aarch64  MitchAlsup
 |+- Re: RISC-V vs. Aarch64  MitchAlsup
 |+* Re: RISC-V vs. Aarch64  Thomas Koenig
 ||+* Re: RISC-V vs. Aarch64  Ivan Godard
 |||`* Re: RISC-V vs. Aarch64  EricP
 ||| `- Re: RISC-V vs. Aarch64  Ivan Godard
 ||+* Re: RISC-V vs. Aarch64  MitchAlsup
 |||`* Re: RISC-V vs. Aarch64  Ivan Godard
 ||| `* Re: RISC-V vs. Aarch64  MitchAlsup
 |||  `* Re: RISC-V vs. Aarch64  Ivan Godard
 |||   `* Re: RISC-V vs. Aarch64  MitchAlsup
 |||    `- Re: RISC-V vs. Aarch64  Marcus
 ||`* Re: RISC-V vs. Aarch64  BGB
 || `- Re: RISC-V vs. Aarch64  MitchAlsup
 |+* Re: RISC-V vs. Aarch64  BGB
 ||`* Re: RISC-V vs. Aarch64  MitchAlsup
 || `- Re: RISC-V vs. Aarch64  Thomas Koenig
 |`* Re: RISC-V vs. Aarch64  Marcus
 | `* Re: RISC-V vs. Aarch64  EricP
 |  +* Re: RISC-V vs. Aarch64  Marcus
 |  |+* Re: RISC-V vs. Aarch64  MitchAlsup
 |  ||+* Re: RISC-V vs. Aarch64  Niklas Holsti
 |  |||+* Re: RISC-V vs. Aarch64  Bill Findlay
 |  ||||`- Re: RISC-V vs. Aarch64  MitchAlsup
 |  |||`- Re: RISC-V vs. Aarch64  Ivan Godard
 |  ||`- Re: RISC-V vs. Aarch64  Thomas Koenig
 |  |+* Re: RISC-V vs. Aarch64  Thomas Koenig
 |  ||+* Re: RISC-V vs. Aarch64  MitchAlsup
 |  |||`- Re: RISC-V vs. Aarch64  BGB
 |  ||+* Re: RISC-V vs. Aarch64  Ivan Godard
 |  |||`* Re: RISC-V vs. Aarch64  Thomas Koenig
 |  ||| `- Re: RISC-V vs. Aarch64  Ivan Godard
 |  ||`* Re: RISC-V vs. Aarch64  Marcus
 |  || +* Re: RISC-V vs. Aarch64  Thomas Koenig
 |  || |`* Re: RISC-V vs. Aarch64  aph
 |  || | +- Re: RISC-V vs. Aarch64  Michael S
 |  || | `* Re: RISC-V vs. Aarch64  Thomas Koenig
 |  || |  `* Re: RISC-V vs. Aarch64  robf...@gmail.com
 |  || |   +* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |   |`- Re: RISC-V vs. Aarch64  Tim Rentsch
 |  || |   `* Re: RISC-V vs. Aarch64  Terje Mathisen
 |  || |    `* Re: RISC-V vs. Aarch64  Thomas Koenig
 |  || |     `* Re: RISC-V vs. Aarch64  Marcus
 |  || |      `* Re: RISC-V vs. Aarch64  Guillaume
 |  || |       `* Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        +- Re: RISC-V vs. Aarch64  Marcus
 |  || |        +* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |`* Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        | `* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |  `* Re: RISC-V vs. Aarch64  Thomas Koenig
 |  || |        |   `* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |    `* Re: RISC-V vs. Aarch64  EricP
 |  || |        |     +* Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        |     |`* Re: RISC-V vs. Aarch64  EricP
 |  || |        |     | `- Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        |     `* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |      `* Re: RISC-V vs. Aarch64  EricP
 |  || |        |       +- Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        |       `* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |        +* Re: RISC-V vs. Aarch64  Brett
 |  || |        |        |+* Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        |        ||`- Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |        |`- Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |        `* Re: RISC-V vs. Aarch64  Stephen Fuld
 |  || |        |         `* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |          +* Re: RISC-V vs. Aarch64  Stefan Monnier
 |  || |        |          |`- Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |          +* Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        |          |`* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |          | `- Re: RISC-V vs. Aarch64  MitchAlsup
 |  || |        |          +* Re: RISC-V vs. Aarch64  Stephen Fuld
 |  || |        |          |`- Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |          `* Re: RISC-V vs. Aarch64  EricP
 |  || |        |           +* Re: RISC-V vs. Aarch64  EricP
 |  || |        |           |`* Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        |           | `* The type of Mill's belt's slots  Stefan Monnier
 |  || |        |           |  +- Re: The type of Mill's belt's slots  MitchAlsup
 |  || |        |           |  `* Re: The type of Mill's belt's slots  Ivan Godard
 |  || |        |           |   `* Re: The type of Mill's belt's slots  Stefan Monnier
 |  || |        |           |    `* Re: The type of Mill's belt's slots  Ivan Godard
 |  || |        |           |     +* Re: The type of Mill's belt's slots  Stefan Monnier
 |  || |        |           |     |`* Re: The type of Mill's belt's slots  Ivan Godard
 |  || |        |           |     `* Re: The type of Mill's belt's slots  MitchAlsup
 |  || |        |           `- Re: RISC-V vs. Aarch64  Ivan Godard
 |  || |        +* Re: RISC-V vs. Aarch64  Guillaume
 |  || |        `* Re: RISC-V vs. Aarch64  Quadibloc
 |  || `* MRISC32 vectorization (was: RISC-V vs. Aarch64)  Thomas Koenig
 |  |`* Re: RISC-V vs. Aarch64  Terje Mathisen
 |  `- Re: RISC-V vs. Aarch64  Quadibloc
 +* Re: RISC-V vs. Aarch64  Anton Ertl
 `- Re: RISC-V vs. Aarch64  aph

Re: RISC-V vs. Aarch64

<sqvrv5$aog$1@newsreader4.netcologne.de>

https://news.novabbs.com/devel/article-flat.php?id=22714&group=comp.arch#22714
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-eb03-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Mon, 3 Jan 2022 22:05:25 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sqvrv5$aog$1@newsreader4.netcologne.de>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<sqverm$adp$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 3 Jan 2022 22:05:25 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-eb03-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:eb03:0:7285:c2ff:fe6c:992d";
logging-data="11024"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Mon, 3 Jan 2022 22:05 UTC

Guillaume <message@bottle.org> wrote:
> On 02/01/2022 at 23:37, MitchAlsup wrote:
>> Why is there not an IMAX instruction in every modern ISA ??

> The reason is that the RISC-V ISA is very modular, so they have chosen
> to keep the base ISA minimal, and then extend it.
>
> It has benefits of course - you can design cores that are as minimal or
> as featureful as needed, while still being compliant

One thing I do not understand is why multiplication and division
are in one extension.

Given the difference in effort between a low-cost multiplier and a divider,
why is it not possible to have multiplication instructions, but no
division instructions?

Re: RISC-V vs. Aarch64

<sqvsf3$sct$1@gioia.aioe.org>

https://news.novabbs.com/devel/article-flat.php?id=22716&group=comp.arch#22716
Path: i2pn2.org!i2pn.org!aioe.org!UgLt14+w9tVHe1BtIa3HDQ.user.46.165.242.75.POSTED!not-for-mail
From: mess...@bottle.org (Guillaume)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Mon, 3 Jan 2022 23:13:50 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sqvsf3$sct$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<sqverm$adp$1@gioia.aioe.org> <sqvrv5$aog$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="29085"; posting-host="UgLt14+w9tVHe1BtIa3HDQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Content-Language: fr
X-Notice: Filtered by postfilter v. 0.9.2
 by: Guillaume - Mon, 3 Jan 2022 22:13 UTC

On 03/01/2022 at 23:05, Thomas Koenig wrote:
> Guillaume <message@bottle.org> wrote:
>> On 02/01/2022 at 23:37, MitchAlsup wrote:
>>> Why is there not an IMAX instruction in every modern ISA ??
>
>> The reason is that the RISC-V ISA is very modular, so they have chosen
>> to keep the base ISA minimal, and then extend it.
>>
>> It has benefits of course - you can design cores that are as minimal or
>> as featureful as needed, while still being compliant
>
> One thing I do not understand is why multiplication and division
> are in one extension.
>
> Given the difference in effort between a low-cost multiplier and a divider,
> why is it not possible to have multiplication instructions, but no
> division instructions?

The 'M' extension is so "small" that they probably didn't see a point in
further splitting it. But you're right, and that may happen in the
future. Meanwhile, while not being 100% compliant (you can't claim the
'M' extension if you don't implement division), it's already something
that's been taken into account by compilers. So at least with GCC, you
can use an option that allows it to generate only multiplication
instructions, but not division instructions (so division and modulo
will be emulated in software).
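
To make "emulated in software" concrete, here is a minimal sketch of a shift-and-subtract (restoring) unsigned divide that uses neither `/` nor `%`. The name `soft_udiv32` is made up for this example, and this is not the routine GCC's runtime library actually ships; it only shows the kind of loop a multiply-only core falls back on. As far as I know the GCC option in question is `-mno-div`, but check your toolchain's documentation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: unsigned 32-bit restoring division.
 * Returns n / d and, if rem is non-NULL, stores n % d in *rem.
 * The divisor d must be non-zero; the result is meaningless otherwise.
 */
uint32_t soft_udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint32_t r = 0;

    /* Bring down one dividend bit per step, most significant first. */
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {            /* divisor fits: subtract and set quotient bit */
            r -= d;
            q |= UINT32_C(1) << i;
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}
```

The loop runs 32 iterations regardless of the operands, which is roughly why a hardware divider is so much more effort than a low-cost multiplier: either you spend the cycles or you spend the gates.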

Re: RISC-V vs. Aarch64

<sr0vhm$c4u$1@dont-email.me>

https://news.novabbs.com/devel/article-flat.php?id=22735&group=comp.arch#22735
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Tue, 4 Jan 2022 00:12:38 -0800
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <sr0vhm$c4u$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 4 Jan 2022 08:12:39 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de80ed7290c9d686a431169b61ac8f21";
logging-data="12446"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/DU/uz9n8OMJp6zgqt/Fif"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:I6h4F70nemDfj6IHr/mHf0pWjFg=
In-Reply-To: <bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Tue, 4 Jan 2022 08:12 UTC

On 1/3/2022 9:28 AM, MitchAlsup wrote:
> On Monday, January 3, 2022 at 4:01:36 AM UTC-6, Ivan Godard wrote:
>> On 1/2/2022 2:37 PM, MitchAlsup wrote:
>>> On Sunday, January 2, 2022 at 12:55:45 PM UTC-6, Guillaume wrote:
>>>> On 01/01/2022 at 18:35, Marcus wrote:
>>>>> On 2022-01-01, Thomas Koenig wrote:
>>>>>> Terje Mathisen <terje.m...@tmsw.no> wrote:
>>>>>>
>>>>>>> Just like x86 condition codes, POWER compilers will probably do a better
>>>>>>> job if they can inline bool functions, so that the condition code can be
>>>>>>> used directly instead of first having to be reified as an int, then
>>>>>>> tested again in the calling function.
>>>>>>
>>>>>> Very much so.
>>>>>>
>>>>>> _Bool gt (long int a, long int b)
>>>>>> {
>>>>>> return a > b;
>>>>>> }
>>>>>>
>>>>>> long int mymax(long int a,long int b)
>>>>>> {
>>>>>> return gt(a,b) ? a : b;
>>>>>> }
>>>>>>
>>>>>> will give you, with -O3 on a current trunk,
>>>>>>
>>>>>> cmpd r3,r4
>>>>>> isellt r3,r4,r3
>>>>>> blr
>>>>>>
>>>>>> for mymax.
>>>>>>
>>>>>
>>>>> The MRISC32 version, https://godbolt.org/z/r6rTj5aWv
>>>>>
>>>>> mymax:
>>>>> max r1,r1,r2
>>>>> ret
>>>>>
>>>>> ;-)
>>>> For RISCV, it's:
>>>> mymax:
>>>> bge a0,a1,.L4
>>>> mv a0,a1
>>>> .L4:
>>>> ret
>>>>
>>>> So, requires a conditional branch...
>>>> But, for floating point (if supported), there is the 'fmax' instruction.
>>> <
>>> Why is there not an IMAX instruction in every modern ISA ??
>> because you don't need it when you have "?:".
>>
>> gtr(b0, b1), pick(b0, b1, b2), retn(b0);
>>
>> one bundle, one cycle.
> <
> and at least 4 times as much transport energy.

I guess you are adding three for the pick. Mill doesn't - pick is all
done in the namer; no data is moved at all.

As for the rest, the compare requires both the values going from
wherever to the ALU, whether it's an op or folded into a branch, so that
move cost should be the same for all ISAs. The Mill retn also does not
move anything - again, it's all in the namer. So the total actual moves
is two, which makes it hard to use four times anything :-)

Perhaps you haven't noticed me saying: *the belt is not physically a
shift register*.
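
For readers following the compare-then-select argument, a small C sketch of signed max written two ways: as the plain `?:` and as an explicit mask-based select with no branch. The names `imax_ternary` and `imax_select` are illustrative only. Whether a compiler turns either one into a conditional move, an `isel`, a `max` instruction or a branch depends on the target, as the listings quoted above show.

```c
/* Signed max via the conditional operator; the compiler picks the encoding. */
long imax_ternary(long a, long b)
{
    return a > b ? a : b;
}

/*
 * Signed max as an explicit select, assuming two's-complement longs.
 * The comparison yields 0 or 1; negating it gives an all-zeros or
 * all-ones mask that selects b or a respectively.
 */
long imax_select(long a, long b)
{
    unsigned long mask = 0UL - (unsigned long)(a > b);
    return (long)(((unsigned long)a & mask) | ((unsigned long)b & ~mask));
}
```

Both functions still need the two operands delivered to a comparator, which is the point made above about the compare cost being the same whatever the ISA calls the operation.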

Re: RISC-V vs. Aarch64

<sr114i$1qc$1@newsreader4.netcologne.de>

https://news.novabbs.com/devel/article-flat.php?id=22736&group=comp.arch#22736
Path: i2pn2.org!i2pn.org!news.swapon.de!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-dead-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Tue, 4 Jan 2022 08:39:46 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sr114i$1qc$1@newsreader4.netcologne.de>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me>
Injection-Date: Tue, 4 Jan 2022 08:39:46 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-dead-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:dead:0:7285:c2ff:fe6c:992d";
logging-data="1868"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 4 Jan 2022 08:39 UTC

Ivan Godard <ivan@millcomputing.com> wrote:

> Perhaps you haven't noticed me saying: *the belt is not physically a
> shift register*.

It's usually implemented as a circular buffer, correct?

Re: RISC-V vs. Aarch64

<sr1dca$70e$1@dont-email.me>

https://news.novabbs.com/devel/article-flat.php?id=22739&group=comp.arch#22739
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Tue, 4 Jan 2022 04:08:41 -0800
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <sr1dca$70e$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 4 Jan 2022 12:08:42 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de80ed7290c9d686a431169b61ac8f21";
logging-data="7182"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+W0iqptxxvcvVOuYjoxJ++"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:/hDPIDaYOIDCXphSspZj65vhPr8=
In-Reply-To: <sr114i$1qc$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Ivan Godard - Tue, 4 Jan 2022 12:08 UTC

On 1/4/2022 12:39 AM, Thomas Koenig wrote:
> Ivan Godard <ivan@millcomputing.com> wrote:
>
>> Perhaps you haven't noticed me saying: *the belt is not physically a
>> shift register*.
>
> It's usually implemented as a circular buffer, correct?

Not at all.

Computed values are left where they were produced - FU's output latches,
for example - just as in a forwarding bypass network. Move only happens
if the location is needed by some other computation, and then only to an
adjacent location, also on the bypass network - which the way issue
happens guarantees is free.

All the belt advance does is migrate the program-to-physical mapping -
the name - to a new physical location. The cost is the same as when an
OOO migrates the l2p mapping of a logical register to a different
physical register when the logical register is reused by the program.

The cost of the remap is the same as in an x86, or my66 or whatever -
the number of bits involved is determined by the number of physical
locations that the hardware maps to. That's around 40 for a mid-range
Silver, several hundred for typical OOO.

However, an OOO does have a physical move to the architected register
eventually. Mill has no such move, because the whole mapping is passed
through call and return which includes all traps and interrupts. That
state does eventually get saved, lazily, when task switch happens,
whereas an OOO does it on every writeback.
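
A toy software model may help picture "the name moves, the data does not". This is only a sketch under assumptions of my own: the sizes `BELT_LEN` and `NUM_LOCS`, the type `belt_model` and all function names are hypothetical, and real hardware does this with mapping logic rather than a loop. Values are written once into the location that produced them; a belt advance only shifts small location indices, and `belt_pick` drops a new name for an existing location without copying any value.

```c
#include <stddef.h>

#define BELT_LEN 8   /* hypothetical number of belt positions */
#define NUM_LOCS 16  /* hypothetical number of physical result latches */

typedef struct {
    long value[NUM_LOCS];          /* data stays where it was produced */
    unsigned char name[BELT_LEN];  /* belt position -> physical location */
} belt_model;

/* Advance the belt: only the location indices shift, never the values. */
void belt_drop(belt_model *b, unsigned char loc)
{
    for (size_t i = BELT_LEN - 1; i > 0; --i)
        b->name[i] = b->name[i - 1];
    b->name[0] = loc;
}

/* A functional unit writes its own latch, then the new name is dropped. */
void belt_produce(belt_model *b, unsigned char loc, long v)
{
    b->value[loc] = v;
    belt_drop(b, loc);
}

/* pick: choose between two existing names; no value is moved or copied. */
void belt_pick(belt_model *b, unsigned cond_pos, unsigned a_pos, unsigned b_pos)
{
    unsigned char chosen = b->value[b->name[cond_pos]]
                         ? b->name[a_pos]
                         : b->name[b_pos];
    belt_drop(b, chosen);
}

/* Read the value currently named by a belt position. */
long belt_read(const belt_model *b, unsigned pos)
{
    return b->value[b->name[pos]];
}
```

In this model the per-advance cost scales with the width of a location index (a handful of bits for around 40 locations), not with the width of the data, which is the comparison with an OOO rename table drawn above.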

Re: RISC-V vs. Aarch64

<kM%AJ.186634$np6.183460@fx46.iad>

https://news.novabbs.com/devel/article-flat.php?id=22748&group=comp.arch#22748
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx46.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com> <sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com> <sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de> <sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org> <077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <squhht$79u$1@dont-email.me> <bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com> <sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de> <sr1dca$70e$1@dont-email.me>
In-Reply-To: <sr1dca$70e$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 25
Message-ID: <kM%AJ.186634$np6.183460@fx46.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 04 Jan 2022 17:49:04 UTC
Date: Tue, 04 Jan 2022 12:48:56 -0500
X-Received-Bytes: 2538
 by: EricP - Tue, 4 Jan 2022 17:48 UTC

Ivan Godard wrote:
> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>> Ivan Godard <ivan@millcomputing.com> wrote:
>>
>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>> shift register*.
>>
>> It's usually implemented as a circular buffer, correct?
>
> Not at all.
>
> Computed values are left where they were produced - FU's output latches,
> for example - just as in a forwarding bypass network. Move only happens
> if the location is needed by some other computation, and then only to an
> adjacent location, also on the bypass network - which the way issue
> happens guarantees is free.

Moving results out of the way is what makes this work.
If you only have one adder FU and you get a bunch of add instructions
in a row, then you need to stash older results in other registers
and this becomes the equivalent of a delayed register writeback.
But I think you need a crossbar to accomplish this while a
traditional approach just needs some small number of result buses.

Re: RISC-V vs. Aarch64

<ba1aba17-4081-485d-b890-963e52b12a1cn@googlegroups.com>

https://news.novabbs.com/devel/article-flat.php?id=22751&group=comp.arch#22751
X-Received: by 2002:ac8:1499:: with SMTP id l25mr45525099qtj.476.1641321307248;
Tue, 04 Jan 2022 10:35:07 -0800 (PST)
X-Received: by 2002:a05:6808:198a:: with SMTP id bj10mr39175613oib.37.1641321307013;
Tue, 04 Jan 2022 10:35:07 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 4 Jan 2022 10:35:06 -0800 (PST)
In-Reply-To: <kM%AJ.186634$np6.183460@fx46.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4ce0:ecdb:e44f:1566;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4ce0:ecdb:e44f:1566
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com> <sr0vhm$c4u$1@dont-email.me>
<sr114i$1qc$1@newsreader4.netcologne.de> <sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ba1aba17-4081-485d-b890-963e52b12a1cn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 04 Jan 2022 18:35:07 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 39
 by: MitchAlsup - Tue, 4 Jan 2022 18:35 UTC

On Tuesday, January 4, 2022 at 11:49:07 AM UTC-6, EricP wrote:
> Ivan Godard wrote:
> > On 1/4/2022 12:39 AM, Thomas Koenig wrote:
> >> Ivan Godard <iv...@millcomputing.com> wrote:
> >>
> >>> Perhaps you haven't noticed me saying: *the belt is not physically a
> >>> shift register*.
> >>
> >> It's usually implemented as a circular buffer, correct?
> >
> > Not at all.
> >
> > Computed values are left where they were produced - FU's output latches,
> > for example - just as in a forwarding bypass network. Move only happens
> > if the location is needed by some other computation, and then only to an
> > adjacent location, also on the bypass network - which the way issue
> > happens guarantees is free.
<
> Moving results out of the way is what makes this work.
> If you only have one adder FU and you get a bunch of add instructions
> in a row, then you need to stash older results in other registers
<
Using the definition that a register is a name SW can use to reference
a value, you don't need registers per se, you just need containers.
Luke's scoreboard design used several such containers at the end of
each calculation unit so the unit can continue to plow through calculations
while result delivery was held up by WAR and WAW hazards and maybe
result bus scheduling.
<
> and this becomes the equivalent of a delayed register writeback.
> But I think you need a crossbar to accomplish this while a
> traditional approach just needs some small number of result buses.
<
Thing is that if you broadcast the result as soon as it is calculated
(even if you can't write it into the RF) you make up for most of the
performance hit from the lack of resources (busses).

Re: RISC-V vs. Aarch64

<_a1BJ.105811$_Y5.6541@fx29.iad>

https://news.novabbs.com/devel/article-flat.php?id=22752&group=comp.arch#22752
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx29.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com> <sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com> <sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de> <sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org> <077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <squhht$79u$1@dont-email.me> <bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com> <sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de> <sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad> <ba1aba17-4081-485d-b890-963e52b12a1cn@googlegroups.com>
In-Reply-To: <ba1aba17-4081-485d-b890-963e52b12a1cn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 56
Message-ID: <_a1BJ.105811$_Y5.6541@fx29.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 04 Jan 2022 19:25:46 UTC
Date: Tue, 04 Jan 2022 14:25:29 -0500
X-Received-Bytes: 4211
 by: EricP - Tue, 4 Jan 2022 19:25 UTC

MitchAlsup wrote:
> On Tuesday, January 4, 2022 at 11:49:07 AM UTC-6, EricP wrote:
>> Ivan Godard wrote:
>>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>>>> Ivan Godard <iv...@millcomputing.com> wrote:
>>>>
>>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>>>> shift register*.
>>>> It's usually implemented as a circular buffer, correct?
>>> Not at all.
>>>
>>> Computed values are left where they were produced - FU's output latches,
>>> for example - just as in a forwarding bypass network. Move only happens
>>> if the location is needed by some other computation, and then only to an
>>> adjacent location, also on the bypass network - which the way issue
>>> happens guarantees is free.
> <
>> Moving results out of the way is what makes this work.
>> If you only have one adder FU and you get a bunch of add instructions
>> in a row, then you need to stash older results in other registers
> <
> Using the definition that a register is a name SW can use to reference
> a value, you don't need registers per se, you just need containers.
> Luke's scoreboard design used several such containers at the end of
> each calculation unit so the unit can continue to plow through calculations
> while result delivery was held up by WAR and WAW hazards and maybe
> result bus scheduling.

I mean register as in latch or flip-flop attached to the output of an FU,
not register as in register file.

I had something similar in my uArch. Some FU can be pipelined or otherwise
take multiple clocks to do their thing. For these "expensive" units I
did not want to have to stall the launch of the next calculation just
because a result register (latches) was full waiting to write back the
prior result on one of the two result buses.
Some units had a second result register to buffer 1 result
as it only cost 1 set of latches and tri-state buffers.

> <
>> and this becomes the equivalent of a delayed register writeback.
>> But I think you need a crossbar to accomplish this while a
>> traditional approach just needs some small number of result buses.
> <
> Thing is that if you broadcast the result as soon as it is calculated
> (even if you can't write it into the RF) you make up for most of the
> performance hit from the lack of resources (busses).

Remember, one of my goals was to be a resource frugal OoO.
In my case the two broadcast/forwarding buses also wrote to RF ports
and a round robin arbiter selects two of multiple bus access bidders
each clock to broadcast/write results.
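
As a rough illustration of that arbitration, here is a behavioural C sketch of a round-robin arbiter that grants at most two requesters per clock. The unit count `NUM_UNITS`, the type `rr_arbiter` and the function `rr_arbitrate` are assumptions for the example, not a description of the actual design; real hardware would do the priority rotation combinationally rather than in a loop.

```c
#include <stdint.h>

#define NUM_UNITS 8  /* hypothetical number of bidding functional units */
#define NUM_BUSES 2  /* two broadcast/result buses, as described above */

typedef struct {
    unsigned next;   /* unit with highest priority in the coming clock */
} rr_arbiter;

/*
 * One clock of arbitration. 'requests' has bit u set when unit u has a
 * result waiting. Returns a mask with at most NUM_BUSES bits set, one per
 * granted unit, and rotates priority to just past the last unit granted.
 */
uint32_t rr_arbitrate(rr_arbiter *arb, uint32_t requests)
{
    uint32_t grants = 0;
    unsigned granted = 0;
    unsigned new_next = arb->next;  /* unchanged if nothing is granted */

    for (unsigned i = 0; i < NUM_UNITS && granted < NUM_BUSES; ++i) {
        unsigned unit = (arb->next + i) % NUM_UNITS;
        if (requests & (UINT32_C(1) << unit)) {
            grants |= UINT32_C(1) << unit;
            ++granted;
            new_next = (unit + 1) % NUM_UNITS;
        }
    }
    arb->next = new_next;
    return grants;
}
```

A unit that loses arbitration keeps its request bit set and moves up in priority on the following clock, which is where a second result register on the expensive units pays off: the unit can start its next calculation while the earlier result waits for a bus.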

Re: RISC-V vs. Aarch64

<c3407a90-cc5f-46e2-be9a-06ae96dbe1d0n@googlegroups.com>

https://news.novabbs.com/devel/article-flat.php?id=22753&group=comp.arch#22753
X-Received: by 2002:a37:e306:: with SMTP id y6mr36653756qki.458.1641331147610; Tue, 04 Jan 2022 13:19:07 -0800 (PST)
X-Received: by 2002:a05:6808:1914:: with SMTP id bf20mr201338oib.7.1641331147283; Tue, 04 Jan 2022 13:19:07 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 4 Jan 2022 13:19:07 -0800 (PST)
In-Reply-To: <_a1BJ.105811$_Y5.6541@fx29.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4ce0:ecdb:e44f:1566; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4ce0:ecdb:e44f:1566
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com> <sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com> <sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de> <sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org> <077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <squhht$79u$1@dont-email.me> <bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com> <sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de> <sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad> <ba1aba17-4081-485d-b890-963e52b12a1cn@googlegroups.com> <_a1BJ.105811$_Y5.6541@fx29.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c3407a90-cc5f-46e2-be9a-06ae96dbe1d0n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 04 Jan 2022 21:19:07 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 87
 by: MitchAlsup - Tue, 4 Jan 2022 21:19 UTC

On Tuesday, January 4, 2022 at 1:25:49 PM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, January 4, 2022 at 11:49:07 AM UTC-6, EricP wrote:
> >> Ivan Godard wrote:
> >>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
> >>>> Ivan Godard <iv...@millcomputing.com> schrieb:
> >>>>
> >>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
> >>>>> shift register*.
> >>>> It's usually implemented as a circular buffer, correct?
> >>> Not at all.
> >>>
> >>> Computed values are left where they were produced - FU's output latches,
> >>> for example - just as in a forwarding bypass network. Move only happens
> >>> if the location is needed by some other computation, and then only to an
> >>> adjacent location, also on the bypass network - which the way issue
> >>> happens guarantees is free.
> > <
> >> Moving results out of the way is what makes this work.
> >> If you only have one adder FU and you get a bunch of add instructions
> >> in a row, then you need to stash older results in other registers
> > <
> > Using the definition that a register is a name SW can use to reference
> > a value, you don't need registers per se, you just need containers.
> > Luke's scoreboard design used several such containers at the end of
> > each calculation unit so the unit can continue to plow through calculations
> > while result delivery was held up by WAR and WAW hazards and maybe
> > result bus scheduling.
<
> I mean register as in latch or flip-flop attached to the output of an FU,
> not register as in register file.
<
Just making it clear to everyone. But perhaps you, too, can take on the word
"flip-flop" when you mean a HW container that does not necessarily have a SW
name. After trying for a decade to explain to SW+HW audiences that the HW
concept of a "register" was significantly different from the SW concept, I found
that the only sane point of view when working in both domains was to use a new
word for one of the domains. Since HW is the smaller domain, I chose a different
word for the HW concept. This was easy because HW already had a name (flip-flop)
that everyone in HW understood.
>
> I had something similar in my uArch. Some FU can be pipelined or otherwise
> take multiple clocks to do their thing. For these "expensive" units I
> did not want to have to stall the launch of the next calculation just
> because a result register (latches) was full waiting to write back the
> prior result on one of the two result buses.
<
We did this in the FADD unit when executing FDIV instructions on the MC88100.
The first stage remained "unBusy" even while an FDIV was in progress.
<
> Some units had a second result register to buffer 1 result
> as it only cost 1 set of latches and tri-state buffers.
<
I have not needed this as I either did strict pipelines or fully resourced
OoO pipelines. Luke, however, did. And, here, I think scoreboards work
better than reservation stations.
> > <
> >> and this becomes the equivalent of a delayed register writeback.
> >> But I think you need a crossbar to accomplish this while a
> >> traditional approach just needs some small number of result buses.
> > <
> > Thing is that if you broadcast the result as soon as it is calculated
> > (even if you can't write it into the RF) you make up for most of the
> > lack of resources (busses) performance hit.
<
> Remember, one of my goals was to be a resource frugal OoO.
> In my case the two broadcast/forwarding buses also wrote to RF ports
> and a round robin arbiter selects two of multiple bus access bidders
> each clock to broadcast/write results.
<
Bet that caused some interesting complexity in the instruction scheduler.

Re: RISC-V vs. Aarch64

<sr2gf6$64u$1@dont-email.me>


https://news.novabbs.com/devel/article-flat.php?id=22758&group=comp.arch#22758

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Tue, 4 Jan 2022 14:07:34 -0800
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <sr2gf6$64u$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de>
<sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 4 Jan 2022 22:07:34 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de80ed7290c9d686a431169b61ac8f21";
logging-data="6302"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+8hYf2oAowCnphuPo3pe09"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:JF/+/cjF5ZbLRphK2lpwNf0rLn0=
In-Reply-To: <kM%AJ.186634$np6.183460@fx46.iad>
Content-Language: en-US
 by: Ivan Godard - Tue, 4 Jan 2022 22:07 UTC

On 1/4/2022 9:48 AM, EricP wrote:
> Ivan Godard wrote:
>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>
>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>>> shift register*.
>>>
>>> It's usually implemented as a circular buffer, correct?
>>
>> Not at all.
>>
>> Computed values are left where they were produced - FU's output
>> latches, for example - just as in a forwarding bypass network. Move
>> only happens if the location is needed by some other computation, and
>> then only to an adjacent location, also on the bypass network - which
>> the way issue happens guarantees is free.
>
> Moving results out of the way is what makes this work.
> If you only have one adder FU and you get a bunch of add instructions
> in a row, then you need to stash older results in other registers
> and this becomes the equivalent of a delayed register writeback.
> But I think you need a crossbar to accomplish this while a
> traditional approach just needs some small number of result buses.
>
>

Still not :-)

Our FU slots can (and typically do) support operations of different
natural latencies (pipe lengths). Each slot can accept one op per cycle,
so if the latencies differ you can get more than one result retiring in
the same cycle. Consequently the FUs have one result FF per supported
latency.

If an op of latency N retires in cycle C to FF#N, then necessarily in the
following cycle C+1 FF#N+1 is free (think about it). Consequently,
the FU's FFs are daisy chained so that each cycle FF#N is moved to
FF#N+1 and every result always is retiring to a known free FF; the set
of FFs are right next to each other so the move is trivial.

That gets rid of all moves except the move from FF#last, which has no
last+1. If that FF was originally filled and is still live (it may have
been daisy chained enough to have fallen off the belt) then it gets
moved to a dynamically allocated skid FF in the spiller. There can be at
most one such move per slot. There can be at most <belt size> skid FFs
needed for this.
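
The daisy chain can be modelled in a few lines. This is a sketch under stated assumptions, not the Mill implementation: it assumes one result FF for every latency from 1 to max_latency, it treats every value leaving the last FF as still live (so all of them go to the spiller), and the names FUSlot, issue and tick are made up for the example. The assertion in tick checks the "think about it" claim: a value sitting in FF#k was always issued k cycles ago, and only one op issues per cycle, so a retiring result never finds its FF occupied.

```python
class FUSlot:
    """One FU slot with a daisy chain of result FFs, one per latency."""

    def __init__(self, max_latency):
        self.max_latency = max_latency
        self.ff = [None] * (max_latency + 1)  # ff[0] is unused
        self.in_flight = []                   # (retire_cycle, latency, tag)
        self.spilled = []                     # values moved to skid FFs
        self.cycle = 0
        self.issued_this_cycle = False

    def issue(self, latency, tag):
        assert 1 <= latency <= self.max_latency
        assert not self.issued_this_cycle, "a slot accepts one op per cycle"
        self.in_flight.append((self.cycle + latency, latency, tag))
        self.issued_this_cycle = True

    def tick(self):
        self.cycle += 1
        self.issued_this_cycle = False
        # FF#last has no last+1: its value goes to the spiller
        # (assumption: every such value is still live).
        if self.ff[self.max_latency] is not None:
            self.spilled.append(self.ff[self.max_latency])
        # Daisy chain: FF#N moves to FF#N+1, oldest first.
        for k in range(self.max_latency, 1, -1):
            self.ff[k] = self.ff[k - 1]
        self.ff[1] = None
        # Results retiring this cycle land in the FF matching their latency.
        pending = []
        for retire_cycle, latency, tag in self.in_flight:
            if retire_cycle == self.cycle:
                assert self.ff[latency] is None, "target FF must be free"
                self.ff[latency] = tag
            else:
                pending.append((retire_cycle, latency, tag))
        self.in_flight = pending


slot = FUSlot(max_latency=3)
slot.issue(3, "a"); slot.tick()
slot.issue(2, "b"); slot.tick()
slot.issue(1, "c"); slot.tick()   # a, b and c all retire in this cycle
after_retire = list(slot.ff)
slot.tick()                       # a falls out of FF#3 into the spiller
```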

This move to the spiller is the only point which resembles a writeback
stage in a genreg machine. In limited measurement of limited test code,
we see ~25% of all drops reach the spiller and actually cost power
(Silver); fewer in configs with short belts and more in ones with long
belts. I don't know if OOOs skip writeback of superseded results, but
the fanout of the mux trees to the spiller (16 in Silver) is much less
than that to an OOO regfile with hundreds more registers.

The idea of a "last use" bit in the encoding has come up several times
here. Falling off the belt is an implicit last use, so perhaps an
explicit bit could give the same saving in writeback that the Mill displays.

Re: RISC-V vs. Aarch64

<c08a77cb-0c7a-43b2-a488-52dec2961665n@googlegroups.com>


https://news.novabbs.com/devel/article-flat.php?id=22763&group=comp.arch#22763

Newsgroups: comp.arch
Date: Wed, 5 Jan 2022 07:34:23 -0800 (PST)
In-Reply-To: <077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:11ce:8ac:b19e:7500;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:11ce:8ac:b19e:7500
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org> <077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c08a77cb-0c7a-43b2-a488-52dec2961665n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 05 Jan 2022 15:34:23 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 25
 by: Quadibloc - Wed, 5 Jan 2022 15:34 UTC

On Sunday, January 2, 2022 at 3:38:00 PM UTC-7, MitchAlsup wrote:

> Why is there not an IMAX instruction in every modern ISA ??

Surely if one is going to include video decompression assist
instructions, they wouldn't be limited to just one aspect ratio...

But seriously, I do see where you're going. Picking the larger
of two numbers in a single instruction avoids a _branch_, and
branches are positively horrible in modern implementations.
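
For illustration, here is the kind of compare-and-mask sequence a single max instruction would replace. It is a generic, well-known bit trick and not an instruction from any ISA in this thread; the function name is made up. Python itself branches internally, so this only shows the shape of the dataflow: one compare, then pure logic, with no control transfer depending on the data.

```python
def branchless_max(a: int, b: int) -> int:
    # mask is -1 (all ones) when a < b, otherwise 0
    mask = -(a < b)
    # with mask all ones: a ^ (a ^ b) == b; with mask zero: a ^ 0 == a
    return a ^ ((a ^ b) & mask)
```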

Usually, though, you don't just want the larger of the two, you
want information about which one was larger, and that makes
the instruction much more complicated and/or specialized.

*And there are so many other cases like this.*

Instruction predication is an example of addressing this issue
from the other end; instead of having no-branch instructions that
do stuff, it tries to take the sting out of branches. Obviously, though,
it's somewhat limited; it works best for avoiding branches around
very short sequences of code.

Is there another way to take the sting out of branches...

John Savard

Re: RISC-V vs. Aarch64

<30ce6e53-986e-4772-931f-a22a21f4f1a1n@googlegroups.com>


https://news.novabbs.com/devel/article-flat.php?id=22764&group=comp.arch#22764

Newsgroups: comp.arch
Date: Wed, 5 Jan 2022 07:43:38 -0800 (PST)
In-Reply-To: <c08a77cb-0c7a-43b2-a488-52dec2961665n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:11ce:8ac:b19e:7500;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:11ce:8ac:b19e:7500
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <c08a77cb-0c7a-43b2-a488-52dec2961665n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <30ce6e53-986e-4772-931f-a22a21f4f1a1n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 05 Jan 2022 15:43:39 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 40
 by: Quadibloc - Wed, 5 Jan 2022 15:43 UTC

On Wednesday, January 5, 2022 at 8:34:24 AM UTC-7, Quadibloc wrote:
> Is there another way to take the sting out of branches...

Clearly,

set flag on condition
....
branch on flag

doesn't do it. It doesn't give the CPU enough information.

Branches are bad because modern computers are heavily
pipelined. So if the branch is taken - or not taken, if the
branch is assumed to be taken - the pipeline has to be
flushed; what a waste!

Let's take the case of a 1 to n loop. You know well ahead
of time if the branch is to be taken or not.

So what you *really* want would be:

branch, when there is a branch, on condition
....
this is the conditional branch previously mentioned, branch from here

or it could be...

conditional branch with variable number of delay slots
.....
marker that says this is the end of the delay slots

As long as enough code is put in the ... part (including
no branches at all, of course)

....and one correctly ensures that the required information
is saved and restored in the case of an interrupt

it might actually work.

John Savard

Re: RISC-V vs. Aarch64

<90789e26-73c3-4aeb-a6f4-dadff118f573n@googlegroups.com>


https://news.novabbs.com/devel/article-flat.php?id=22767&group=comp.arch#22767

Newsgroups: comp.arch
Date: Wed, 5 Jan 2022 07:49:28 -0800 (PST)
In-Reply-To: <30ce6e53-986e-4772-931f-a22a21f4f1a1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:11ce:8ac:b19e:7500;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:11ce:8ac:b19e:7500
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <c08a77cb-0c7a-43b2-a488-52dec2961665n@googlegroups.com>
<30ce6e53-986e-4772-931f-a22a21f4f1a1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <90789e26-73c3-4aeb-a6f4-dadff118f573n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 05 Jan 2022 15:49:28 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 18
 by: Quadibloc - Wed, 5 Jan 2022 15:49 UTC

On Wednesday, January 5, 2022 at 8:43:40 AM UTC-7, Quadibloc wrote:
> On Wednesday, January 5, 2022 at 8:34:24 AM UTC-7, Quadibloc wrote:
>
> > Is there another way to take the sting out of branches...
> Clearly,
>
> set flag on condition
> ...
> branch on flag
>
> doesn't do it. It doesn't give the CPU enough information.

Upon reflection, it _does_ give the CPU enough information, as
it does exactly what the other solutions suggested afterwards
do. The CPU can very quickly - when decoding the instruction -
check if the flag is set, to direct the fetching of subsequent
instructions.

John Savard

Re: RISC-V vs. Aarch64

<cf86c2a1-31ab-4387-a08a-300f021136c3n@googlegroups.com>


https://news.novabbs.com/devel/article-flat.php?id=22768&group=comp.arch#22768

Newsgroups: comp.arch
Date: Wed, 5 Jan 2022 08:13:04 -0800 (PST)
In-Reply-To: <90789e26-73c3-4aeb-a6f4-dadff118f573n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:11ce:8ac:b19e:7500;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:11ce:8ac:b19e:7500
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <c08a77cb-0c7a-43b2-a488-52dec2961665n@googlegroups.com>
<30ce6e53-986e-4772-931f-a22a21f4f1a1n@googlegroups.com> <90789e26-73c3-4aeb-a6f4-dadff118f573n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cf86c2a1-31ab-4387-a08a-300f021136c3n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 05 Jan 2022 16:13:04 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 45
 by: Quadibloc - Wed, 5 Jan 2022 16:13 UTC

On Wednesday, January 5, 2022 at 8:49:29 AM UTC-7, Quadibloc wrote:
> On Wednesday, January 5, 2022 at 8:43:40 AM UTC-7, Quadibloc wrote:
> > On Wednesday, January 5, 2022 at 8:34:24 AM UTC-7, Quadibloc wrote:
> >
> > > Is there another way to take the sting out of branches...
> > Clearly,
> >
> > set flag on condition
> > ...
> > branch on flag
> >
> > doesn't do it. It doesn't give the CPU enough information.
> Upon reflection, it _does_ give the CPU enough information, as
> it does exactly what the other solutions suggested afterwards
> do. The CPU can very quickly - when decoding the instruction -
> check if the flag is set, to direct the fetching of subsequent
> instructions.

Further reflection led me to think...

since a regular 'branch on condition' instruction just tests some condition
code bits, which were set by a previous instruction,

then as long as it's clear from instruction decoding which instructions
can set the condition codes - and there's a C bit in the instruction to turn
setting them off, so lots of delay can be put between the instruction that
set the codes and the branch...

then why should branches be a problem at all any longer?

And _then_ I realized what I was missing. Sure, this deals with the problem
of flushing the pipeline, because the decode stage is soon enough.

But before an instruction is decoded, it has to be *fetched* from memory,
and DRAM is really, really slow. So since branches also affect what you need
to fetch, and waiting for DRAM causes even more delay than flushing the
pipeline... branches are unavoidably a pain. Of course, stuff like SMT,
giving the computer something else useful to do, helps to cover that up...

Thus, the _other_ idea crossing my mind might be the useful one; something
which takes a short code sequence, and says, 'this sequence has branching in
it, keep it in the instruction cache'... while that only applies to some cases,
it would handle the pessimal case of code with lots of branching after every
few instructions...

John Savard

Re: RISC-V vs. Aarch64

<7DpBJ.254731$3q9.63673@fx47.iad>


https://news.novabbs.com/devel/article-flat.php?id=22773&group=comp.arch#22773

From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com> <sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com> <sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de> <sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org> <077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <squhht$79u$1@dont-email.me> <bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com> <sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de> <sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad> <sr2gf6$64u$1@dont-email.me>
In-Reply-To: <sr2gf6$64u$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 99
Message-ID: <7DpBJ.254731$3q9.63673@fx47.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 05 Jan 2022 23:14:11 UTC
Date: Wed, 05 Jan 2022 18:13:56 -0500
X-Received-Bytes: 5787
 by: EricP - Wed, 5 Jan 2022 23:13 UTC

Ivan Godard wrote:
> On 1/4/2022 9:48 AM, EricP wrote:
>> Ivan Godard wrote:
>>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>
>>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>>>> shift register*.
>>>>
>>>> It's usually implemented as a circular buffer, correct?
>>>
>>> Not at all.
>>>
>>> Computed values are left where they were produced - FU's output
>>> latches, for example - just as in a forwarding bypass network. Move
>>> only happens if the location is needed by some other computation, and
>>> then only to an adjacent location, also on the bypass network - which
>>> the way issue happens guarantees is free.
>>
>> Moving results out of the way is what makes this work.
>> If you only have one adder FU and you get a bunch of add instructions
>> in a row, then you need to stash older results in other registers
>> and this becomes the equivalent of a delayed register writeback.
>> But I think you need a crossbar to accomplish this while a
>> traditional approach just needs some small number of result buses.
>>
>>
>
> Still not :-)
>
> Our FU slots can (and typically do) support operations of different
> natural latencies (pipe lengths). Each slot can accept one op per cycle,
> so if the latencies differ you can get more than one result retiring in
> the same cycle. Consequently the FUs have one result FF per supported
> latency.
>
> If an op of latency N retires in cycle C to FF#N, then necessarily in the
> following cycle C+1 FF#N+1 is free (think about it). Consequently,
> the FU's FFs are daisy chained so that each cycle FF#N is moved to
> FF#N+1 and every result always is retiring to a known free FF; the set
> of FFs are right next to each other so the move is trivial.

This sounds like it has a belt's worth of FF for each FU.
And some of them are shift registers?
I'm a bit confused.

I envisioned this having 1 belt of FF values and two crossbars,
one crossbar to route any belt FF to any FU operands,
one to route any FU result to the proper empty FF for its latency.

It is the logical to physical map that actually shifts.
And a scoreboard tracks when FF operands are ready
and are written into their correct physical slot.

  Belt
   FF ---->Xbar<--Map
   ^        |      |
   |        v      |
  Xbar<---FU       v
      <------FU

> That gets rid of all moves except the move from FF#last, which has no
> last+1. If that FF was originally filled and is still live (it may have
> been daisy chained enough to have fallen off the belt) then it gets
> moved to a dynamically allocated skid FF in the spiller. There can be at
> most one such move per slot. There can be at most <belt size> skid FFs
> needed for this.
>
> This move to the spiller is the only point which resembles a writeback
> stage in a genreg machine. In limited measurement of limited test code,
> we see ~25% of all drops reach the spiller and actually cost power
> (Silver); fewer in configs with short belts and more in ones with long
> belts. I don't know if OOOs skip writeback of superseded results, but
> the fanout of the mux trees to the spiller (16 in Silver) is much less
> than that to an OOO regfile with hundreds more registers.
>
> The idea of a "last use" bit in the encoding has come up several times
> here. Falling off the belt is an implicit last use, so perhaps an
> explicit bit could give the same saving in writeback that the Mill
> displays.

If the belt was a compacting FIFO which allows slots to be deleted
from the interior then a "last use" bit lets it delete slots
without pushing items off the far end.

Here A,B are on the belt and C,D,E are appended.
Then C is dropped from the middle and F appended.

E D C->B A => F->E D C B A => ->F E D B A
                     ^
                  last use

And it's just the logical to physical map that is modified to drop C.

As you have pointed out before, since many/most results are referenced
only once this would allow more intermediate results to be held on the
belt longer resulting in fewer spills to scratch.
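
A small sketch of the compacting idea, reusing the A..F example above. It assumes the belt is a fixed pool of physical slots plus a logical-to-physical map, and that a read can carry a "last use" flag; the class and method names are hypothetical and this is not how the Mill actually does it.

```python
class CompactingBelt:
    """Belt whose values never move; only the logical->physical map changes."""

    def __init__(self, size):
        self.values = [None] * size    # physical slots
        self.free = list(range(size))  # unallocated physical slots
        self.order = []                # logical position -> slot, newest first
        self.spilled = []              # values pushed off the far end

    def drop(self, value):
        """Append a new result at the front of the belt."""
        if not self.free:
            # Belt is full: the oldest entry falls off and is spilled.
            oldest = self.order.pop()
            self.spilled.append(self.values[oldest])
            self.free.append(oldest)
        slot = self.free.pop()
        self.values[slot] = value
        self.order.insert(0, slot)

    def read(self, pos, last_use=False):
        """Read logical position pos; on last use, delete it from the map."""
        slot = self.order[pos]
        value = self.values[slot]
        if last_use:
            del self.order[pos]
            self.free.append(slot)
        return value

    def contents(self):
        return [self.values[slot] for slot in self.order]


belt = CompactingBelt(size=5)
for v in "ABCDE":
    belt.drop(v)                   # belt reads E D C B A, newest first
c = belt.read(2, last_use=True)    # C is consumed and removed from the middle
belt.drop("F")                     # F fits without pushing A off the end
```

Without the last-use flag, the same drop of F would have pushed A off the far end and into the spiller.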

Re: RISC-V vs. Aarch64

<5b2c3f44-2a81-4ded-9534-086c8583dfafn@googlegroups.com>


https://news.novabbs.com/devel/article-flat.php?id=22774&group=comp.arch#22774

Newsgroups: comp.arch
Date: Wed, 5 Jan 2022 15:44:25 -0800 (PST)
In-Reply-To: <7DpBJ.254731$3q9.63673@fx47.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:7595:75c3:454b:35cd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:7595:75c3:454b:35cd
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com> <squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com> <sr0vhm$c4u$1@dont-email.me>
<sr114i$1qc$1@newsreader4.netcologne.de> <sr1dca$70e$1@dont-email.me>
<kM%AJ.186634$np6.183460@fx46.iad> <sr2gf6$64u$1@dont-email.me> <7DpBJ.254731$3q9.63673@fx47.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5b2c3f44-2a81-4ded-9534-086c8583dfafn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 05 Jan 2022 23:44:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 34
 by: MitchAlsup - Wed, 5 Jan 2022 23:44 UTC

On Wednesday, January 5, 2022 at 5:14:16 PM UTC-6, EricP wrote:
> Ivan Godard wrote:
> > On 1/4/2022 9:48 AM, EricP wrote:
> >> Ivan Godard wrote:
> >>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:

> > The idea of a "last use" bit in the encoding has come up several times
> > here. Falling off the belt is an implicit last use, so perhaps an
> > explicit bit could give the same saving in writeback that the Mill
> > displays.
<
One can use register-write-elision to perform essentially the same thing
(on a machine with restrictions on the write ports.) Once a write passes
commit and a younger write is past the commit point, the older write can
be discarded instead of being written into the RF (after obeying WARs).
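
A rough sketch of the write-elision idea, for illustration only. It assumes a small buffer of committed writes waiting for a register-file write port; the names `PendingWrite`, `WriteBuffer`, and `readers_outstanding` (a stand-in for the WAR check) are hypothetical and not taken from any real design.

```python
from dataclasses import dataclass, field


@dataclass
class PendingWrite:
    reg: int                  # architectural register number
    value: int
    readers_outstanding: int  # consumers that still need this value (hypothetical WAR stand-in)


@dataclass
class WriteBuffer:
    pending: list = field(default_factory=list)  # committed writes, oldest first
    elided: int = 0                              # writes dropped without touching the RF

    def commit(self, reg, value, readers_outstanding=0):
        """A write passes the commit point and queues for a write port."""
        kept = []
        for w in self.pending:
            if w.reg == reg and w.readers_outstanding == 0:
                self.elided += 1      # superseded and no longer needed: discard
            else:
                kept.append(w)
        kept.append(PendingWrite(reg, value, readers_outstanding))
        self.pending = kept

    def drain(self, regfile, ports=1):
        """Perform up to `ports` register-file writes this cycle, oldest first."""
        for w in self.pending[:ports]:
            regfile[w.reg] = w.value
        del self.pending[:ports]


wb = WriteBuffer()
wb.commit(3, 10)
wb.commit(3, 20)   # supersedes the pending write of 10 to r3
rf = {}
wb.drain(rf)
print(rf, wb.elided)
```

In this toy model only the younger write to r3 ever uses a write port; an older write that still has readers outstanding is kept and drained in order.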
<
> If the belt was a compacting FIFO which allows slots to be deleted
> from the interior then a "last use" bit lets it delete slots
> without pushing items off the far end.
<
Good observation. If the belt is sufficiently long this would have negligible
utility, but if the belt is short enough it may be of value.
>
> Here A,B are on the belt and C,D,E are appended.
> Then C is dropped from the middle and F appended.
>
>   E D C->B A  =>  F->E D C B A  =>  ->F E D B A
>                          ^
>                       last use
>
> And its just the logical to physical map that is modified to drop C.
>
> As you have pointed out before, since many/most results are referenced
> only once this would allow more intermediate results to be held on the
> belt longer resulting in fewer spills to scratch.

Re: RISC-V vs. Aarch64

<sr62tb$u2o$1@dont-email.me>

https://news.novabbs.com/devel/article-flat.php?id=22775&group=comp.arch#22775

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Wed, 5 Jan 2022 22:40:42 -0800
Organization: A noiseless patient Spider
Lines: 145
Message-ID: <sr62tb$u2o$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sq5dj1$1q9$1@dont-email.me>
<59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de>
<sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad>
<sr2gf6$64u$1@dont-email.me> <7DpBJ.254731$3q9.63673@fx47.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 6 Jan 2022 06:40:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7d9bb99f1f5660f7a64992046b6aeaff";
logging-data="30808"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX184gjthN+Uo/DSEHFxQwrlN"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:dM1pFmy9xWq4oVZESMBM+1qIxoQ=
In-Reply-To: <7DpBJ.254731$3q9.63673@fx47.iad>
Content-Language: en-US
 by: Ivan Godard - Thu, 6 Jan 2022 06:40 UTC

On 1/5/2022 3:13 PM, EricP wrote:
> Ivan Godard wrote:
>> On 1/4/2022 9:48 AM, EricP wrote:
>>> Ivan Godard wrote:
>>>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>>
>>>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>>>>> shift register*.
>>>>>
>>>>> It's usually implemented as a circular buffer, correct?
>>>>
>>>> Not at all.
>>>>
>>>> Computed values are left where they were produced - FU's output
>>>> latches, for example - just as in a forwarding bypass network. Move
>>>> only happens if the location is needed by some other computation,
>>>> and then only to an adjacent location, also on the bypass network -
>>>> which the way issue happens guarantees is free.
>>>
>>> Moving results out of the way is what makes this work.
>>> If you only have one adder FU and you get a bunch of add instructions
>>> in a row, then you need to stash older results in other registers
>>> and this becomes the equivalent of a delayed register writeback.
>>> But I think you need a crossbar to accomplish this while a
>>> traditional approach just needs some small number of result buses.
>>>
>>>
>>
>> Still not :-)
>>
>> Our FU slots can (and typically do) support operations of different
>> natural latencies (pipe lengths). Each slot can accept one op per
>> cycle, so if the latencies differ you can get more than one result
>> retiring in the same cycle. Consequently the FUs have one result FF
>> per supported latency.
>>
>> If a op of latency N retires in cycle C to FF#N, necessarily the
>> following cycle C+1 the FF#N+1 is free (think about it). Consequently,
>> the FU's FFs are daisy chained so that each cycle FF#N is moved to
>> FF#N+1  and every result always is retiring to a known free FF; the
>> set of FFs are right next to each other so the move is trivial.
>
> This sounds like it has a belt's worth of FF for each FU.
> And some of them are shift registers?
> I'm a bit confused.

There can be (and usually are) several FUs per slot, forming in effect a
"superFU". There is only one set of output FFs per slot, one per
latency. These are daisy chained. I suppose that you can think of the
output FF daisy as being a shift register, and it could be done that
way, but it also could be done by simply rotating which FF is considered
which latency, thereby replacing a physical data move with a
result-to-FF fanout. That's a HW design choice; IANAHWG.

> I envisioned this having 1 belt of FF values and two crossbars,
> one crossbar to route any belt FF to any FU operands,
> one to route any FU result to the proper empty FF for its latency.

Only one crossbar, which routes belt FF to slot operands. The slots know
which op has which latency and so directly connect the compute logic to
the right output FF. Say a slot has a 1-cycle adder and a 3-cycle
multiplier. The adder will be connected directly to the lat1 FF, the
mult to the lat3 FF, and nothing will go to the lat2 FF; this is not a
crossbar.

Meanwhile, the lat1 FF daisies to the lat2 FF, the lat2 FF to the lat3,
and the lat3 to the spiller (although the config might put a lat4 or
more in there, depending on the tradeoff between more sources on the
input crossbar vs more values making it all the way up the daisy to press
on the spiller).
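
The daisy chain described above can be sketched as a toy cycle-level model. This is only an illustration under stated assumptions: the class `Slot`, its fields, and the rule that every value leaving the last FF goes to the spiller (liveness is ignored) are hypothetical simplifications, not the actual Mill hardware.

```python
class Slot:
    """One issue slot with output FFs for latencies 1..max_latency."""

    def __init__(self, max_latency):
        self.ff = [None] * (max_latency + 1)  # ff[n] is the lat-n FF; ff[0] is unused
        self.in_flight = []                   # (retire_cycle, latency, value)
        self.spilled = []                     # values pushed off the end of the daisy
        self.cycle = 0

    def step(self, op=None):
        """Issue at most one op (latency, value) this cycle, then advance one cycle."""
        if op is not None:
            latency, value = op
            self.in_flight.append((self.cycle + latency, latency, value))
        self.cycle += 1

        # Daisy move: the last FF presses on the spiller, lat n moves to lat n+1.
        if self.ff[-1] is not None:
            self.spilled.append(self.ff[-1])
        for n in range(len(self.ff) - 1, 1, -1):
            self.ff[n] = self.ff[n - 1]
        self.ff[1] = None

        # Retire: each result goes straight to the FF for its own latency.
        waiting = []
        for retire_cycle, latency, value in self.in_flight:
            if retire_cycle == self.cycle:
                assert self.ff[latency] is None, "target FF should always be free"
                self.ff[latency] = value
            else:
                waiting.append((retire_cycle, latency, value))
        self.in_flight = waiting


slot = Slot(3)
slot.step((3, "mul"))    # 3-cycle multiply
slot.step((1, "add1"))   # 1-cycle add
slot.step((1, "add2"))   # retires in the same cycle as the multiply
print(slot.ff[1:])       # lat1, lat2, lat3 FFs
slot.step()
print(slot.ff[1:], slot.spilled)
```

Because the slot accepts one op per cycle, the value sitting in FF#k is always the result of the op issued k cycles earlier, which is why the assertion on a free target FF never fires in this model.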

> It is the logical to physical map that actually shifts.
> And a scoreboard tracks when FF operands are ready
> and are written into their correct physical slot.
>
>   Belt
>    FF ---->Xbar<--Map
>    ^       |  |
>    |       v  |
>   Xbar<---FU  v
>       <------FU
>
There's no scoreboard, because Mill is statically scheduled. The FU
result either goes direct to the right FF (which is actually built into
that FU) or fans out to it 1->N, so there's no second xbar. If you did
put in a scoreboard then you could use dynamic scheduling.

However, dynamic requires another xbar into the scoreboard, and xbars
get out of hand quickly with increasing concurrency; the practical limit
seems to be around 8-way issue even with ridiculously long pipes. Mill
is intended for use up into concurrencies in the 20-30 range, so static
seems the only way to go, with other devices to avoid costs like cache
misses.

>> That gets rid of all moves except the move from FF#last, which has no
>> last+1. If that FF was originally filled and is still live (it may
>> have been daisy chained enough to have fallen off the belt) then it
>> gets moved to a dynamically allocated skid FF in the spiller. There
>> can be at most one such move per slot. There can be at most <belt
>> size> skid FFs needed for this.
>>
>> This move to the spiller is the only point which resembles a writeback
>> stage in a genreg machine. In limited measurement of limited test
>> code, we see ~25% of all drops reach the spiller and actually cost
>> power (Silver); fewer in configs with short belts and more in ones
>> with long belts. I don't know if OOOs skip writeback of superseded
>> results, but the fanout of the mux trees to the spiller (16 in Silver)
>> is much less than that to an OOO regfile with hundreds more registers.
>>
>> The idea of a "last use" bit in the encoding has come up several times
>> here. Falling off the belt is an implicit last use, so perhaps an
>> explicit bit could give the same saving in writeback that the Mill
>> displays.
>
> If the belt was a compacting FIFO which allows slots to be deleted
> from the interior then a "last use" bit lets it delete slots
> without pushing items off the far end.
>
> Here A,B are on the belt and C,D,E are appended.
> Then C is dropped from the middle and F appended.
>
>   E D C->B A  =>  F->E D C B A  =>  ->F E D B A
>                          ^
>                       last use
>
> And its just the logical to physical map that is modified to drop C.

That is in effect what the rescue instruction does, although by saying
what is live instead of what is dead (rescue takes a per-position bit
mask argument). This is just a matter of encoding, trading a last-use
bit in each belt argument specifier of every instruction against a
rescue with a bit per position that you only put in if a live value is
about to fall off.
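
To make the encoding tradeoff concrete, here is a small sketch of both variants on EricP's example. The `Belt` class and the exact behaviour of `rescue` (keep the positions whose mask bit is set, in order, and compact the rest away) are assumptions for illustration; the real rescue semantics may differ in detail.

```python
class Belt:
    """Toy belt: index 0 is the newest drop, the last index is the oldest."""

    def __init__(self, size):
        self.size = size
        self.items = []

    def drop(self, value):
        """Append a new result; return whatever fell off the far end."""
        self.items.insert(0, value)
        fell_off = self.items[self.size:]
        del self.items[self.size:]
        return fell_off

    def last_use(self, pos):
        """Per-operand last-use bit: delete one slot from the interior."""
        del self.items[pos]

    def rescue(self, mask):
        """Per-position live mask: keep only positions whose bit is set."""
        self.items = [v for i, v in enumerate(self.items) if (mask >> i) & 1]


def filled():
    belt = Belt(5)
    for v in "ABCDE":
        belt.drop(v)
    return belt            # belt.items is now E D C B A


plain = filled()
print(plain.drop("F"), plain.items)     # no hint given: A falls off

hinted = filled()
hinted.last_use(2)                      # C (position 2) marked dead
print(hinted.drop("F"), hinted.items)

rescued = filled()
rescued.rescue(0b11011)                 # everything except position 2 is live
print(rescued.drop("F"), rescued.items)
```

Both hinted variants end with F E D B A on the belt and nothing pushed off, while the plain belt loses A.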

>
> As you have pointed out before, since many/most results are referenced
> only once this would allow more intermediate results to be held on the
> belt longer resulting in fewer spills to scratch.

With the rescue instruction, we wind up spilling to scratch only when
the number of live operands exceeds the size of the belt. This is
equivalent to the forced spills in a genreg when the number of live
operands exceeds the number of architectural registers.

Re: RISC-V vs. Aarch64

<sr8bs4$2i3$1@dont-email.me>

https://news.novabbs.com/devel/article-flat.php?id=22792&group=comp.arch#22792

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ggt...@yahoo.com (Brett)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Fri, 7 Jan 2022 03:25:58 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 172
Message-ID: <sr8bs4$2i3$1@dont-email.me>
References: <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me>
<RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org>
<sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me>
<sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me>
<sr114i$1qc$1@newsreader4.netcologne.de>
<sr1dca$70e$1@dont-email.me>
<kM%AJ.186634$np6.183460@fx46.iad>
<sr2gf6$64u$1@dont-email.me>
<7DpBJ.254731$3q9.63673@fx47.iad>
<sr62tb$u2o$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jan 2022 03:25:58 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bcc6f48c9724ec7243948bc5d71580f8";
logging-data="2627"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18IbHKRAteWW3sDrNfhSzXd"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:UfCPmYtXl/KWi/omSpeuJWPqJhc=
sha1:hLVmF8HnCQ4aXqabfN3VjaotXuE=
 by: Brett - Fri, 7 Jan 2022 03:25 UTC

Ivan Godard <ivan@millcomputing.com> wrote:
> On 1/5/2022 3:13 PM, EricP wrote:
>> Ivan Godard wrote:
>>> On 1/4/2022 9:48 AM, EricP wrote:
>>>> Ivan Godard wrote:
>>>>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>>>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>>>
>>>>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>>>>>> shift register*.
>>>>>>
>>>>>> It's usually implemented as a circular buffer, correct?
>>>>>
>>>>> Not at all.
>>>>>
>>>>> Computed values are left where they were produced - FU's output
>>>>> latches, for example - just as in a forwarding bypass network. Move
>>>>> only happens if the location is needed by some other computation,
>>>>> and then only to an adjacent location, also on the bypass network -
>>>>> which the way issue happens guarantees is free.
>>>>
>>>> Moving results out of the way is what makes this work.
>>>> If you only have one adder FU and you get a bunch of add instructions
>>>> in a row, then you need to stash older results in other registers
>>>> and this becomes the equivalent of a delayed register writeback.
>>>> But I think you need a crossbar to accomplish this while a
>>>> traditional approach just needs some small number of result buses.
>>>>
>>>>
>>>
>>> Still not :-)
>>>
>>> Our FU slots can (and typically do) support operations of different
>>> natural latencies (pipe lengths). Each slot can accept one op per
>>> cycle, so if the latencies differ you can get more than one result
>>> retiring in the same cycle. Consequently the FUs have one result FF
>>> per supported latency.
>>>
>>> If a op of latency N retires in cycle C to FF#N, necessarily the
>>> following cycle C+1 the FF#N+1 is free (think about it). Consequently,
>>> the FU's FFs are daisy chained so that each cycle FF#N is moved to
>>> FF#N+1  and every result always is retiring to a known free FF; the
>>> set of FFs are right next to each other so the move is trivial.
>>
>> This sounds like it has a belt's worth of FF for each FU.
>> And some of them are shift registers?
>> I'm a bit confused.
>
> There can be (and usually are) several FUs per slot, forming in effect a
> "superFU". There is only one set of output FFs per slot, one per
> latency. These are daisy chained. I suppose that you can think of the
> output FF daisy as being a shift register, and it could be done that
> way, but it also could be done by simply rotating which FF is considered
> which latency, thereby replacing a physical data move with a
> result-to-FF fanout. That's a HW design choice; IANAHWG.
>
>> I envisioned this having 1 belt of FF values and two crossbars,
>> one crossbar to route any belt FF to any FU operands,
>> one to route any FU result to the proper empty FF for its latency.
>
> Only one crossbar, which routes belt FF to slot operands. The slots know
> which op has which latency and so directly connect the compute logic to
> the right output FF. Say a slot has a 1-cycle adder and a 3-cycle
> multiplier. The adder will be connected directly to the lat1 FF, the
> mult to the lat3 FF, and nothing will go to the lat2 FF; this is not a
> crossbar.
>
> Meanwhile, the lat1 FF daisies to the lat2 FF, the lat2 FF to the lat3,
> and the lat3 to the spiller (although the config might put a lat4 or
> more in there, depending on the tradeoff between more sources on the
> input crossbar vs more values making it all the way up the daisy to press
> on the spiller).
>
>> It is the logical to physical map that actually shifts.
>> And a scoreboard tracks when FF operands are ready
>> and are written into their correct physical slot.
>>
>>   Belt
>>    FF ---->Xbar<--Map
>>    ^       |  |
>>    |       v  |
>>   Xbar<---FU  v
>>       <------FU
>>
> There's no scoreboard, because Mill is statically scheduled. The FU
> result either goes direct to the right FF (which is actually built into
> that FU) or fans out to it 1->N, so there's no second xbar. If you did
> put in a scoreboard then you could use dynamic scheduling.
>
> However, dynamic requires another xbar into the scoreboard, and xbars
> get out of hand quickly with increasing concurrency; the practical limit
> seems to be around 8-way issue even with ridiculously long pipes. Mill
> is intended for use up into concurrencies in the 20-30 range, so static
> seems the only way to go, with other devices to avoid costs like cache
> misses.

Apple dominating SPEC with the M1 while not having HyperThreading is
something I had thought impossible due to DRAM misses. So a smart enough CPU
can pull off what you claim, which I was polite enough not to mock. Barely
dodged a bullet there. ;)

I used to think Mill was doomed for not going OoOE; curing a mindset is
hard.
Apple has corrected my mindset and I now believe you can beat OoOE, at far
lower cost and power. Pointing out the Apple M1 crushing x86 is something
you could add to your investor relations. Break the x86 mindset.

But you don’t have a billion dollars of brute force to add those smarts in
every corner to prevent glass jaws. ;(

Your biggest glass jaw may be aliasing in C code forcing crap code
generation; you can add speculation and replay on fault to cure this. But
that may cost more than you can afford on your first designs?

This one change would convince every good programmer that Mill is viable,
and could beat Apple’s 6 wide design. Otherwise you can’t do effective 6
wide, as aliasing caps your performance: you have a write every 6
instructions at most.

>>> That gets rid of all moves except the move from FF#last, which has no
>>> last+1. If that FF was originally filled and is still live (it may
>>> have been daisy chained enough to have fallen off the belt) then it
>>> gets moved to a dynamically allocated skid FF in the spiller. There
>>> can be at most one such move per slot. There can be at most <belt
>>> size> skid FFs needed for this.
>>>
>>> This move to the spiller is the only point which resembles a writeback
>>> stage in a genreg machine. In limited measurement of limited test
>>> code, we see ~25% of all drops reach the spiller and actually cost
>>> power (Silver); fewer in configs with short belts and more in ones
>>> with long belts. I don't know if OOOs skip writeback of superseded
>>> results, but the fanout of the mux trees to the spiller (16 in Silver)
>>> is much less than that to an OOO regfile with hundreds more registers.
>>>
>>> The idea of a "last use" bit in the encoding has come up several times
>>> here. Falling off the belt is an implicit last use, so perhaps an
>>> explicit bit could give the same saving in writeback that the Mill
>>> displays.
>>
>> If the belt was a compacting FIFO which allows slots to be deleted
>> from the interior then a "last use" bit lets it delete slots
>> without pushing items off the far end.
>>
>> Here A,B are on the belt and C,D,E are appended.
>> Then C is dropped from the middle and F appended.
>>
>>   E D C->B A  =>  F->E D C B A  =>  ->F E D B A
>>                          ^
>>                       last use
>>
>> And its just the logical to physical map that is modified to drop C.
>
> That is in effect what the rescue instruction does, although by saying
> what is live instead of what is dead (rescue takes a per-position bit
> mask argument). This is just a matter of encoding, trading a last-use
> bit in each belt argument specifier of every instruction against a
> rescue with a bit per position that you only put in if a live value is
> about to fall off.
>
>>
>> As you have pointed out before, since many/most results are referenced
>> only once this would allow more intermediate results to be held on the
>> belt longer resulting in fewer spills to scratch.
>
> With the rescue instruction, we wind up spilling to scratch only when
> the number of live operands exceeds the size of the belt. This is
> equivalent to the forced spills in a genreg when the number of live
> operands exceeds the number of architectural registers.
>


Re: RISC-V vs. Aarch64

<a88da679-061b-4507-879f-8860cacc972cn@googlegroups.com>

https://news.novabbs.com/devel/article-flat.php?id=22793&group=comp.arch#22793

X-Received: by 2002:a05:6214:cab:: with SMTP id s11mr56119847qvs.131.1641526408251;
Thu, 06 Jan 2022 19:33:28 -0800 (PST)
X-Received: by 2002:a4a:be90:: with SMTP id o16mr37508606oop.28.1641526407987;
Thu, 06 Jan 2022 19:33:27 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jan 2022 19:33:27 -0800 (PST)
In-Reply-To: <sr8bs4$2i3$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:19f4:580e:a114:b4b6;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:19f4:580e:a114:b4b6
References: <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com> <sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com> <sqpocs$1so3$1@gioia.aioe.org>
<sqpqbm$7qo$1@newsreader4.netcologne.de> <sqq3ce$c4n$2@dont-email.me>
<sqssff$a9j$1@gioia.aioe.org> <077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me> <bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de>
<sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad>
<sr2gf6$64u$1@dont-email.me> <7DpBJ.254731$3q9.63673@fx47.iad>
<sr62tb$u2o$1@dont-email.me> <sr8bs4$2i3$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a88da679-061b-4507-879f-8860cacc972cn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 Jan 2022 03:33:28 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 47
 by: MitchAlsup - Fri, 7 Jan 2022 03:33 UTC

On Thursday, January 6, 2022 at 9:26:01 PM UTC-6, gg...@yahoo.com wrote:
> Ivan Godard <iv...@millcomputing.com> wrote:

> Apple dominating Spec with the M1 while not having HyperThreading is
> something I had thought impossible due to DRAM misses.
<
So, you bought into the <ahem> hype !
<
> So a smart enough CPU
> can pull off what you claim, which I was polite enough not to mock. Barely,
> dodged a bullet there. ;)
>
> I used to think Mill was doomed due to not going OoOE, curing mindset is
> hard.
> Apple has corrected my mindset and I now believe you can beat OoOE, at far
> lower cost and power. Pointing out the Apple M1 crushing x86 is something
> you could add to your investor relations. Break the x86 mindset.
>
> But you don’t have a billion dollars of brute force to add those smarts in
> every corner to prevent glass jaws. ;(
>
> Your biggest glass jaw may be aliasing in C code forcing crap code
> generation, you can add speculation and replay on fault to cure this. But
> that may cost more than you can afford on your first designs?
<
The thing I don't understand about Mill (of which I am a fan) is what kind
of code gets produced when 7 out of 8 loop iterations run alias-free, but the
8th iteration does something different (alias, miss, TLB, ...) from the others.
<
>
> This one change would convince every good programmer that Mill is viable,
> and could beat Apple’s 6 wide design. Otherwise you can’t do effective 6
> wide as aliasing caps your performance, you have a write every 6
> instructions at most.
<
Performance measured by wall clock time sells itself.
<
>

Re: RISC-V vs. Aarch64

<sr8fgk$k6u$1@dont-email.me>

https://news.novabbs.com/devel/article-flat.php?id=22794&group=comp.arch#22794

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 6 Jan 2022 20:28:03 -0800
Organization: A noiseless patient Spider
Lines: 132
Message-ID: <sr8fgk$k6u$1@dont-email.me>
References: <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad>
<sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de>
<sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de>
<sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad>
<sr2gf6$64u$1@dont-email.me> <7DpBJ.254731$3q9.63673@fx47.iad>
<sr62tb$u2o$1@dont-email.me> <sr8bs4$2i3$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jan 2022 04:28:04 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="305e35c08e81839d6e2c9ca9371b8a28";
logging-data="20702"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/F3wWcezav3e2274LZcJcJ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:CieMp3G2YAKe8ohCW5ReOiaUbpo=
In-Reply-To: <sr8bs4$2i3$1@dont-email.me>
Content-Language: en-US
 by: Ivan Godard - Fri, 7 Jan 2022 04:28 UTC

On 1/6/2022 7:25 PM, Brett wrote:
> Ivan Godard <ivan@millcomputing.com> wrote:
>> On 1/5/2022 3:13 PM, EricP wrote:
>>> Ivan Godard wrote:
>>>> On 1/4/2022 9:48 AM, EricP wrote:
>>>>> Ivan Godard wrote:
>>>>>> On 1/4/2022 12:39 AM, Thomas Koenig wrote:
>>>>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>>>>
>>>>>>>> Perhaps you haven't noticed me saying: *the belt is not physically a
>>>>>>>> shift register*.
>>>>>>>
>>>>>>> It's usually implemented as a circular buffer, correct?
>>>>>>
>>>>>> Not at all.
>>>>>>
>>>>>> Computed values are left where they were produced - FU's output
>>>>>> latches, for example - just as in a forwarding bypass network. Move
>>>>>> only happens if the location is needed by some other computation,
>>>>>> and then only to an adjacent location, also on the bypass network -
>>>>>> which the way issue happens guarantees is free.
>>>>>
>>>>> Moving results out of the way is what makes this work.
>>>>> If you only have one adder FU and you get a bunch of add instructions
>>>>> in a row, then you need to stash older results in other registers
>>>>> and this becomes the equivalent of a delayed register writeback.
>>>>> But I think you need a crossbar to accomplish this while a
>>>>> traditional approach just needs some small number of result buses.
>>>>>
>>>>>
>>>>
>>>> Still not :-)
>>>>
>>>> Our FU slots can (and typically do) support operations of different
>>>> natural latencies (pipe lengths). Each slot can accept one op per
>>>> cycle, so if the latencies differ you can get more than one result
>>>> retiring in the same cycle. Consequently the FUs have one result FF
>>>> per supported latency.
>>>>
>>>> If a op of latency N retires in cycle C to FF#N, necessarily the
>>>> following cycle C+1 the FF#N+1 is free (think about it). Consequently,
>>>> the FU's FFs are daisy chained so that each cycle FF#N is moved to
>>>> FF#N+1  and every result always is retiring to a known free FF; the
>>>> set of FFs are right next to each other so the move is trivial.
>>>
>>> This sounds like it has a belt's worth of FF for each FU.
>>> And some of them are shift registers?
>>> I'm a bit confused.
>>
>> There can be (and usually are) several FUs per slot, forming in effect a
>> "superFU". There is only one set of output FFs per slot, one per
>> latency. These are daisy chained. I suppose that you can think of the
>> output FF daisy as being a shift register, and it could be done that
>> way, but it also could be done by simply rotating which FF is considered
>> which latency, thereby replacing a physical data move with a
>> result-to-FF fanout. That's a HW design choice; IANAHWG.
>>
>>> I envisioned this having 1 belt of FF values and two crossbars,
>>> one crossbar to route any belt FF to any FU operands,
>>> one to route any FU result to the proper empty FF for its latency.
>>
>> Only one crossbar, which routes belt FF to slot operands. The slots know
>> which op has which latency and so directly connect the compute logic to
>> the right output FF. Say a slot has a 1-cycle adder and a 3-cycle
>> multiplier. The adder will be connected directly to the lat1 FF, the
>> mult to the lat3 FF, and nothing will go to the lat2 FF; this is not a
>> crossbar.
>>
>> Meanwhile, the lat1 FF daisies to the lat2 FF, the lat2 FF to the lat3,
>> and the lat3 to the spiller (although the config might put a lat4 or
>> more in there, depending on the tradeoff between more sources on the
>> input crossbar vs more values making it all the way up the daisy to press
>> on the spiller).
>>
>>> It is the logical to physical map that actually shifts.
>>> And a scoreboard tracks when FF operands are ready
>>> and are written into their correct physical slot.
>>>
>>>   Belt
>>>    FF ---->Xbar<--Map
>>>    ^       |  |
>>>    |       v  |
>>>   Xbar<---FU  v
>>>       <------FU
>>>
>> There's no scoreboard, because Mill is statically scheduled. The FU
>> result either goes direct to the right FF (which is actually built into
>> that FU) or fans out to it 1->N, so there's no second xbar. If you did
>> put in a scoreboard then you could use dynamic scheduling.
>>
>> However, dynamic requires another xbar into the scoreboard, and xbars
>> get out of hand quickly with increasing concurrency; the practical limit
>> seems to be around 8-way issue even with ridiculously long pipes. Mill
>> is intended for use up into concurrencies in the 20-30 range, so static
>> seems the only way to go, with other devices to avoid costs like cache
>> misses.
>
> Apple dominating Spec with the M1 while not having HyperThreading is
> something I had thought impossible due to DRAM misses. So a smart enough CPU
> can pull off what you claim, which I was polite enough not to mock. Barely,
> dodged a bullet there. ;)
>
> I used to think Mill was doomed due to not going OoOE, curing mindset is
> hard.
> Apple has corrected my mindset and I now believe you can beat OoOE, at far
> lower cost and power. Pointing out the Apple M1 crushing x86 is something
> you could add to your investor relations. Break the x86 mindset.
>
> But you don’t have a billion dollars of brute force to add those smarts in
> every corner to prevent glass jaws. ;(
>
> Your biggest glass jaw may be aliasing in C code forcing crap code
> generation, you can add speculation and replay on fault to cure this. But
> that may cost more than you can afford on your first designs?
>
> This one change would convince every good programmer that Mill is viable,
> and could beat Apple’s 6 wide design. Otherwise you can’t do effective 6
> wide as aliasing caps your performance, you have a write every 6
> instructions at most.

Aliasing doesn't seem to be a problem. There is enough bandwidth in the
hierarchy that we can issue all the loads and stores as fast as their
arguments are available. Requests are issued in program order with no
overtaking, so there's no ambiguity. Deferred loads let us issue a load
before a store that is ahead of it in program order; the load retire
stations snoop the requests and reissue if a collision is detected
(which is rare). There's no LSQ; the L1 victim buffers serve for that.
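
[Editorial sketch] The snoop-and-reissue idea in the paragraph above can be
modelled in a few lines of C. This is only a toy under stated assumptions,
not Mill's actual hardware: memory is a small int array, an address is an
array index, and the names retire_station, load_issue, store_and_snoop and
load_retire are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical toy model of one load retire station. */
typedef struct {
    size_t addr;      /* address (array index) the deferred load reads */
    int    value;     /* value captured when the load was last issued  */
    int    reissues;  /* how many times a colliding store forced a redo */
} retire_station;

/* Issue the load early, ahead of stores that precede it in program order. */
void load_issue(retire_station *rs, const int *mem, size_t addr)
{
    rs->addr = addr;
    rs->value = mem[addr];
    rs->reissues = 0;
}

/* Perform a store and let the station snoop it. On an address match the
   load is reissued, so it picks up the newly stored value. */
void store_and_snoop(retire_station *rs, int *mem, size_t addr, int v)
{
    mem[addr] = v;
    if (addr == rs->addr) {
        rs->value = mem[addr];
        rs->reissues++;
    }
}

/* Retire the load: hand the (possibly refreshed) value to the consumer. */
int load_retire(const retire_station *rs)
{
    return rs->value;
}
```

In this model a store to a different address leaves the captured value
alone, while a store to the same address bumps reissues and refreshes the
value, so the retired load always reflects program order.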

We don't (yet) have address prediction, so stepping through a pointer
chain requires the loads to be done one by one. There's no good example
of that in our test suite; we may reopen the issue when we have a case
in hand.

Re: RISC-V vs. Aarch64

<sr8g7v$np0$1@dont-email.me>


https://news.novabbs.com/devel/article-flat.php?id=22795&group=comp.arch#22795

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 6 Jan 2022 20:40:30 -0800
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <sr8g7v$np0$1@dont-email.me>
References: <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com>
<RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me>
<sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me>
<sqmmso$446$2@newsreader4.netcologne.de>
<gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com>
<sqpd0i$spj$1@newsreader4.netcologne.de>
<650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com>
<sqpocs$1so3$1@gioia.aioe.org> <sqpqbm$7qo$1@newsreader4.netcologne.de>
<sqq3ce$c4n$2@dont-email.me> <sqssff$a9j$1@gioia.aioe.org>
<077afaee-009e-4860-be45-61106126934bn@googlegroups.com>
<squhht$79u$1@dont-email.me>
<bb6d49bb-a676-44bd-9a6d-29386d429454n@googlegroups.com>
<sr0vhm$c4u$1@dont-email.me> <sr114i$1qc$1@newsreader4.netcologne.de>
<sr1dca$70e$1@dont-email.me> <kM%AJ.186634$np6.183460@fx46.iad>
<sr2gf6$64u$1@dont-email.me> <7DpBJ.254731$3q9.63673@fx47.iad>
<sr62tb$u2o$1@dont-email.me> <sr8bs4$2i3$1@dont-email.me>
<a88da679-061b-4507-879f-8860cacc972cn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jan 2022 04:40:32 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="305e35c08e81839d6e2c9ca9371b8a28";
logging-data="24352"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/lbpGAM0P2i2ls0LVMFHdx"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:t+uYa4VkxdTaqm644fC1OGSFnn0=
In-Reply-To: <a88da679-061b-4507-879f-8860cacc972cn@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Fri, 7 Jan 2022 04:40 UTC

On 1/6/2022 7:33 PM, MitchAlsup wrote:
> On Thursday, January 6, 2022 at 9:26:01 PM UTC-6, gg...@yahoo.com wrote:
>> Ivan Godard <iv...@millcomputing.com> wrote:
>
>> Apple dominating Spec with the M1 while not having HyperThreading is
>> something I had thought impossible due to dram misses.
> <
> So, you bought into the <ahem> hype !
> <
>> So a smart enough CPU
>> can pull off what you claim, which I was polite enough not to mock. Barely,
>> dodged a bullet there. ;)
>>
>> I used to think Mill was doomed due to not going OoOE, curing mindset is
>> hard.
>> Apple has corrected my mindset and I now believe you can beat OoOE, at far
>> lower cost and power. Pointing out the Apple M1 crushing x86 is something
>> you could add to your investor relations. Break the x86 mindset.
>>
>> But you don’t have a billion dollars of brute force to add those smarts in
>> every corner to prevent glass jaws. ;(
>>
>> Your biggest glass jaw may be aliasing in C code forcing crap code
>> generation, you can add speculation and replay on fault to cure this. But
>> that may cost more than you can afford on your first designs?
> <
> The thing I don't understand about Mill (of whom I am a fan) is what kind
> of code gets produced when 7 out of 8 loops run alias free, but the 8th
> iteration does something different (alias, miss, TLB,...) than the other loops.

If any iteration load-misses for longer than the delay then we stall;
it's an in-order machine. Stores cannot miss, although if the store rate
exceeds the DRAM write rate for long enough you will stall after you've
filled all the caches. Caches are in virtual so aliasing doesn't matter.
TLB is after the caches and is all lazy and doesn't impact loops at all,
unless a load misses all the caches and also the TLB (very rare; entries
are very big). The PLB can miss, but that too is very rare because the
entries cover all of a mmap allocation no matter how big.

Re: RISC-V vs. Aarch64

<86pmp4vyqa.fsf@linuxsc.com>


https://news.novabbs.com/devel/article-flat.php?id=22800&group=comp.arch#22800

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 06 Jan 2022 22:36:29 -0800
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <86pmp4vyqa.fsf@linuxsc.com>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sqmsqq$14kp$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader02.eternal-september.org; posting-host="e2005ac63493a238c185f329f8e563ad";
logging-data="22253"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX192R9c8FUBRf308AyP7i3b4v3Xju4YaRME="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:b2jyxHMu5ucao6R6q3yyGeVPhRk=
sha1:nIDOEwTnk0eSoO478+p1t6thfYs=
 by: Tim Rentsch - Fri, 7 Jan 2022 06:36 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:

> Marcus wrote:
>
>> On 2021-12-30, EricP wrote:
>>
>>> C,C++ and a bunch of languages explicitly define booleans as 0 or 1
>>> so this definition won't be optimal for those languages.
>>> VAX Fortran used 0,-1 for LOGICAL but I don't know if that
>>> was defined by the language or implementation dependent.
>
> -1 is better than 1, it can be used as a mask.
>
>> As a software developer I'm painfully aware of this. I decided not to
>> care too much about it, though. Really, most software that relies on
>> this property of C should be frowned upon. E.g. expressions like:
>>
>> a = b + (c == d);
>>
>> ...aren't really good programming practice.
>
> No! Please tell me it ain't so!
>
> I use that type of constructs [frequently ...]

Totally with you on this. People who don't see the benefit of
using 0/1 values instead of if/else in cases like this are stuck
in an antiquated way of thinking.
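
[Editorial sketch] Two small C functions showing the style under discussion.
The names count_equal and select_if_equal are hypothetical, and the mask
variant assumes two's complement, where -1 has all bits set.

```c
#include <stddef.h>

/* Count elements equal to key. Each comparison yields the int 0 or 1,
   so it can be added directly instead of going through if/else. */
size_t count_equal(const int *v, size_t n, int key)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (v[i] == key);
    return count;
}

/* The mask idea from the quoted text: negating the 0/1 result gives
   0 or -1, which selects x or y without a branch in the source. */
int select_if_equal(int c, int d, int x, int y)
{
    int mask = -(c == d);            /* 0 or -1 (all bits set) */
    return (x & mask) | (y & ~mask);
}
```

For the array {1, 2, 2, 3, 2} and key 2, count_equal returns 3. Whether the
generated machine code is actually branch-free depends on the compiler and
target.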

Re: RISC-V vs. Aarch64

<86lezsvy8v.fsf@linuxsc.com>


https://news.novabbs.com/devel/article-flat.php?id=22801&group=comp.arch#22801

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 06 Jan 2022 22:46:56 -0800
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <86lezsvy8v.fsf@linuxsc.com>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sqmsqq$14kp$1@gioia.aioe.org> <sqmthh$2ea$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader02.eternal-september.org; posting-host="e2005ac63493a238c185f329f8e563ad";
logging-data="22253"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18UBivlQvFi6smRONxExNiVb/S9LGJOk2U="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:BmpW1eLKjX2rEDCcbPkH3wc1vNI=
sha1:L4s48ipeE4TIKUemaD0dBhWw70A=
 by: Tim Rentsch - Fri, 7 Jan 2022 06:46 UTC

Marcus <m.delete@this.bitsnbites.eu> writes:

> On 2021-12-31, Terje Mathisen wrote:
>
>> Marcus wrote:
>>
>>> On 2021-12-30, EricP wrote:
>>>
>>>> C,C++ and a bunch of languages explicitly define booleans as 0 or
>>>> 1 so this definition won't be optimal for those languages. VAX
>>>> Fortran used 0,-1 for LOGICAL but I don't know if that was
>>>> defined by the language or implementation dependent.
>>
>> -1 is better than 1, it can be used as a mask.
>>
>>> As a software developer I'm painfully aware of this. I decided
>>> not to care too much about it, though. Really, most software that
>>> relies on this property of C should be frowned upon. E.g.,
>>> expressions like:
>>>
>>> a = b + (c == d);
>>>
>>> ...aren't really good programming practice.
>>
>> No! Please tell me it ain't so!
>>
>> I use that type of constructs all over the place when writing
>> branchless code/doing table lookups etc.
>
> I think that you'll find that the following code produces the
> exact same result:
>
> int a;
> if (c == d) {
> a = b + 1;
> } else {
> a = b;
> }
>
> It too is completely branchless.

Being branchless is not the high order bit here.

> My main gripe with the former version is the implicit type
> conversion (boolean to integer), and that I don't like to see
> logical operands and arithmetic operands mixed in the same
> expression.

Apparently you are thinking of some other language, not C.
The results of comparison operators such as == have type int,
not some boolean type. And that is not just an accident.
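
[Editorial sketch] The type claim can be checked with C11 _Generic. The
macro IS_INT and the function names are hypothetical helpers, and the
result applies to C only (in C++ the comparison has type bool).

```c
#include <stdbool.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* The comparison c == d is an int expression in C. */
int comparison_is_int(int c, int d)
{
    return IS_INT(c == d);
}

/* For contrast: an object declared bool is not of type int. */
int bool_object_is_int(void)
{
    bool flag = true;
    return IS_INT(flag);
}

/* So b + (c == d) is plain int arithmetic with no conversion involved. */
int add_comparison(int b, int c, int d)
{
    return b + (c == d);
}
```

Here comparison_is_int returns 1 for any arguments, bool_object_is_int
returns 0, and add_comparison(5, 3, 3) gives 6.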

Re: RISC-V vs. Aarch64

<86h7agvxun.fsf@linuxsc.com>


https://news.novabbs.com/devel/article-flat.php?id=22802&group=comp.arch#22802

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 06 Jan 2022 22:55:28 -0800
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <86h7agvxun.fsf@linuxsc.com>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sqmsqq$14kp$1@gioia.aioe.org> <VSFzJ.136700$7D4.47834@fx37.iad> <2021Dec31.203710@mips.complang.tuwien.ac.at> <KC_zJ.59028$Ak2.12921@fx20.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader02.eternal-september.org; posting-host="e2005ac63493a238c185f329f8e563ad";
logging-data="22253"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18D4LCLNZ9xWBRVo9hv3Jv0RbZD4085AbM="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:KqNAK5hJ8xl7ejvcukSjvFEOhdA=
sha1:TuGZKqh/jMdakxsyI6bux2lkFTM=
 by: Tim Rentsch - Fri, 7 Jan 2022 06:55 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:

> The & and | operators normally act on integral data types [...]

Please forgive me for bringing up a technical fine point. The
operators & and | normally act on integer data types, not
integral data types. (In fact C doesn't have "integral" data
types.) The distinction is not just rhetorical: a constant such
as 3.0 has an integral value, but it does not have an integer
value.
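
[Editorial sketch] A short C illustration of the distinction. The function
names are hypothetical, and the cast in is_odd_value assumes x is within
the range of int.

```c
/* Both operands have integer type, so & is allowed. */
int is_odd(int n)
{
    return n & 1;
}

/* 3.0 has an integral value but type double. Writing `x & 1` here is a
   constraint violation and does not compile; an explicit conversion to
   an integer type is required first. */
int is_odd_value(double x)
{
    return (int)x & 1;
}
```

With these definitions is_odd(3) and is_odd_value(3.0) both return 1, but
only because the second function converts its operand before applying &.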

Re: RISC-V vs. Aarch64

<864k6gvwnt.fsf@linuxsc.com>


https://news.novabbs.com/devel/article-flat.php?id=22805&group=comp.arch#22805

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 06 Jan 2022 23:21:10 -0800
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <864k6gvwnt.fsf@linuxsc.com>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sq5dj1$1q9$1@dont-email.me> <59376149-c3d3-489e-8b41-f21bdd0ce5a9n@googlegroups.com> <sqkcvk$n97$1@dont-email.me> <RrlzJ.130558$SR4.25229@fx43.iad> <sql2cm$3h7$1@dont-email.me> <sql73d$6es$2@newsreader4.netcologne.de> <sqmj5j$s31$1@dont-email.me> <sqmmso$446$2@newsreader4.netcologne.de> <gs2dnRZj-ucyZ1P8nZ2dnUU78YfNnZ2d@supernews.com> <sqpd0i$spj$1@newsreader4.netcologne.de> <650c822a-3776-4ea9-aa72-5a6b19bdcabbn@googlegroups.com> <sqpn9u$mi5$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader02.eternal-september.org; posting-host="e2005ac63493a238c185f329f8e563ad";
logging-data="22253"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ex1zV5iBrtUqHV8sLkP/wicU3k2cVrqg="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:MH0Rz9mwEH6eTAoWbbPt9sQaylo=
sha1:QQBB4ww0r2UV7oMlpe8QfCWT1zg=
 by: Tim Rentsch - Fri, 7 Jan 2022 07:21 UTC

Ivan Godard <ivan@millcomputing.com> writes:

[ on splitting registers by type ]

> Splitting regs by type does offer some code (and encoding)
> advantages, but also some drawbacks. Think about how you would
> do VARARGS when bools are passed in the flags :-)

Not an issue, because variadic arguments of boolean type are
promoted to int before being passed. Indeed, one reason for
that rule is precisely to avoid the kinds of questions that
come up with narrow types such as booleans.
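
[Editorial sketch] The promotion rule is visible on the callee side: a bool
passed through "..." has to be fetched with va_arg(ap, int). The function
count_true is a hypothetical example, not something from the thread.

```c
#include <stdarg.h>
#include <stdbool.h>

/* Count how many of the n variadic flags are true. Callers may pass
   bool values; the default argument promotions turn each into an int. */
int count_true(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += (va_arg(ap, int) != 0);  /* read as int, not bool */
    va_end(ap);
    return total;
}
```

A call such as count_true(3, yes, no, yes) with bool variables yes = true
and no = false returns 2, and the callee never needs to know a narrow type
was involved.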

