From 3f6a14a59a9076b6fe96832f7c4e1a0856add4c3 Mon Sep 17 00:00:00 2001 From: Yehowshua Immanuel Date: Tue, 10 Dec 2024 11:44:28 -0500 Subject: [PATCH] add some documentation about Parser and its behaviors --- TODO.md | 2 + src/RTLILParser/README.md | 42 +++++ src/RTLILParser/rtlil_text.rst | 297 +++++++++++++++++++++++++++++++++ 3 files changed, 341 insertions(+) create mode 100644 src/RTLILParser/README.md create mode 100644 src/RTLILParser/rtlil_text.rst diff --git a/TODO.md b/TODO.md index ddbda4d..4bb47ca 100644 --- a/TODO.md +++ b/TODO.md @@ -45,6 +45,8 @@ - [ ] Embed locs in AST - [ ] Scrap `pEolAndAdvanceToNextNonWs` and use `tok` - [ ] Remove `preProcessDiscardComments` from exports + - [ ] Install README in dir containing `Parser.hs` + - [ ] Discuss deviations of parser against Yosys behaviors - [x] Are the `try` statements in `pWireOption` correctly constructed? - [ ] Consider the very weird case where the process body has nothing, thus, `pEolAndAdvanceToNextNonWs` may never get invoked in any of diff --git a/src/RTLILParser/README.md b/src/RTLILParser/README.md new file mode 100644 index 0000000..cd481fe --- /dev/null +++ b/src/RTLILParser/README.md @@ -0,0 +1,42 @@ +# About + +This directory contains the sources for the Haskellator parser for the +Register Transfer Level Intermediate Language (RTLIL) used by Yosys. +RTLIL started off as an internal language within the Yosys synthesis +engine, but later an official Yosys RTLIL frontend (ingester) and +backend (emitter) emerged, along with an accompanying RTLIL EBNF grammar. +This directory includes the RTLIL EBNF grammar that was referenced when +constructing the Haskellator RTLIL Parsec parser contained in the directory. + +Note that the behavior of the Haskellator parser implementation may +deviate from the actual Lex/Yacc implementation used in the Yosys +frontend. These deviations arise because the Lex/Yacc implementation in the Yosys frontend itself deviates from the EBNF RTLIL +grammar.
I attempt to capture these deviations in the +"Discrepancies between Lex/Yacc Yosys RTLIL Frontend and Yosys +Documentation EBNF Grammar" section in this README. + +Lastly, a copy of the grammar that was referenced when building the +Haskellator RTLIL parser is included in this directory as +"rtlil_text.rst". You can also find this document in the upstream +Yosys sources pinned to commit `8148ebd` [here][ebnf-yosys-upstream]. + +# Discrepancies between Lex/Yacc Yosys RTLIL Frontend and Yosys Documentation EBNF Grammar +1. As of Yosys commit `8148ebd`, the Lex/Yacc RTLIL frontend allows + attribute statements, switch statements, and assignment statements + to appear in any order at the root level of a process body. + The relevant snippet of Yacc code can be found + [here][yacc-code-snippet]. + + By contrast, the EBNF grammar doc as of commit `8148ebd` allows + multiple switch statements at the root level of a process body, + but requires that all assignment statements occur before the + first switch statement. The EBNF grammar also effectively + requires that attribute statements be placed above their respective + switch statement. In practice, this second deviation is not an + issue, as I've never seen a tool that emits RTLIL violate it. + The relevant snippet of the EBNF grammar can be found + [here][ebnf-grammar-snippet].
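To make the ordering difference in item 1 concrete, here is a minimal Python sketch. It is not the Haskellator or Yosys implementation. It models the root level of a process body as a flat list of statement kinds and ignores syncs and the contents of switches. The names `CODES`, `EBNF_ORDER`, `YACC_ORDER`, `accepted_by_ebnf`, and `accepted_by_yacc` are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical one-letter codes for the three statement kinds discussed above.
CODES = {"assign": "a", "attribute": "t", "switch": "s"}

# Ordering as this README describes the EBNF grammar: all assignments first,
# then zero or more switches, each optionally preceded by attribute statements.
EBNF_ORDER = re.compile(r"a*(?:t*s)*")

# Ordering as this README describes the Lex/Yacc frontend: any interleaving.
YACC_ORDER = re.compile(r"[ats]*")


def encode(kinds):
    """Turn a list such as ["assign", "switch"] into a code string such as "as"."""
    return "".join(CODES[kind] for kind in kinds)


def accepted_by_ebnf(kinds):
    """True if the statement order fits the EBNF-style ordering sketched above."""
    return EBNF_ORDER.fullmatch(encode(kinds)) is not None


def accepted_by_yacc(kinds):
    """True if the statement order fits the permissive Yacc-style ordering."""
    return YACC_ORDER.fullmatch(encode(kinds)) is not None


# An assignment after a switch is where the two descriptions disagree.
body = ["assign", "attribute", "switch", "assign"]
print(accepted_by_ebnf(body))  # False
print(accepted_by_yacc(body))  # True
```

Under these assumptions, any body the EBNF-style check accepts is also accepted by the permissive check, but not the other way around. That is the direction of the discrepancy described above.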
+ +[ebnf-yosys-upstream]: https://github.com/YosysHQ/yosys/blob/87736a2bf9710e307fbf9e57e6cece7586314cf7/docs/source/appendix/rtlil_text.rst +[yacc-code-snippet]: https://github.com/YosysHQ/yosys/blob/87736a2bf9710e307fbf9e57e6cece7586314cf7/frontends/rtlil/rtlil_parser.y#L337-L341 +[ebnf-grammar-snippet]: https://github.com/YosysHQ/yosys/blob/87736a2bf9710e307fbf9e57e6cece7586314cf7/docs/source/appendix/rtlil_text.rst?plain=1#L253 \ No newline at end of file diff --git a/src/RTLILParser/rtlil_text.rst b/src/RTLILParser/rtlil_text.rst new file mode 100644 index 0000000..b1bc9c5 --- /dev/null +++ b/src/RTLILParser/rtlil_text.rst @@ -0,0 +1,297 @@ +.. _chapter:textrtlil: + +RTLIL text representation +------------------------- + +This appendix documents the text representation of RTLIL in extended Backus-Naur +form (EBNF). + +The grammar is not meant to represent semantic limitations. That is, the grammar +is "permissive", and later stages of processing perform more rigorous checks. + +The grammar is also not meant to represent the exact grammar used in the RTLIL +frontend, since that grammar is specific to processing by lex and yacc, is even +more permissive, and is somewhat less understandable than simple EBNF notation. + +Finally, note that all statements (rules ending in ``-stmt``) terminate in an +end-of-line. Because of this, a statement cannot be broken into multiple lines. + +Lexical elements +~~~~~~~~~~~~~~~~ + +Characters +^^^^^^^^^^ + +An RTLIL file is a stream of bytes. Strictly speaking, a "character" in an RTLIL +file is a single byte. The lexer treats multi-byte encoded characters as +consecutive single-byte characters. While other encodings *may* work, UTF-8 is +known to be safe to use. Byte order marks at the beginning of the file will +cause an error. + +ASCII spaces (32) and tabs (9) separate lexer tokens. + +A ``nonws`` character, used in identifiers, is any character whose encoding +consists solely of bytes above ASCII space (32). 
+ +An ``eol`` is one or more consecutive ASCII newlines (10) and carriage returns +(13). + +Identifiers +^^^^^^^^^^^ + +There are two types of identifiers in RTLIL: + +- Publically visible identifiers +- Auto-generated identifiers + +.. code:: BNF + + ::= | + ::= \ + + ::= $ + + +Values +^^^^^^ + +A *value* consists of a width in bits and a bit representation, most +significant bit first. Bits may be any of: + +- ``0``: A logic zero value +- ``1``: A logic one value +- ``x``: An unknown logic value (or don't care in case patterns) +- ``z``: A high-impedance value (or don't care in case patterns) +- ``m``: A marked bit (internal use only) +- ``-``: A don't care value + +An *integer* is simply a signed integer value in decimal format. **Warning:** +Integer constants are limited to 32 bits. That is, they may only be in the range +:math:`[-2147483648, 2147483648)`. Integers outside this range will result in an +error. + +.. code:: BNF + + ::= + ' * + ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 + ::= 0 | 1 | x | z | m | - + ::= -? + + +Strings +^^^^^^^ + +A string is a series of characters delimited by double-quote characters. Within +a string, any character except ASCII NUL (0) may be used. In addition, certain +escapes can be used: + +- ``\n``: A newline +- ``\t``: A tab +- ``\ooo``: A character specified as a one, two, or three digit octal value + +All other characters may be escaped by a backslash, and become the following +character. Thus: + +- ``\\``: A backslash +- ``\"``: A double-quote +- ``\r``: An 'r' character + +Comments +^^^^^^^^ + +A comment starts with a ``#`` character and proceeds to the end of the line. All +comments are ignored. + +File +~~~~ + +A file consists of an optional autoindex statement followed by zero or more +modules. + +.. code:: BNF + + ::= ? * + +Autoindex statements +^^^^^^^^^^^^^^^^^^^^ + +The autoindex statement sets the global autoindex value used by Yosys when it +needs to generate a unique name, e.g. ``flattenN``. 
The N part is filled with +the value of the global autoindex value, which is subsequently incremented. This +global has to be dumped into RTLIL, otherwise e.g. dumping and running a pass +would have different properties than just running a pass on a warm design. + +.. code:: BNF + + ::= autoidx + +Modules +^^^^^^^ + +Declares a module, with zero or more attributes, consisting of zero or more +wires, memories, cells, processes, and connections. + +.. code:: BNF + + ::= * + ::= module + ::= ( + | + | + | + | )* + ::= parameter ? + ::= | | + ::= end + +Attribute statements +^^^^^^^^^^^^^^^^^^^^ + +Declares an attribute with the given identifier and value. + +.. code:: BNF + + ::= attribute + +Signal specifications +^^^^^^^^^^^^^^^^^^^^^ + +A signal is anything that can be applied to a cell port, i.e. a constant value, +all bits or a selection of bits from a wire, or concatenations of those. + +**Warning:** When an integer constant is a sigspec, it is always 32 bits wide, +2's complement. For example, a constant of :math:`-1` is the same as +``32'11111111111111111111111111111111``, while a constant of :math:`1` is the +same as ``32'1``. + +See :ref:`sec:rtlil_sigspec` for an overview of signal specifications. + +.. code:: BNF + + ::= + | + | [ (:)? ] + | { * } + +Connections +^^^^^^^^^^^ + +Declares a connection between the given signals. + +.. code:: BNF + + ::= connect + +Wires +^^^^^ + +Declares a wire, with zero or more attributes, with the given identifier and +options in the enclosing module. + +See :ref:`sec:rtlil_cell_wire` for an overview of wires. + +.. code:: BNF + + ::= * + ::= wire * + ::= + ::= width + | offset + | input + | output + | inout + | upto + | signed + +Memories +^^^^^^^^ + +Declares a memory, with zero or more attributes, with the given identifier and +options in the enclosing module. + +See :ref:`sec:rtlil_memory` for an overview of memory cells, and +:ref:`sec:memcells` for details about memory cell types. + +.. 
code:: BNF + + ::= * + ::= memory * + ::= width + | size + | offset + +Cells +^^^^^ + +Declares a cell, with zero or more attributes, with the given identifier and +type in the enclosing module. + +Cells perform functions on input signals. See :doc:`/cell_index` for a detailed +list of cell types. + +.. code:: BNF + + ::= * * + ::= cell + ::= + ::= + ::= parameter (signed | real)? + | connect + ::= end + + +Processes +^^^^^^^^^ + +Declares a process, with zero or more attributes, with the given identifier in +the enclosing module. The body of a process consists of zero or more +assignments followed by zero or more switches and zero or more syncs. + +See :ref:`sec:rtlil_process` for an overview of processes. + +.. code:: BNF + + ::= * + ::= process + ::= * * * + ::= assign + ::= + ::= + ::= end + +Switches +^^^^^^^^ + +Switches test a signal for equality against a list of cases. Each case specifies +a comma-separated list of signals to check against. If there are no signals in +the list, then the case is the default case. The body of a case consists of zero +or more assignments followed by zero or more switches. Both switches and cases +may have zero or more attributes. + +.. code:: BNF + + ::= * + := * switch + ::= * + ::= case ? + ::= (, )* + ::= * * + ::= end + +Syncs +^^^^^ + +Syncs update signals with other signals when an event happens. Such an event may +be: + +- An edge or level on a signal +- Global clock ticks +- Initialization +- Always + +.. code:: BNF + + ::= * + ::= sync + | sync global + | sync init + | sync always + ::= low | high | posedge | negedge | edge + ::= update