递归下降语法分析设计原理与实现（词法分析+语法分析）

本文最后更新于 304 天前，其中的信息可能已经有所发展或是发生改变。

一、程序功能描述

[实验项目]

以专题1词法分析程序的输出为语法分析的输入，完成以下描述赋值语句

的 LL(1)文法的递归下降分析程序

G[S]: S→V=E

E→TE′

E′→ATE′|ε

T→FT′

T′→MFT′|ε

F→ (E)|i

A→+|-

M→*|/

V→i

[设计说明]

终结符号 i 为用户定义的简单变量，即标识符的定义。

[设计要求]

（1）递归下降语法分析的输入为词法分析的输出二元式序列，即某算术表达式“专题 1”的输出结果，输出为输入串是否为该文法定义的算术表达式的判断结果；

（2）递归下降分析程序应能发现简单的语法错误；

（3）设计至少四个测试用例（尽可能完备，正确和出错），并给出测试结果；

（4）选做：如有可能，考虑如何用文法描述C语言的if 语句，使整个文法仍然为 LL（1）文法，并使得你的递归下降程序可以分析赋值语句和 if 语句。

二、主要数据结构

变量及类型

用途

TokenType枚举

类型: enum class

定义词法单元的类型，如标识符、整数、运算符等。

Token 结构体

类型: struct

TokenType type: 词法单元的类型。

std::string value: 词法单元的值。

表示一个词法单元。

keywords 常量

类型:

std::unordered_map<std::string, TokenType>

存储关键字及其对应的词法单元类型。

Lexer 类：

成员:

std::string input: 输入字符串。

size_t position: 当前位置。

char current_char: 当前字符。

Lexer(const std::string& input): 构造函数，初始化输入和当前位置。

void advance(): 移动到下一个字符。

void skip_whitespace(): 跳过空白字符。

Token identifier_or_keyword(): 处理标识符或关键字。

Token number(): 处理数字。

Token single_char_token(char ch, TokenType type): 处理单字符词法单元。

Token double_char_token(std::string chars, TokenType type): 处理双字符词法单元。

Token get_next_token(): 获取下一个词法单元。

char peek(): 查看下一个字符。

Parser 类:

成员:

std::vector<Token> tokens: 存储词法单元的向量。

size_t pos: 当前解析位置。

Parser(std::vector<Token>& tokens): 构造函数，初始化词法单元向量和当前位置。

Token consume(TokenType expected): 消费预期的词法单元。

bool match(TokenType type): 匹配预期的词法单元。

void expect(TokenType expected): 预期某个词法单元，否则抛出异常。

void S(), void E(), void E_prime(), void T(), void T_prime(), void F(), void A(), void M(), void V(): 语法分析的方法，分别对应不同的语法规则。

void parse(): 开始解析。

main 函数

主函数，读取输入文件，创建词法分析器和语法分析器对象，进行解析并输出结果。

三、程序结构描述

设计方法

递归下降分析是最常用的自顶向下分析方法。它通过一组递归函数来实现对文法规则的直接翻译。

函数定义及函数间的调用关系

函数名称	函数功能描述
Parser(std::vector<Token>& tokens)	构造函数，初始化解析器并设置记号流。
void parse()	主解析函数，启动解析过程。
Token consume(TokenType expected)	消费预期的记号，如果当前记号符合预期，则返回该记号并移动指针；否则抛出异常。
bool match(TokenType type)	匹配预期的记号，如果当前记号符合预期，则移动指针并返回 true；否则返回 false。
void expect(TokenType expected)	期望某个记号，如果当前记号不符合预期，则抛出异常。
void array_declaration()	解析数组声明语句，如 int arr[10];
void array_initialization()	解析数组初始化语句，如 int arr[10] = {1, 2, 3};。
void expression_list()	解析表达式列表，如 {1, 2, 3}。
void expression()	解析单个表达式，如 1 + 2 或 arr[0]。
void array_access()	解析数组访问表达式，如 arr[0]。
void parse_error(const std::string& msg)	抛出解析错误，并附带错误信息。
void unexpected_token_error(TokenType expected, TokenType actual)	处理预期记号与实际记号不匹配的情况，抛出错误信息。
void debug_print(const std::string& msg)	在调试模式下输出调试信息。

四、程序测试

测试用例1：

x = 5;

y = x + 3;

z = y - 2;

w = z * 4 / 2;

a = (b + c) * d;

程序执行结果：

测试用例2：

int main() {

int a = 10;

a = a + 1;

return 0

}

程序执行结果：

测试用例3：

y = (x + 3;

程序执行结果：

测试用例4：

z = y ++ 2;

程序执行结果：

五、源代码附录

#include 
#include 
#include 
#include 
#include 
#include 
#include 

enum class TokenType {
    IDENTIFIER, INTEGER, PLUS, MINUS, TIMES, DIVIDE,
    ASSIGN, SEMICOLON, LPAREN, RPAREN, EOF_TOKEN,
    INCREMENT, DECREMENT, LSHIFT, RSHIFT, PLUS_ASSIGN,
    MINUS_ASSIGN, TIMES_ASSIGN, DIVIDE_ASSIGN, AND, OR, NOT,
    LBRACE, RBRACE,
    // 可以继续添加其他类型
};

struct Token {
    TokenType type;
    std::string value;

    Token(TokenType t, std::string v) : type(t), value(v) {}
};

std::ostream& operator<<(std::ostream& os, const Token& token) {
    return os << "(" << static_cast(token.type) << ", " << token.value << ")";
}

const std::unordered_map<std::string, TokenType> keywords = {
    {"void", TokenType::IDENTIFIER}, // 这里假设关键字也作为标识符处理，可以根据需求调整
    {"int", TokenType::IDENTIFIER},
    {"float", TokenType::IDENTIFIER},
    {"double", TokenType::IDENTIFIER},
    {"if", TokenType::IDENTIFIER},
    {"else", TokenType::IDENTIFIER},
    {"for", TokenType::IDENTIFIER},
    {"do", TokenType::IDENTIFIER},
    {"while", TokenType::IDENTIFIER}
};

class Lexer {
public:
    Lexer(const std::string& input) : input(input), position(0), current_char(input[position]) {}

    Token get_next_token();

private:
    void advance();
    void skip_whitespace();
    Token identifier_or_keyword();
    Token number();
    Token single_char_token(char ch, TokenType type);
    Token double_char_token(std::string chars, TokenType type);
    char peek();

    std::string input;
    size_t position;
    char current_char;
};

void Lexer::advance() {
    position++;
    if (position < input.length()) { current_char = input[position]; } else { current_char = '\0'; // 表示已到达输入末尾 } } void Lexer::skip_whitespace() { while (current_char != '\0' && isspace(current_char)) { advance(); } } Token Lexer::identifier_or_keyword() { std::string value; while (isalnum(current_char) || current_char == '_') { value += current_char; advance(); } auto it = keywords.find(value); if (it != keywords.end()) { return Token(it->second, value);
    }
    else {
        return Token(TokenType::IDENTIFIER, value);
    }
}

Token Lexer::number() {
    std::string value;
    while (isdigit(current_char)) {
        value += current_char;
        advance();
    }

    if (current_char == '.') {
        value += current_char;
        advance();
        while (isdigit(current_char)) {
            value += current_char;
            advance();
        }
        return Token(TokenType::INTEGER, value); // 假设这里处理的是浮点数
    }
    else {
        return Token(TokenType::INTEGER, value);
    }
}

Token Lexer::single_char_token(char ch, TokenType type) {
    advance();
    return Token(type, std::string(1, ch));
}

Token Lexer::double_char_token(std::string chars, TokenType type) {
    advance();
    advance();
    return Token(type, chars);
}

Token Lexer::get_next_token() {
    while (current_char != '\0') {
        if (isspace(current_char)) {
            skip_whitespace();
            continue;
        }

        if (isalpha(current_char) || current_char == '_') {
            return identifier_or_keyword();
        }

        if (isdigit(current_char)) {
            return number();
        }

        if (current_char == '+') {
            if (peek() == '+') {
                return double_char_token("++", TokenType::INCREMENT);
            }
            else if (peek() == '=') {
                return double_char_token("+=", TokenType::PLUS_ASSIGN);
            }
            else {
                return single_char_token('+', TokenType::PLUS);
            }
        }

        if (current_char == '-') {
            if (peek() == '-') {
                return double_char_token("--", TokenType::DECREMENT);
            }
            else if (peek() == '=') {
                return double_char_token("-=", TokenType::MINUS_ASSIGN);
            }
            else {
                return single_char_token('-', TokenType::MINUS);
            }
        }

        if (current_char == '*') {
            if (peek() == '=') {
                return double_char_token("*=", TokenType::TIMES_ASSIGN);
            }
            else {
                return single_char_token('*', TokenType::TIMES);
            }
        }

        if (current_char == '/') {
            if (peek() == '=') {
                return double_char_token("/=", TokenType::DIVIDE_ASSIGN);
            }
            else {
                return single_char_token('/', TokenType::DIVIDE);
            }
        }

        if (current_char == '&') {
            if (peek() == '&') {
                return double_char_token("&&", TokenType::AND);
            }
        }

        if (current_char == '|') {
            if (peek() == '|') {
                return double_char_token("||", TokenType::OR);
            }
        }

        if (current_char == '!') {
            if (peek() == '=') {
                return double_char_token("!=", TokenType::NOT);
            }
            else {
                return single_char_token('!', TokenType::NOT);
            }
        }

        if (current_char == '=') {
            if (peek() == '=') {
                return double_char_token("==", TokenType::ASSIGN);
            }
            else {
                return single_char_token('=', TokenType::ASSIGN);
            }
        }

        if (current_char == ';') {
            return single_char_token(';', TokenType::SEMICOLON);
        }

        if (current_char == '(') {
            return single_char_token('(', TokenType::LPAREN);
        }

        if (current_char == ')') {
            return single_char_token(')', TokenType::RPAREN);
        }

        if (current_char == '{') {
            return single_char_token('{', TokenType::LBRACE);
        }

        if (current_char == '}') {
            return single_char_token('}', TokenType::RBRACE);
        }

        std::cout << "未识别的字符: '" << current_char << "'" << std::endl;
        throw std::runtime_error("非法字符");
    }

    return Token(TokenType::EOF_TOKEN, "");
}

char Lexer::peek() {
    if (position + 1 < input.length()) {
        return input[position + 1];
    }
    else {
        return '\0';
    }
}

class Parser {
public:
    Parser(std::vector& tokens) : tokens(tokens), pos(0) {}

    void parse();

private:
    std::vector tokens;
    size_t pos;

    Token consume(TokenType expected);
    bool match(TokenType type);
    void expect(TokenType expected);

    void S();
    void E();
    void E_prime();
    void T();
    void T_prime();
    void F();
    void A();
    void M();
    void V();
};

Token Parser::consume(TokenType expected) {
    if (pos < tokens.size() && tokens[pos].type == expected) {
        return tokens[pos++];
    }
    else {
        throw std::runtime_error("Unexpected token at position " + std::to_string(pos));
    }
}

bool Parser::match(TokenType type) {
    if (pos < tokens.size() && tokens[pos].type == type) {
        pos++;
        return true;
    }
    return false;
}

void Parser::expect(TokenType expected) {
    if (!match(expected)) {
        throw std::runtime_error("Expected token of type " + std::to_string(static_cast(expected)) + " but got " + std::to_string(static_cast(tokens[pos].type)));
    }
}

void Parser::S() {
    V();
    expect(TokenType::ASSIGN);
    E();
    expect(TokenType::SEMICOLON);
}

void Parser::E() {
    T();
    E_prime();
}

void Parser::E_prime() {
    std::cout << "Entering E_prime() at position " << pos << std::endl;
    if (tokens[pos].type == TokenType::PLUS || tokens[pos].type == TokenType::MINUS) {
        std::cout << "Matched ADDITIVE OPERATOR, calling T()" << std::endl;
        A();
        T();
        E_prime(); // 递归调用 E_prime() 处理后续表达式
    }
    std::cout << "Exiting E_prime() at position " << pos << std::endl;
}

void Parser::T() {
    F();
    T_prime();
}

void Parser::T_prime() {
    std::cout << "Entering T_prime() at position " << pos << std::endl;
    if (tokens[pos].type == TokenType::TIMES || tokens[pos].type == TokenType::DIVIDE) {
        std::cout << "Matched MULTIPLICATIVE OPERATOR, calling F()" << std::endl;
        M();
        F();
        T_prime(); // 递归调用 T_prime() 处理后续表达式
    }
    std::cout << "Exiting T_prime() at position " << pos << std::endl;
}

void Parser::F() {
    if (tokens[pos].type == TokenType::LPAREN) {
        std::cout << "Matched LPAREN, calling E()" << std::endl;
        expect(TokenType::LPAREN);
        E();
        expect(TokenType::RPAREN);
    }
    else if (tokens[pos].type == TokenType::IDENTIFIER) {
        std::cout << "Matched IDENTIFIER: " << tokens[pos].value << std::endl;
        pos++;
    }
    else if (tokens[pos].type == TokenType::INTEGER) {
        std::cout << "Matched INTEGER: " << tokens[pos].value << std::endl;
        pos++;
    }
    else {
        std::cerr << "Current token: " << tokens[pos] << std::endl;
        throw std::runtime_error("Unexpected token in F()");
    }
}

void Parser::A() {
    if (tokens[pos].type == TokenType::PLUS || tokens[pos].type == TokenType::MINUS) {
        std::cout << "Matched ADDITIVE OPERATOR: " << tokens[pos].value << std::endl;
        pos++; // 移动到下一个记号
    }
    else {
        throw std::runtime_error("Expected '+' or '-'");
    }
}

void Parser::M() {
    if (tokens[pos].type == TokenType::TIMES || tokens[pos].type == TokenType::DIVIDE) {
        std::cout << "Matched MULTIPLICATIVE OPERATOR: " << tokens[pos].value << std::endl;
        pos++; // 移动到下一个记号
    }
    else {
        throw std::runtime_error("Expected '*' or '/'");
    }
}

void Parser::V() {
    if (tokens[pos].type != TokenType::IDENTIFIER) {
        throw std::runtime_error("Expected identifier");
    }
    pos++;
}

void Parser::parse() {
    try {
        while (pos < tokens.size() && tokens[pos].type != TokenType::EOF_TOKEN) {
            S();
        }
        if (pos < tokens.size()) {
            throw std::runtime_error("Unexpected token after parsing complete expression");
        }
    }
    catch (const std::runtime_error& e) {
        std::cerr << "Parse error: " << e.what() << std::endl;
    }
}

int main() {
    // 从 example.txt 文件中读取内容
    std::ifstream input_file("example.txt");
    if (!input_file.is_open()) {
        std::cerr << "无法打开 example.txt 文件" << std::endl;
        return 1;
    }

    std::string input((std::istreambuf_iterator(input_file)), std::istreambuf_iterator());
    input_file.close();

    // 创建 Lexer 对象
    Lexer lexer(input);
    std::vector tokens;

    // 获取所有记号
    Token token = lexer.get_next_token();
    while (token.type != TokenType::EOF_TOKEN) {
        tokens.push_back(token);
        token = lexer.get_next_token();
    }

    // 创建 Parser 对象并解析
    Parser parser(tokens);
    parser.parse();

    std::cout << "语法分析完成" << std::endl;

    return 0;
}

一	二	三	四	五	六	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

一、程序功能描述

[实验项目]

[设计说明]

[设计要求]

二、主要数据结构

三、程序结构描述

设计方法

函数定义及函数间的调用关系

四、程序测试

测试用例1：

测试用例2：

测试用例3：

测试用例4：

五、源代码附录

发送评论 编辑评论

推荐文章

发送评论编辑评论