Tuesday, September 23, 2025

Building a Simple C++ Method Extractor in C#

 

Extracting C++ Method Bodies with C#

When working with large C++ projects, sometimes all you need is a quick way to extract specific method bodies for debugging, refactoring, or documentation. Instead of building a full-fledged parser, you can rely on C# with regex and brace matching to create a lightweight and practical utility.

In this post, we’ll walk through a step-by-step implementation of a tool that extracts method bodies from .cpp files. It has three parts: a small main program, a model class, and the core extractor, which handles the regex matching and a brace matcher that is aware of comments and string literals.

Step 1 — Main Program (Program.cs)

The entry point initializes the extractor with a .cpp file path and prints out the results:

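using System;

class Program
{
    static void Main(string[] args)
    {
        // Path to the C++ source file to scan (the sample input is Dummy.cpp)
        var methodBody = CppRegexExtractor.ExtractMethodsBodyWithRegex(@"C:\Temp\Dummy.cpp");
        Console.WriteLine(methodBody);
        Console.ReadLine(); // wait for Enter so the console window stays open
    }
}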

The program reads the given C++ source file, prints the extracted method bodies, and then waits for Enter so the console window stays open.

Step 2 — Model Class (ProgramBody.cs)

We define a simple model to store each method’s name (Key) and its extracted text (Content), which includes the signature as well as the body:

using System;

namespace ExtractCppCode
{
    public class ProgramBody
    {
        public string Key { get; set; }
        public string Content { get; set; }
    }
}

This acts as a container for extracted results, making it easier to handle multiple methods.
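As a small illustration of why a Key/Content pair is convenient, the sketch below indexes extracted entries by method name. The `MethodIndex` helper is hypothetical and not part of the original tool, and it assumes that keeping only the first definition per name is acceptable (C++ overloads share a name and would otherwise collide).

```csharp
using System;
using System.Collections.Generic;

// Same shape as the ProgramBody model above, redeclared here only so the sketch compiles on its own.
public class ProgramBody
{
    public string Key { get; set; }
    public string Content { get; set; }
}

// Hypothetical helper: builds a name -> entry lookup from extracted methods.
public static class MethodIndex
{
    public static Dictionary<string, ProgramBody> Build(IEnumerable<ProgramBody> items)
    {
        var index = new Dictionary<string, ProgramBody>(StringComparer.Ordinal);
        foreach (var item in items)
        {
            // Keep the first definition seen; later entries with the same name are ignored.
            if (!index.ContainsKey(item.Key))
                index[item.Key] = item;
        }
        return index;
    }
}
```

With such an index, finding a method by name is a dictionary lookup rather than a linear search through the list.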

Step 3 — Core Extractor (CppRegexExtractor.cs)

Here’s where the real logic lives. The extractor uses regex to detect method signatures, then employs a brace matching algorithm to safely capture the full method body (ignoring braces inside strings or comments).

Key steps include:

  1. Regex pattern to detect C++ method signatures.
  2. Brace matching to find the correct closing brace.
  3. Optional sub-method detection to include methods invoked inside a target method (like SaveProduct).

using ExtractCppCode;
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class CppRegexExtractor
{
    public static string ExtractMethodsBodyWithRegex(string cppFilePath)
    {
        if (!File.Exists(cppFilePath))
        {
            Console.WriteLine($"Error: File not found at {cppFilePath}");
            return "";
        }

        Console.WriteLine("Start Extracting Methods..........");

        List<ProgramBody> lines = new List<ProgramBody>();
        List<ProgramBody> subLines = new List<ProgramBody>();
        StringBuilder builder = new StringBuilder();

        string cppContent = File.ReadAllText(cppFilePath);

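        // Regex for method signatures. The return type must start with a word character
        // and stay on one line, so text from a preceding comment is not pulled into the match.
        string pattern = @"(?<retType>\w[\w \t\*&<>:]*)\s+(?<className>[\w:]+::)?(?<methodName>\w+)\s*\((?<params>.*?)\)\s*\{";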
        Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.ExplicitCapture);
        MatchCollection matches = regex.Matches(cppContent);

        foreach (Match match in matches)
        {
            string methodName = match.Groups["methodName"].Value;
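            // Skip control-flow keywords that the signature pattern can mistake for method names
            if (methodName == "if" || methodName == "for" || methodName == "while" ||
                methodName == "switch" || methodName == "catch") continue;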

            int bodyStartIdx = match.Index + match.Length - 1;
            int bodyEndIdx = FindMatchingBrace(cppContent, bodyStartIdx);

            string result = cppContent.Substring(match.Index, bodyEndIdx - match.Index + 1);
            lines.Add(new ProgramBody { Key = methodName, Content = result });
        }

        // Extract SaveProduct + its submethods
        var selectedMethod = lines.Find(x => x.Key == "SaveProduct");
        if (selectedMethod != null)
        {
            // Follow each method with a blank line so the output stays readable
            builder.AppendLine(selectedMethod.Content).AppendLine();

            string subPattern = @"\b(?<methodName>\w+)\s*\((?<arguments>[^)]*)\)";
            MatchCollection subMatches = Regex.Matches(selectedMethod.Content, subPattern);

            foreach (Match subMatch in subMatches)
            {
                string subMethodName = subMatch.Groups["methodName"].Value.Trim();
                if (subMethodName == selectedMethod.Key || subMethodName == "if") continue;

                // Skip methods already collected so repeated calls do not duplicate output
                if (subLines.Exists(s => s.Key == subMethodName)) continue;

                var subItem = lines.Find(y => y.Key == subMethodName);
                if (subItem != null)
                {
                    subLines.Add(new ProgramBody
                    {
                        Key = subMethodName,
                        Content = subItem.Content
                    });
                }
            }

            foreach (var line in subLines)
            {
                builder.AppendLine(line.Content).AppendLine();
            }
        }

        Console.WriteLine("Extracting Methods Completed..........");
        return builder.ToString();
    }

    // Finds the matching closing brace
    private static int FindMatchingBrace(string source, int openingBraceIdx)
    {
        if (source[openingBraceIdx] != '{')
            throw new ArgumentException("openingBraceIdx must point at a '{' character.");

        int depth = 0;
        bool inSingleLineComment = false;
        bool inMultiLineComment = false;
        bool inString = false;
        bool inChar = false;
        bool escaped = false;

        for (int i = openingBraceIdx; i < source.Length; i++)
        {
            char c = source[i];

            if (escaped) { escaped = false; continue; }
            if (c == '\\' && (inString || inChar)) { escaped = true; continue; }

            if (inSingleLineComment) { if (c == '\n') inSingleLineComment = false; continue; }
            if (inMultiLineComment) { if (c == '*' && i + 1 < source.Length && source[i + 1] == '/') { inMultiLineComment = false; i++; } continue; }

            if (inString) { if (c == '"') inString = false; continue; }
            if (inChar) { if (c == '\'') inChar = false; continue; }

            if (c == '/' && i + 1 < source.Length)
            {
                char next = source[i + 1];
                if (next == '/') { inSingleLineComment = true; i++; continue; }
                if (next == '*') { inMultiLineComment = true; i++; continue; }
            }
            if (c == '"') { inString = true; continue; }
            if (c == '\'') { inChar = true; continue; }

            if (c == '{') depth++;
            else if (c == '}')
            {
                depth--;
                if (depth == 0) return i;
            }
        }

        throw new InvalidOperationException("Unbalanced braces detected.");
    }
}
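To make the signature-matching step easier to reason about, here is a minimal sketch that applies a pattern of the same overall shape to a single signature and reports what each named group captures. The `SignatureProbe` class and its `Describe` method are hypothetical, and the exact character classes in the pattern are an assumption of this sketch rather than a definitive grammar for C++ signatures.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical probe for inspecting what the named groups of a signature pattern capture.
public static class SignatureProbe
{
    // Same overall shape as the extractor's pattern: return type, optional Class:: prefix,
    // method name, parameter list, then the opening brace.
    private static readonly Regex Signature = new Regex(
        @"(?<retType>\w[\w \t\*&<>:]*)\s+(?<className>[\w:]+::)?(?<methodName>\w+)\s*\((?<params>.*?)\)\s*\{",
        RegexOptions.ExplicitCapture);

    // Returns "retType|className|methodName|params" for the first signature found, or null if none.
    public static string Describe(string cpp)
    {
        Match m = Signature.Match(cpp);
        if (!m.Success) return null;

        return string.Join("|",
            m.Groups["retType"].Value.Trim(),
            m.Groups["className"].Value,
            m.Groups["methodName"].Value,
            m.Groups["params"].Value);
    }
}
```

For example, `SignatureProbe.Describe("bool Catalog::SaveProduct(int id) {")` yields `bool|Catalog::|SaveProduct|int id`, while a plain statement such as `int x = 5;` yields null because it has no parameter list followed by an opening brace.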

Sample Input — Dummy.cpp

Here’s a test C++ file containing various methods, including comments and tricky braces inside strings:

#include <iostream>
#include <string>

// A helper method with braces inside strings and comments
int helper1(int x)
{
    // This brace } in a comment should be ignored
    std::string s = "example with brace } inside string";
    if (x > 0) {
        return x + 1;
    }
    return 0;
}

/* Multiline comment with { and } that should be ignored */

void helper2()
{
    for (int i = 0; i < 3; ++i)
    {
        std::cout << "loop " << i << std::endl;
    }
}

// The main method we want to extract
bool SaveProduct(int id)
{
    std::cout << "SaveProduct called" << std::endl;
    int r = helper1(id);
    if (r > 0)
    {
        helper2();
        return true;
    }
    return false;
}

// Another unrelated method
double unrelated(double a, double b)
{
    return a * b;
}
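The braces hidden in the comment and the string of helper1 are there to trip up simple brace counting. As a rough illustration, the sketch below runs a deliberately naive counter over a shortened copy of helper1. The `NaiveBraceDemo` class and its `Snippet` constant are hypothetical and exist only for this comparison.

```csharp
using System;

// Hypothetical demo: a brace counter with no awareness of comments or string literals.
public static class NaiveBraceDemo
{
    // Counts every '{' and '}' it sees, starting at the opening brace.
    public static int NaiveMatchingBrace(string source, int openIdx)
    {
        int depth = 0;
        for (int i = openIdx; i < source.Length; i++)
        {
            if (source[i] == '{') depth++;
            else if (source[i] == '}' && --depth == 0) return i;
        }
        return -1; // unbalanced
    }

    // Shortened version of helper1 from the sample file.
    public const string Snippet =
        "int helper1(int x)\n{\n    // This brace } in a comment should be ignored\n    return 0;\n}\n";
}
```

On this snippet the naive counter stops at the brace inside the comment, well before the real closing brace of the method. That is the failure mode the comment and string tracking in FindMatchingBrace is meant to avoid.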

Expected Extracted Output

When the extractor runs, it captures SaveProduct along with its submethods helper1 and helper2:

bool SaveProduct(int id)
{
    std::cout << "SaveProduct called" << std::endl;
    int r = helper1(id);
    if (r > 0)
    {
        helper2();
        return true;
    }
    return false;
}

int helper1(int x)
{
    // This brace } in a comment should be ignored
    std::string s = "example with brace } inside string";
    if (x > 0) {
        return x + 1;
    }
    return 0;
}

void helper2()
{
    for (int i = 0; i < 3; ++i)
    {
        std::cout << "loop " << i << std::endl;
    }
}
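The sub-method step that pulls in helper1 and helper2 can be sketched in isolation as follows. The `CallScanner` class and its keyword list are assumptions made for illustration. Unlike the brace matcher, this scan does not skip comments or string literals, so a name followed by a parenthesis inside a string would also be reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lists names that appear in call position inside a method's text.
public static class CallScanner
{
    // Keywords that look like calls because they are followed by '('.
    private static readonly HashSet<string> Keywords =
        new HashSet<string> { "if", "for", "while", "switch", "return", "sizeof", "catch" };

    // Returns distinct names followed by '(', in order of first appearance,
    // excluding the method's own name and control-flow keywords.
    public static List<string> FindCalledNames(string body, string selfName)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(body, @"\b(?<name>\w+)\s*\("))
        {
            string name = m.Groups["name"].Value;
            if (name == selfName || Keywords.Contains(name) || names.Contains(name)) continue;
            names.Add(name);
        }
        return names;
    }
}
```

Applied to the text of SaveProduct from the sample file, the scan reports helper1 and helper2 in that order, which matches the order in which the sub-methods appear in the output above.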
