Extracting C++ Method Bodies with C#
When working with large C++ projects, sometimes all you need is a quick way to extract specific method bodies for debugging, refactoring, or documentation. Instead of building a full-fledged parser, you can rely on C# with regex and brace matching to create a lightweight and practical utility.
In this post, we’ll walk through a step-by-step implementation of a tool that extracts method bodies from .cpp files. The implementation consists of a simple main program, a model class, and the core extractor, which handles regex parsing and safe brace matching.
Step 1: Main Program (Program.cs)
The entry point calls the extractor with the path to a .cpp file and prints the result:
using System;

class Program
{
    static void Main(string[] args)
    {
        var methodBody = CppRegexExtractor.ExtractMethodsBodyWithRegex(@"C:\Temp\Dummy.cpp");
        Console.WriteLine(methodBody);
        Console.ReadLine(); // keep the console window open
    }
}
Running the program reads the given C++ source file and prints the extracted method bodies to the console. If the file does not exist, the extractor logs an error and returns an empty string.
Step 2: Model Class (ProgramBody.cs)
We define a simple model to store each method’s name (Key) and body content (Content):
using System;

namespace ExtractCppCode
{
    public class ProgramBody
    {
        public string Key { get; set; }
        public string Content { get; set; }
    }
}
This acts as a container for extracted results, making it easier to handle multiple methods.
Step 3: Core Extractor (CppRegexExtractor.cs)
Here’s where the real logic lives. The extractor uses regex to detect method signatures, then employs a brace matching algorithm to safely capture the full method body (ignoring braces inside strings or comments).
Key steps include:
- Regex pattern to detect C++ method signatures.
- Brace matching to find the correct closing brace.
- Optional sub-method detection to include methods invoked inside a target method (like SaveProduct).
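Before the full listing, here is a small sketch of what a signature pattern of this kind captures. It is written as a top-level C# program; the Describe helper and the Catalog class name are hypothetical, and the pattern is an illustrative variant that keeps the return type on a single line.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative variant of the signature pattern: the return type may contain
// spaces and tabs but no newlines, so it cannot swallow a preceding line.
const string pattern =
    @"(?<retType>[\w \t\*&<>:]+)\s+(?<className>[\w:]+::)?(?<methodName>\w+)\s*\((?<params>.*?)\)\s*\{";

// Hypothetical helper: formats the named groups of the first match as
// "retType|className|methodName|params", or returns "" when nothing matches.
string Describe(string cppSource)
{
    Match m = Regex.Match(cppSource, pattern);
    if (!m.Success) return "";
    return string.Join("|",
        m.Groups["retType"].Value.Trim(),
        m.Groups["className"].Value,
        m.Groups["methodName"].Value,
        m.Groups["params"].Value);
}

// A class-qualified definition: all four groups are filled.
Console.WriteLine(Describe("bool Catalog::SaveProduct(int id)\n{"));
// prints: bool|Catalog::|SaveProduct|int id

// A free function: the optional className group stays empty.
Console.WriteLine(Describe("int helper1(int x)\n{"));
// prints: int||helper1|int x

// A declaration without a body has no '{' after ')', so it does not match.
Console.WriteLine(Describe("int helper1(int x);").Length);
// prints: 0
```

Because the pattern requires an opening brace after the parameter list, control-flow statements such as if (...) { also match, which is why the extractor filters keywords by name. The complete extractor follows.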
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using ExtractCppCode;

public static class CppRegexExtractor
{
    public static string ExtractMethodsBodyWithRegex(string cppFilePath)
    {
        if (!File.Exists(cppFilePath))
        {
            Console.WriteLine($"Error: File not found at {cppFilePath}");
            return "";
        }

        Console.WriteLine("Start Extracting Methods..........");
        List<ProgramBody> lines = new List<ProgramBody>();
        List<ProgramBody> subLines = new List<ProgramBody>();
        StringBuilder builder = new StringBuilder();
        string cppContent = File.ReadAllText(cppFilePath);

        // Regex for method signatures. The return type allows spaces and tabs but no
        // newlines, so a match cannot start inside the text of a preceding comment line.
        string pattern = @"(?<retType>[\w \t\*&<>:]+)\s+(?<className>[\w:]+::)?(?<methodName>\w+)\s*\((?<params>.*?)\)\s*\{";
        Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.ExplicitCapture);
        MatchCollection matches = regex.Matches(cppContent);
        foreach (Match match in matches)
        {
            string methodName = match.Groups["methodName"].Value;
            // Skip control-flow statements, which also look like "name (...) {"
            if (methodName is "if" or "for" or "while" or "switch" or "catch") continue;
            int bodyStartIdx = match.Index + match.Length - 1; // index of the opening '{'
            int bodyEndIdx = FindMatchingBrace(cppContent, bodyStartIdx);
            string result = cppContent.Substring(match.Index, bodyEndIdx - match.Index + 1);
            lines.Add(new ProgramBody { Key = methodName, Content = result });
        }

        // Extract SaveProduct + its submethods
        var selectedMethod = lines.Find(x => x.Key == "SaveProduct");
        if (selectedMethod != null)
        {
            builder.AppendLine(selectedMethod.Content);
            string subPattern = @"\b(?<methodName>\w+)\s*\((?<arguments>[^)]*)\)";
            MatchCollection subMatches = Regex.Matches(selectedMethod.Content, subPattern);
            foreach (Match subMatch in subMatches)
            {
                string subMethodName = subMatch.Groups["methodName"].Value.Trim();
                if (subMethodName == selectedMethod.Key || subMethodName == "if") continue;
                // Skip helpers already collected (a method may be called more than once)
                if (subLines.Exists(s => s.Key == subMethodName)) continue;
                var subItem = lines.Find(y => y.Key == subMethodName);
                if (subItem != null)
                {
                    subLines.Add(new ProgramBody
                    {
                        Key = subMethodName,
                        Content = subItem.Content
                    });
                }
            }
            foreach (var line in subLines)
            {
                builder.AppendLine(line.Content);
            }
        }

        Console.WriteLine("Extracting Methods Completed..........");
        return builder.ToString();
    }
    // Finds the matching closing brace
    private static int FindMatchingBrace(string source, int openingBraceIdx)
    {
        if (source[openingBraceIdx] != '{')
            throw new ArgumentException("openingBraceIdx must point at a '{' character.");

        int depth = 0;
        bool inSingleLineComment = false;
        bool inMultiLineComment = false;
        bool inString = false;
        bool inChar = false;
        bool escaped = false;

        for (int i = openingBraceIdx; i < source.Length; i++)
        {
            char c = source[i];
            if (escaped) { escaped = false; continue; }
            if (c == '\\' && (inString || inChar)) { escaped = true; continue; }
            if (inSingleLineComment) { if (c == '\n') inSingleLineComment = false; continue; }
            if (inMultiLineComment) { if (c == '*' && i + 1 < source.Length && source[i + 1] == '/') { inMultiLineComment = false; i++; } continue; }
            if (inString) { if (c == '"') inString = false; continue; }
            if (inChar) { if (c == '\'') inChar = false; continue; }
            if (c == '/' && i + 1 < source.Length)
            {
                char next = source[i + 1];
                if (next == '/') { inSingleLineComment = true; i++; continue; }
                if (next == '*') { inMultiLineComment = true; i++; continue; }
            }
            if (c == '"') { inString = true; continue; }
            if (c == '\'') { inChar = true; continue; }
            if (c == '{') depth++;
            else if (c == '}')
            {
                depth--;
                if (depth == 0) return i;
            }
        }
        throw new InvalidOperationException("Unbalanced braces detected.");
    }
}
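To see why FindMatchingBrace tracks string state instead of simply counting braces, the sketch below compares a naive counter with a simplified string-aware one. Both helper names are hypothetical, and the aware version handles only double-quoted strings (not comments or character literals), so treat it as an illustration rather than a replacement for the method above.

```csharp
using System;

// Naive approach: counts every brace, including those inside string literals.
int NaiveClosingBrace(string source, int openIdx)
{
    int depth = 0;
    for (int i = openIdx; i < source.Length; i++)
    {
        if (source[i] == '{') depth++;
        else if (source[i] == '}' && --depth == 0) return i;
    }
    return -1; // unbalanced
}

// Simplified aware approach: ignores braces inside double-quoted strings
// and steps over backslash-escaped characters within them.
int StringAwareClosingBrace(string source, int openIdx)
{
    int depth = 0;
    bool inString = false;
    for (int i = openIdx; i < source.Length; i++)
    {
        char c = source[i];
        if (inString)
        {
            if (c == '\\') i++;                 // skip the escaped character
            else if (c == '"') inString = false;
            continue;
        }
        if (c == '"') inString = true;
        else if (c == '{') depth++;
        else if (c == '}' && --depth == 0) return i;
    }
    return -1; // unbalanced
}

// The C++ fragment: { s = "}"; }
string body = "{ s = \"}\"; }";
Console.WriteLine(NaiveClosingBrace(body, 0));       // prints: 7  (the brace inside the string)
Console.WriteLine(StringAwareClosingBrace(body, 0)); // prints: 11 (the real closing brace)
```

The naive counter stops at the brace inside the string literal, which would truncate the extracted body. The full method applies the same idea to comments and character literals as well. It does not model C++ raw string literals or digit separators such as 1'000, so files using those may still confuse it.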
Sample Input: Dummy.cpp
Here’s a test C++ file containing several functions, with braces placed inside comments and string literals to exercise the brace matcher:
#include <iostream>
#include <string>

// A helper method with braces inside strings and comments
int helper1(int x)
{
    // This brace } in a comment should be ignored
    std::string s = "example with brace } inside string";
    if (x > 0) {
        return x + 1;
    }
    return 0;
}

/* Multiline comment with { and } that should be ignored */
void helper2()
{
    for (int i = 0; i < 3; ++i)
    {
        std::cout << "loop " << i << std::endl;
    }
}

// The main method we want to extract
bool SaveProduct(int id)
{
    std::cout << "SaveProduct called" << std::endl;
    int r = helper1(id);
    if (r > 0)
    {
        helper2();
        return true;
    }
    return false;
}

// Another unrelated method
double unrelated(double a, double b)
{
    return a * b;
}
Expected Extracted Output
When the extractor runs, it captures SaveProduct along with its submethods helper1 and helper2:
bool SaveProduct(int id)
{
    std::cout << "SaveProduct called" << std::endl;
    int r = helper1(id);
    if (r > 0)
    {
        helper2();
        return true;
    }
    return false;
}
int helper1(int x)
{
    // This brace } in a comment should be ignored
    std::string s = "example with brace } inside string";
    if (x > 0) {
        return x + 1;
    }
    return 0;
}
void helper2()
{
    for (int i = 0; i < 3; ++i)
    {
        std::cout << "loop " << i << std::endl;
    }
}
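The sub-method lookup behind this output can be tried in isolation. The sketch below applies the same call pattern to the SaveProduct body and lists the names it finds; FindCalledNames and its keyword set are hypothetical, and the keyword list is deliberately incomplete.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns each identifier followed by "(...)" once,
// in order of first appearance, minus a few control-flow keywords.
List<string> FindCalledNames(string methodBody)
{
    var keywords = new HashSet<string> { "if", "for", "while", "switch", "catch", "return" };
    var names = new List<string>();
    string callPattern = @"\b(?<methodName>\w+)\s*\((?<arguments>[^)]*)\)";
    foreach (Match m in Regex.Matches(methodBody, callPattern))
    {
        string name = m.Groups["methodName"].Value;
        if (!keywords.Contains(name) && !names.Contains(name)) names.Add(name);
    }
    return names;
}

string body = @"bool SaveProduct(int id)
{
    int r = helper1(id);
    if (r > 0)
    {
        helper2();
        return true;
    }
    return false;
}";

Console.WriteLine(string.Join(", ", FindCalledNames(body)));
// prints: SaveProduct, helper1, helper2
```

The method's own signature matches the call pattern too, which is why the extractor skips any name equal to the selected method's Key. Names that are not defined in the same .cpp file simply produce no lookup result and are left out of the output.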