Title

Thursday, 5 February 2015

Parsing an OCaml file with Ocaml


I want to analysis OCaml files (.ml) using OCaml. I want to break the files into Abstract Syntax Trees for analysis. I have attempted to use camlp4 but have had no luck. Has anyone else successfully done this before? Is this the best way to parse an OCaml file?

Answer

(I assume you know basic parts of OCaml already: how to write OCaml code, how to link modules and libraries, how to write build scripts and so on. If you do not, learn them first.)

The best way is to use the genuine OCaml code parser used in OCaml compiler itself, since it is 100% compatible by definition.

CamlP4 also implements OCaml parser but it is slightly incompatible with the genuine parser and the parse tree is somewhat specialized for writing syntax extension: not very good for any other kind of analysis.

You may want to parse .ml files with syntax extensions using P4. Even in this case, you should stick to the genuine parser: you can desugar the source code by P4 then send the result to your analyzer with the genuine parser.

To use OCaml compiler's parser, the easiest approach is to use compiler-libs.common OCamlFind package. It contains the parser and type checker of OCaml compiler.

Start from modifying driver/compile.ml of OCaml compiler source, it implements the majour compilation phases: calling preprocessor, parse, typing then code generation. To parse .ml files you should modify (or simplify) Compile.implementation. For .mli files Compile.interface.

Good luck.

No comments:

Post a Comment