Programming Language Concepts

Note

These slides are also available in PDF format: 4:3 PDF, 16:9 PDF, 16:10 PDF.

Learning Group Activity

With your learning group:

  1. Share your definitions of a programming language. Discuss what each definition entails, and whether you think it could be improved.
  2. Create one collective definition.

Got a good one? There will be a chance for you to share with the class.

Two Parts to a Language

Design vs. Implementation

  • Design: The language specification; that is, given an input, how should the implementation behave?
  • Implementation: A compiler or interpreter which behaves according to the language design.

Example

C is a programming language, but GCC is an implementation of the C programming language.

Discuss

  1. Someone tells you a language is fast. Are they referring to the design, or the implementation? Or both? Why?
  2. Is it always correct to separate the concepts of design and implementation? When can we get into murky water?

Abstract Syntax Trees

Abstract Syntax Trees

Consider this simple mathematical expression:

\[(2 + 3) \times 4\]

We could convert this to a tree structure that represents the same expression:

../_images/math-ast.svg

This kind of tree structure which represents the syntax of an expression is called an Abstract Syntax Tree (AST).

Symbolic Expressions

Since drawing abstract syntax trees is a lot of work, there exists a notation called symbolic expressions (or s-expressions) that makes it a little easier.

../_images/math-ast.svg

This converts to the following s-expression:

(* (+ 2 3) 4)

Exercise

With your learning group, for each math expression, draw an abstract syntax tree, and write out the resulting s-expression.

  1. \(6 \times 7 + 8\)
  2. \(\dfrac{1}{2 \times 3 \times 4}\)
  3. \(2^3 + 5\)

Evaluating an AST

To get the result from abstract syntax tree, we could write a simple program to do this:

procedure eval(node):
    if node is a literal then return node
    otherwise, if node.operator is...
        +, then:
            sum <- 0
            for each child in node:
                sum <- sum + eval(child)
            return sum
        *, then:
            ...

A program which does this is called an interpreter. We’ll present a more formal definition of this in a few more slides.

Compiling an AST

You could imagine a program which takes in an AST and creates machine code (pseudocode omitted):

                      ADD     R1 <- 2, 3
(* (+ 2 3) 4)   ->    MUL     R1 <- R1, 4
                      RETURN  R1

This kind of a program is called a compiler. Again, formal definition coming soon.

Language Implementation Techniques

Compiled Languages

Compiler:A computer program which translates a high-level language (such as C) into machine code.
Machine Code:A set of instructions which can be directly executed by a CPU.

Typically, when we speak of a compiled language, we refer to one which can be compiled to machine code. Compilers which translate to virtual machine bytecode (e.g., Java and Python) are better categorized as interpreted languages.

Compiler Implementations

Advantages:

  • Runtime is fast!

Disadvantages:

  • Compile time is slow
  • Source code cannot be a part of the input data

Examples

C, C++, and FORTRAN are generally implemented as compiled languages

../_images/compiler.png

Interpreted Languages

Interpreter:A computer program which reads a high-level programming language and directly executes the instructions of the language itself.

An interpreted language is a language designed to be implemented using an interpreter.

Discuss

What could be the advantages of executing the language directly? Disadvantages?

Try and come up with an example of something that could be done with an interpreter model but not a compiler model.

Interpreter Implementations

Advantages:

  • No need to compile
  • Source code can be a part of input data: you can transmit functions across the network to be run!

Disadvantages:

  • Runtime is slow

Examples

BASIC, PHP, and Perl are generally implemented as interpreted languages

../_images/interpreter.svg

Model of a classic interpreter. Modern interpreters are slightly more complicated.

Hybrid Interpreters

To speed up the execution of interpreted languages, implementers started getting clever:

  • Interpreted VM Bytecode: Input is lexed, parsed, then translated to bytecode. The bytecode gets optimized, then the low level bytecode is interpreted. Examples: CPython, OpenJDK (Java), Ruby MRI
  • Just In Time Compiler: Source code is compiled as it’s executed, putting machine code on the processor “just in time”. Examples: PyPy, LuaJIT, Chrome V8

Advantages include all the benefits of interpreted languages, with run times occasionally approaching compiled languages.

Typical Interpreter Structure

  1. Lexer: Source Code \(\to\) Tokens
  2. Parser: Tokens \(\to\) Abstract Syntax Tree (AST)
  3. Analyzer (optional): AST \(\to\) AST (optimized)
  4. Evaluator: AST + Context \(\to\) Result + Context

Cons Cells

Cons Cells: Building Blocks of PL

A cons cell (short for “construct”) is a data structure for which we can build many others from. It consists of two references to other objects.

../_images/conscell.svg
CAR:Contents of the address register
CDR:Contents of the decrement register

Both can be a reference (e.g., pointer) to anything.

Building Lists using Cons Cells

Suppose we want to represent a list using cons cells. We can take inspiration from linked lists:

  • CAR will be a reference to the list item.
  • CDR will be a reference to the next cell.
  • The last item in the list will have a CDR with the special value NIL.

For example, here is a cons cell diagram for the list (42 69 613):

../_images/conslist.svg

Building Trees using Cons Lists

Represent the following AST using cons cells:

(+ (/ 10 2) (* 3 3))

(either done as activity or example on board depending on time)

Quiz Announcement

Quiz 1 will be held Tuesday, September 4. Topics will be what we cover in class up until then.