You should be automatically redirected . If not, visit
http://newlisper.wordpress.com
and update your bookmarks.

12/07/2010

Balance those parentheses

I called this site "unbalanced parentheses" not just because the phrase had a Lisp-oriented slant but also because it hinted at a lightweight, off-the-wall attitude to the subject, rather than a serious analytical perspective. I spent all of 10 seconds thinking of the name while staring at a registration form with a mind suddenly gone blank, and it probably wasn't the perfect choice.

Looking at the logs recently, I noticed that some visitors actually appear to be looking for help balancing their unbalanced parentheses! I suspect, though, that they're not newLISPers, and probably not even Lisp users (who presumably don't need much help in that department anyway, and have their own opinions about editing code - see Greg's post for an interesting discussion). It occurred to me to try and write something about the subject. And it's high time I wrote something - anything - for this site.

The parenthesis is normally employed in written language to indicate something less than essential or subordinate - material that can be omitted without major damage to the sentence. In mathematics and programming, though, parentheses are more likely to be used for organizing selected items into groups or larger entities, and also for controlling the order of evaluation. Like legs, or trousers, they nearly always go around in pairs, so that a left, opening, parenthesis should always be accompanied by a matching right, closing, parenthesis following not far behind.

You know all this, of course.

The problems start when you start using lots of pairs of parentheses. Take this piece of simple mathematics:

(a * (b + ((c * d) / e) - f)

As every schoolboy and schoolgirl knows, to check parentheses in your algebra homework, you read from left to right and count out loud, going up one whenever you see an opening parenthesis, and down one for each closing one. Thus the counting for the above expression would sound like this:

"1 a * 2 b + 3 4 c * d 3 / e 2 - f 1"

and, since you started at 0 but ended on 1, you can conclude that your parentheses are unbalanced. Too many lefts, not enough rights, in this case.

Like me, you can probably do a simple example like this without counting, just by looking at it and mentally matching lefts and rights. But you don't want to have to do this for long Lisp expressions, or expressions that span a number of lines. Your preferred text editor should help you match the parentheses - each editor offers a few great tools, but I've found that few offer everything you want.

A little newLISP script can do the job automatically:

(set 'source-code-list (explode (read-file (nth 2 (main-args)))))
(set 'nest -1)
(dolist (i source-code-list)
  (cond
     ((= i "(")     (inc nest)
                    (print "\n" (dup "  " nest) i))
     ((= i ")")     (dec nest)
                    (print i "\n" (dup "  " nest)))
     ((= i "\n")    (print ""))
     (true          (print i))))

which, when given a piece of source code like this:

(define (mandelbrot)
    (for (y -2 2 0.02)
        (for (x -2 2 0.02)
            (inc counter)
            (set 'z (complex x y) 'c 126 'a z)
            (while (and
                     (< (abs (:rad (set 'z (:add (:mul z z) a)))) 2)
                     (> (dec c) 32)))
            (print (char c)))
        (println)))

outputs something like this:

(define 
  (mandelbrot)

  (for 
    (y -2 2 0.02)

    (for 
      (x -2 2 0.02)

      (inc counter)

      (set 'z 
        (complex x y)
       'c 126 'a z)

      (while 
        (and                    
          (< 
            (abs 
              (:rad 
                (set 'z 
                  (:add 
                    (:mul z z)
                   a)
                )
              )
            )
           2)

          (> 
            (dec c)
           32)
        )
      )

      (print 
        (char c)
      )
    )

    (println)
  )
)

It's a curious and spacious layout not to everyone's taste, but the parentheses line up vertically, so it's easy to see what's going on. You can get even closer to the simple counting idea by replacing the parentheses with snazzy Unicode symbols, producing results like this (which won't look right if your browser isn't Unicode-friendly):

➀define 
  ➁mandelbrot❷

  ➁for 
    ➂y -2 2 0.02❸

    ➂for 
      ➃x -2 2 0.02❹

      ➃inc counter❹

      ➃set 'z 
        ➄complex x y❺ 'c 126 'a z❹

      ➃while 
        ➄and

          ➅< 
            ➆abs 
              ➇:rad 
                ➈set 'z 
                  ➉:add 
                    ➊:mul z z➀ a❿❾❽❼ 2❻

          ➅> 
            ➆dec c❼ 32❻❺❹

      ➃print 
        ➄char c❺❹❸

    ➂println❸❷❶

using code like this:

(set 'level 0)

(define (open-list)
  (print "\n" 
    (dup "  " level) 
    (char (+ (int (append "0x" (string 2780)) 0 16) level)))
  (inc level))

(define (close-list)      
  (dec level)      
  (print (char (+ (int (append "0x" (string 2776)) 0 16) level))))

(dolist (c source-code-list)
    (cond
        ((= c "(")  (open-list))
        ((= c ")")  (close-list))
        (true       (print c))))

This technique is useful for analysing big SXML lists, too.

All this is fun, but you might be thinking that there's a much quicker way to simply find out whether the parentheses are balanced. Why not just count them?

(define (parenthesis-count txt)
     (count '("(" ")") (explode txt)))

(if (apply = (set 'r (parenthesis-count test-code)))
             (println "good code! " r)
             (println "bad code!  " r))

which returns something like:

bad code!  (22 23)
; or
good code! (22 22)

The world, or at least your source code, might well be harmonious and balanced when there are equivalent numbers of left and right parentheses.

But... you're too smart to be easily fooled by this glibness. You know better than I do that none of the code written thus far will operate perfectly on itself, let alone on the sort of code that crazy newLISPers can come up with. Obviously, the parentheses inside the strings are going to upset the counting. And when source code is formatted with comments and documentation markup, it's unlikely that these simple tricks are going to give accurate results. I'll have to analyze source code more carefully.

In my view, one of the few features that newLISP currently lacks is the ability to read its own code into its own nested list format. The powerful list referencing functions such as ref-all and set-ref are unable to operate on source code stored in strongly hierarchical lists or S-expressions, simply because there's no obvious way to convert source to that form. I use a temporary work round, in the form of Nestor (a utility that you'll find mentioned on these pages, although the code is unlikely to work on more recent versions on newLISP). Nestor also adds the colours to the source listings you see here, too, by converting raw code into an intermediate form that can then be converted to HTML with lots of CSS SPAN tags to colourize parenthesis-enclosed strings.

Here's what a piece of source code looks like after Nestor's manipulations:

((("open-paren" "(") 
  ("symbol" "define") 
  (("open-paren" "(") 
   ("symbol" "mandelbrot") 
   ("close-paren" ")")) 
  (("open-paren" "(") 
   ("symbol" "for") 
   (("open-paren" "(") 
    ("symbol" "y") 
    ("integer" -2) 
    ("integer" 2) 
    ("float" "0.02") 
    ("close-paren" ")")) 
   ; ...

Now I can ask for all opening parentheses and obtain accurate references to them. I've put the tokenized and deformatted source in `s-list':

(set 'op-refs (ref-all '("open-paren"  "(") s-list))
(set 'cp-refs (ref-all '("close-paren" ")") s-list))

op-refs
;-> 
((0 0) 
 (0 2 0) 
 (0 3 0) 
 (0 3 2 0) 
 (0 3 3 0) 
 (0 3 3 2 0) 
 (0 3 3 3 0) 
 (0 3 3 4 0)
 (0 3 3 4 4 0)
 ;...

cp-refs
;->
((0 2 2) 
 (0 3 2 5) 
 (0 3 3 2 5) 
 (0 3 3 3 3) 
 (0 3 3 4 4 4) 
 (0 3 3 4 11) 
 ;...

The parentheses can now be counted simply and with more confidence, knowing that strings and comments are not going to give false positives:

(length op-refs)
;-> 22
(length cp-refs)
;-> 22

And now it's possible to see each expression separately. I can work through every reference to an open parenthesis, and for each one, chop the end of the address off to get a reference to the expression as a whole:

(dolist (a-ref (sort op-refs (fn (x y) (> (length x) (length y)))))
    (output-code (s-list (chop a-ref))))

;->
(:mul z z)
(:add (:mul z z) a)
(set 'z (:add (:mul z z) a))
(:rad (set 'z (:add (:mul z z) a)))
(abs (:rad (set 'z (:add (:mul z z) a))))
; ...

This particular code sorts the expressions according to their depth (the most deeply nested first) and then displays them. It's kind of like how newLISP evaluates expressions, I fancy. Each of these expressions could be checked for unbalanced parentheses or dubious syntax, too.

Alternatively, it's possible to examine particular constructions occurring in code. For example, I could look at each use of `set', to check for dodgy assignations:

(set 'setrefs (ref-all '("symbol" "set") s-list))
(dolist (setref setrefs)
   (output-code (s-list (chop setref))))

;->
(set 'z (complex x y) 'c 126 'a z)
(set 'z (:add ( :mul z z) a))

I can also chain these types of queries together, or look for one inside another:

(set 'references-to-while 
    (ref '("symbol" "while") s-list match))
(set 'references-to-set 
    (ref '("symbol" "set") (s-list (chop references-to-while)) match))
(output-code 
    (nth (chop (append (chop references-to-while) references-to-set)) s-list))

;->
(set 'z (:add (:mul z z ) a))

However, I think I'm now barking up the wrong tree. It's relatively easy to find out whether your parentheses are balanced - but it can be much harder to notice when one or more of them are in the wrong position. This is, perhaps, a paradoxical result of Lisp's powerful yet simple syntax. Here's a typical example of what I mean:

(define (test a b)
   (let ((c a)
         (d b)
         (result 0))
     (set 'result (+ c d)))
   result)

(test 2 2)

;-> 
nil

I was hoping to see 4, but I see nil instead. The parentheses are balanced, and the syntax is correct. But one of the parentheses is in the wrong place, and I doubt whether any script or tool could easily identify which one or tell you where it should be located. (Perhaps the mistake is too stupid for that!) Can you see the mistake? And can you imagine a tool or editor that could detect or prevent it happening?

02/03/2010

newLISP tackles global warming

At Armagh Observatory, they've been keeping records of the weather since the late 18th century, and detailed records since 1843. The datasets, which are freely available to all, via their web site, are considered to be high quality and very useful, suffering from few of the problems that bedevil other sets of weather observations. There are hardly any problems with gaps in the records (which would have to be filled in by a technique that scientists call 'sparse data infill' and which I would call 'making stuff up'). There are no nearby airport runways or industrial complexes that could upset the microclimates. And, because the information is published openly, there's little chance that modifications or corrections can be applied without anyone noticing.

Armagh itself is a smallish town in Northern Ireland, less than 1000 miles south of the Arctic Circle although bathed in the warm currents of the Gulf stream. The name is familiar to most UK residents mainly as the place where, during the 1970s and 1980s, it was possible to witness gun, bomb, and grenade attacks in the streets, symptoms of the long-running conflict between Catholic and Protestant extremists. This was the time of "the Troubles", as they became known.

I decided to attempt some simple analysis of one of the sets of weather records that the Armagh Observatory has posted on its website, using newLISP as my magnifying glass and explorer's machete.

I started at http://climate.arm.ac.uk/calibrated/airtemp/index.html, and downloaded the 'corrected daily maximum temperature' file.

(set 'source-file 
    (get-url {http://badc.nerc.ac.uk/browse/badc/armagh/data/air_temperature/corrected_daily_max_temp/tccmax1844-2004.txt}))

I parsed this into separate lines.

(set 'raw-data (parse source-file "\n" 0))

Looking at the result, there are three different types of line to process. After the initial comment lines, there are either year indicators or space-separated lists of values (in degrees Centigrade) for every month for a particular day. For example, the line starting with "1" contains the maximum recorded temperatures for the first of January, February, March, etc. up to and including the first of December. Lines starting with 29 to 31 contain a few odd-looking entries such as "-999" that indicate that there was no such day for certain months.

("Daily maxmimum temperature at Armagh Observatory compiled and calibrated by John Butler and" 
 "Ana Garcia Suarez and Alan Coughlin, Armagh Observatory, August 2003 " 
 "Reference: Meteorological Data recorded at Armagh Observatory: Volume II - Daily, Monthly and" 

" 1844" 
"     1    1.9    6.2    6.8   16.3   19.2   18.7   17.7   16.7   24.8   15.6   10.3    6.6" 
"     2   -0.2    5.1    6.4   12.0   22.7   16.7   16.7   18.8   23.4   16.1    9.2    6.0" 

"    31    5.3 -999.0   13.3 -999.0   20.0 -999.0   15.3   22.3 -999.0   12.6 -999.0    6.3" ...)

To parse the raw data I used the following code:

(let ((year 0) (values '()) (monthly '()))
    (dolist (line raw-data)
      (cond 
        ((and (< (length line) 6) 
                        (>= (int line 0 10) 1844) 
                        (<= (int line 0 10) 2004))
        ; a year start?
            (set 'year (int line 0 10)))
        ; a data row?
        ((nil? (find "[A-Za-z]" line 0))
            (set 'values (map float (parse line)))
            (when values
               (set 'day (first values))
               (push (rest values) monthly -1)
               ; after 31, start a new month
               (when (= day 31) 
                   (push (cons year (transpose monthly)) yearly -1)
                   (set 'monthly nil)))))))

As usual, my code is a quick hack; ungraceful yet yielding a practical solution. It runs just once, because the resulting data are collected in the yearly list, which is then more efficiently accessed when saved in a file and reloaded as needed:

(save {/Users/me/armagh-data.lsp} 'yearly)

; later: 

(load {/Users/me/armagh-data.lsp} 'yearly)

The data list yearly holds all the data in a simple hierarchical list structure. I used the transpose function to 'flip' the monthly data lists as they were being processed, thus converting the unusual "first day of every month" order into what seemed a more reasonable chronological "month by month" sequence. The data is now in the following form:

(
    (1844 
      (1.9 -0.2 6.7 11.1 11.7 8.9 6.1 6.9 10.3 8.2 8.9  ...
      (6.2 5.1 4.8 6.2 6.2 6.2 4.5 4.5 5.2 4.8 6.7 8.4  ...
      (6.8 6.4 8.2 6.6 3.4 5.7 7.9 11.5 10.4 9.3 8.7 6. ...
      (16.3 12 11.2 10.8 10.7 12.1 14 14.6 17.9 15.7 13 ...
      (19.2 22.7 16.8 16 19.9 17.9 17.1 16.9 15.4 14.4  ...
      (18.7 16.7 20.1 20.8 18.6 20.2 19.1 20.1 19.1 18  ...
      (17.7 16.7 19.2 19.3 16.9 16.8 19.2 19.5 18.6 18. ...
      (16.7 18.8 17.2 18.4 19.9 15.3 16.4 16.8 16.8 18. ...
      (24.8 23.4 20.9 21.6 21.4 20.9 20.4 16.9 18.2 15. ...
      (15.6 16.1 17.3 15.1 13.4 13.3 11.8 14.8 14.9 15. ...
      (10.3 9.2 9.4 8.6 9.4 8.3 8.3 9.4 10 9.4 7.3 8.9  ...
      (6.6 6 6.3 5.6 4.2 5.4 6 2.4 3.5 3.7 3.8 3.2 2.1  ...
    (1845 
      (5 5.8 6.7 9.7 10.2 8.3 6.7 7.8 9.4 10.6 6.7 6.1  ...
      (2.6 6.7 9.2 5.9 7.5 4.5 1.7 4.5 7.5 7.3 5.3 9.5  ...
      (9 9 9.2 6.5 4 2.9 5.5 7.9 8.6 9.7 5.7 4.1 1.8 1. ...
      (15.4 12.9 15.9 15.4 9.6 13.4 15.1 11.2 10.1 9.6  ...
      (16 14.4 12.7 12.7 12.7 10.4 10.4 11.8 12.4 13.9  ...
      (19.9 19.7 14.6 15.8 16.3 16 16.1 16 16.6 18 24.2 ...
      (16.1 18.8 14.7 17.9 17.7 18.1 19.6 18.3 16.4 18. ...
      (17.7 17.3 17.8 18.9 19.2 18.7 18.9 19.5 16.7 18. ...
      (17.6 18.8 18.7 18.2 17.2 15.1 16.5 19.1 20.6 18. ...
      (13.6 12.9 13.4 10.4 11.3 13.4 11.8 12 12.3 12.3  ...
      (12.8 11.9 12.6 11.7 13.9 15.1 11.7 10.9 9.7 11 1 ...
      (8.2 5.7 2.6 7.9 6.6 6 7.1 9.9 8.7 10.4 6.6 5.1 6 ...
    (1846 

and so on.

To access the data for a specific year, I can use ref:

(define (data-for-year y)
   (let ((year-ref (ref y yearly)))
       (rest (yearly (chop year-ref)))))

such that:

(data-for-year 1845)

returns a list of 12 lists. And to access data for a specific month, I can use this:

(define (data-for-month month-number year-data)
  ; month-number:  Jan = 1, not Jan = 0 :)
   (clean (fn (n) (< n -30)) (year-data (- month-number 1))))

So I can, for example, find data for February 1900 like this:

(data-for-month 2 (data-for-year 1900))
    ;->
(3.4 3.3 2.8 4.4 3.4 3.3 3.9 1.6 4.9 1.1 2.8 2.4 4.1 4.4 7.9 7.7 7.2 6.3 3.8 6.2 5.7 9.9 11.7 12.8 9.3 6.8 5.1 5.6)

The clean function removes those spurious -999s for February 29, February 30, and February 31.

Another simple function:

 (define (average lst)
  (div (apply add lst) (length lst)))

calculates averages, so I can ask for the average (maximum) temperature:

(average (data-for-month 2 (data-for-year 1900)))
 ;-> 5.421428571

To find years with high average maximum temperatures, I can run the following code:

(for (year 1844 2004)
     (set 'year-data (data-for-year year))
     (set 'month-average nil)
     (for (month 1 12)
          (push (average (data-for-month month year-data)) month-average))
     (push (list year (average month-average)) year-average))

(sort year-average (fn (a b) (> (last a) (last b))))

with the following results:

((1959 14.20715118) 
 (1846 14.12936444) 
 (1949 14.11140297) 
 (1921 13.9449232) 
 (1857 13.88233359) 
 (1945 13.87162442) 
 (2003 13.84484575)
 ...

and so on. If you google the year 1846, you'll find references to the typhoid outbreak that followed the long hot summer, which claimed upwards of 30 000 lives. But that summer of 1959 has been described as a 'benchmark' summer; that special kind of warm and dry golden summer that ageing Brits like to look back on nostalgically when faced with another of the wet and miserable days that they usually see so many of in July and August.

A picture is worth a 1000 words

The plain numeric data is interesting, but I like exploring using visual techniques too.

My first attempts at visualizing the data used the HTML5 Canvas, but for this task I ended up switching back to PostScript. Both have their advantages: Canvas can do semi-transparency, but PostScript can't; PostScript can produce PDFs which are high resolution, and easily converted to other formats. Bear with me for a moment while I define a short library of PostScript functions.

(define (ps str)
  (write-line *buffer* str))

(define (render filename)
  (let ((fname (string (env "HOME") "/Desktop/" filename)))
     (write-file fname 
                (string "%!PS-Adobe-3.1" 
                        "\n" 
                        "%%Creator: newLISP " 
                        (sys-info -2) 
                        "\n" 
                        *buffer* 
                        "%%showpage" "\n"))
     ; on MacOS X, this runs the PostScript to PDF converter 
     ; and opens the resulting PDF file in Preview
     (exec (string "open " fname))
     ; on other platforms, the PostScript file needs to be 
     ;converted with, eg, GhostScript...
     ))

(define (ps-prolog)
  (ps (string {%} (date)))
  ; hack for big paper size
  (ps (string {
%%BeginFeature: *PageSize Default
<< /PageSize [ 5000 500 ] /ImagingBBox null >> setpagedevice
%%EndFeature
  }))
  ; default font
  (ps {/Helvetica-Bold findfont 12 scalefont setfont})
  (ps {1 setlinewidth}))

(define (transform-x x)
   (mul x-scale (add x x-offset)))

(define (transform-y y)
   (mul y-scale (add y y-offset)))

(define (line-to x y)
  (ps (format "%f %f lineto\n" (transform-x x) (transform-y y))))

(define (move-to x y)
  (ps (format "%f %f moveto\n" (transform-x x) (transform-y y))))

(define (begin-path)
  (ps (format {newpath })))

(define (fill-path)
  (ps (format {fill })))

(define (stroke-path)
  (ps (format {stroke })))

(define (set-fill-colour r g b) 
  (ps (format {%f %f %f setrgbcolor} r g b)))

(define (set-stroke-colour r g b)
  (ps (format {%f %f %f setrgbcolor} r g b)))

(define (dot x y r)
  (begin-path)
  (ps (format "%f %f %f 0 360 arc " (transform-x x) (transform-y y) r))
  (fill-path))

(define (text x y d)
  (move-to x y)
  (ps (format {(%s) show} (string d))))

I usually just copy and paste these from script to script, adapting them when necessary to the task in hand. There's a hack in the ps-prolog function to force a very wide paper size. This graph is going to be huge - I'm going to attempt to plot every single data point on this graph.

For colours, I define a simple colour map that goes from red to blue:

(define (generate-colour-list)
    (for (i 0 1 0.001)
        (push (list i 0 (sub 1 i)) colour-map -1)))

and a function that takes a temperature value from between -10 and 35 (they're safely below and above the range of values in the dataset) and selects an entry in the colour map. (I shouldn't hard-wire all these values in, should I?!)

(define (colour-for-temp t)
   (colour-map (int (mul (length colour-map) (div (add 10 t) 45)))))

Now I'm ready to initialize:

(define (init)
  (set '*buffer* {})
  (generate-colour-list)
  (ps-prolog)
  (set  'x-offset 10
        'y-offset 25
        'x-scale 4
        'y-scale 6
        'x-step 0.02
        'start-year 1844
        'end-year 2004
        'graph-width 2000
        'graph-height 400))

and - finally - I'm ready to start drawing.

(init)

; legend
(set-fill-colour 0.0 0.0 0.0)
(text 10 44 {Maximum temperatures recorded at Armagh Observatory, Degrees Centigrade})
(text 10 42 {Monthly averages shown as grey circles})
(text 10 40 {Daily maximum temperatures marked as small dots})
(text 10 38 {Records shown in red/blue})

(set-stroke-colour 0.7 0.7 0.7)

; x-axis
(begin-path)
(move-to 0 0)
(line-to graph-width 0)
(stroke-path)

; y-axis
(begin-path)
(move-to 0 -5)
(line-to 0 (- graph-height 30))
(stroke-path)

; y rules 
(for (y -5 35 5)
     (begin-path)
     (set-stroke-colour 0.9 0.9 0.9)
     (move-to 0 y)
     (line-to graph-width y)
     (stroke-path)
     (set-fill-colour 0 0 0)    
     (text -7 y y))

The way I've set this up, I'm actually adding the graph's 'furniture' by specifying vertical coordinates in degrees Celsius! It's weird, but it kind of makes sense to say "I want this line to appear at the -5° line".

The graph is drawn in two passes, partly because there's no other way to get layering. In the first pass, I'm drawing grey circles to indicate the average maximum temperature for each month. In the second pass, I go back and plot each day's value as a dot:

; first pass; draw monthly average maximums as grey dots
(set 'x-pos 0)
(for (year start-year end-year)
      (set 'year-data (data-for-year year))        
      (for (month 1 12)
           (set-fill-colour 0.7 0.7 0.7)
           (set 'monthly-average (average 
                (data-for-month month year-data)))            
           (set 'y-pos monthly-average)
           (dot  x-pos monthly-average 3)
           (inc x-pos (mul x-step (length 
                (data-for-month month year-data)))))
      (inc x-pos x-step))

; second pass; draw daily maximums
(ps {/Helvetica-Bold findfont 15 scalefont setfont})
(set 'x-pos 0 'hottest-day-so-far 25 'coldest-day-so-far 0)
(for (year start-year end-year)
    ; year markings
    (when (= 0 (% year 5)) 
       ; sometimes, 5 yearly ticks
       (set-stroke-colour 0.8 0.8 0.8)
       (begin-path)
       (move-to x-pos -10)
       (line-to x-pos -12)
       (stroke-path)
       ; year: black
       (set-fill-colour 0 0 0)
       (text x-pos -14 year))

    ; always, 1 year ticks
    (set-stroke-colour 0.8 0.8 0.8)
    (begin-path)
    (move-to x-pos -10)
    (line-to x-pos -11)
    (stroke-path)

    ; get data for year
    (set 'year-data (data-for-year year))        
    (for (month 1 12)

         ; every day
         (dolist (daily-temp (data-for-month month year-data))
            (set 'hottest-day-so-far (max hottest-day-so-far daily-temp))
            (set 'coldest-day-so-far (min coldest-day-so-far daily-temp))
            ; lower alpha for this set
            (cond 
                ((= hottest-day-so-far daily-temp) 
                      (set-fill-colour 1 0 0)
                      (dot x-pos daily-temp 3))
                ((= coldest-day-so-far daily-temp)
                      (set-fill-colour 0 0 1)
                      (dot x-pos daily-temp 3))
                (true
                      (apply set-fill-colour (colour-for-temp daily-temp))
                      (dot x-pos daily-temp 0.5)
                      ))
            (inc x-pos x-step)))
    ; next year
    (inc x-pos x-step))

; finish drawing
(ps {showpage})
(render "graph.ps")

The result is a PostScript image that is easily converted to a PDF image by the Mac's pstopdf program (or by GhostScript or Adobe Distiller). The result is too wide for publication in a typical journal, and strains at the edges of typical web pages. But it's easy to explore the PDF image using Preview or Acrobat Reader.

Armagh max temperature PNG

The PDF is here.

One thing I noticed (apart from the apparent lack of much significant change during the period) is that those averages seem oddly unrepresentative of the actual daily weather. There's no disputing the arithmetic (I hope). And yet, for example, look at the summers of 1870 or 1995: daily temperatures look hot, with many above 25°, but the monthly averages appear cooler.

Every picture tells a story

This type of graphic visualization of a dataset is designed to tell a story visually, the story being, presumably, the one told by the surrounding text. However, it's possible for the author to make the storytelling even more effective by tweaking aspects of the presentation. For example, authors can change or move the axes, adjust the horizontal or vertical scales, introduce colours to influence perception, combine two or more datasets as if they were one, start or stop the graphing at a more 'telling' point in the data, or even combine two or more graphs on a single display to imply connections. For more on this fascinating subject, see the books and writings of Edward Tufte.

I don't have a scientific story to tell, here. Rather, I'm telling a meta-story. I've made a number of small mistakes and inappropriate design decisions in this post (some deliberate, or at least, some I'm aware of, others are accidental). But, given the published and freely downloadable weather data, the code listed here, and - of course - the excellent free and open source newLISP language, it should be possible for anyone to retrace my steps, find my mistakes, and present a more credible or compelling view of the same dataset. I like to think that, if I was a scientist, I'd be happy for that to happen. But how many of us are either willing or able to do the same forensic work for other - and probably much more important - graphical visualizations of specific datasets? Currently the world of climate science seems to be in disarray, with allegations of data tampering, fraud, and much else besides. It appears that we should all be a lot more critical of the way information is presented to us.

Addendum

Thanks to Kazimir, this post was mentioned at Hacker News - and this site had 8000+ more page views than usual that day... :) I was pleasantly surprised that quite a few of the commenters seemed to have understood that this wasn't literally an attempt at serious scientific analysis, despite the teasing headline. It was merely an observation or two from a non-scientific layman about the process of drawing pictures and conclusions from sets of numbers, while at the same time exploring some simple graph drawing using newLISP (rather than proprietary software packages such as Excel or Mathematica).

I'm intrigued by the heavy data compression that's going on in much of the graphs I see on the net. A huge amount of juicy data is squeezed until dry enough to be summarized by a single wavy line. In this particular example I tried to avoid dropping any data points, but the resulting graph is, I freely admit, too hard to 'read'. When you're far enough away to see it, you can't see anything worth seeing.

Kazimir also suggested that I add yearly averages. Easy enough to do, although I'm reluctant to cloud the graph with even more information:

Armagh max temperature PNG

And about this 'average' thing: talking about the average temperature for 'a decade' begs the question of why starting with a year ending with 0 is any better than starting with any other number. For example, why not compare the decade 1853-1862 with 1927-1936? And for that matter, why are we using intervals of 10 years, rather than 5 or 14?

But there's an easy way to answer this particular conundrum:

(for (x 3 20)
   (for (y 0 (- (length yearly-averages) 1))     
        (set 's (slice yearly-averages y x))     
        (if (= (length s) x)
            (push (list (first s) (average (map last s))) results))) 
    (println x { } (first  (sort results (fn (a b) (> (last a) (last b))))))
    (set 'results nil))

    3 (2002 13.74500614)
    4 (2001 13.60712154)
    5 (2000 13.58417734)
    6 (1999 13.56492782)
    7 (1998 13.51859569)
    8 (1997 13.53881651)
    9 (1995 13.44358625)
    10 (1995 13.47582925)
    11 (1994 13.41562109)
    12 (1993 13.33442216)
    13 (1992 13.29284527)
    14 (1846 13.32015829)
    15 (1845 13.2821639)
    16 (1989 13.29859146)
    17 (1988 13.28216522)
    18 (1987 13.24236811)
    19 (1986 13.17982611)
    20 (1985 13.13376373)

which tells us that, for this data set at least, the warmest periods between 3 and 13 years long, and between 16 and 20 years long, were all in the late 1980s and 1990s, but the warmest 14 and 15 year long periods were in the 1840s and 1850s. Change the sorting function to <, and the coldest (or, to be more precise, the least warm, on average) periods were all in the late 1890s. Tentatively, I could suggest that - at Armagh - there was a slight dip in average temperatures towards the turn of the 20th century. and that average temperatures now are equalling and surpassing the early Victorian readings by a tenth of a degree or so. But I said I was trying not to do any science here, so that's enough of that.