You should be automatically redirected . If not, visit
http://newlisper.wordpress.com
and update your bookmarks.

30/01/2009

Simple database in newLISP

As I was updating the sqlite section of the Introduction to newLISP for version 10 last year, I wondered whether it would be easy to write a simple database interface using pure newLISP rather than SQL. This would be useful for those applications where a sqlite or other type of database might be difficult to install, implement, or just larger or more powerful than necessary.

Join me in the following thought experiment to see whether it would work. I'll use the periodic table example again, for this 'elementary' exploration.

We would start by loading the database code as a module:

(load "nldb.nl")

To make it easier to use (since you wouldn't have to keep supplying context prefixes), let's enter the module:

(context nldb)

Like many modern applications, let's assume there's an empty database already there by default, waiting to be filled. So the first real job would be to create a table. We would specify its name, and the names of the columns too:

(create-table 
  'elements 
  '(No AtomicWt Name Symbol MP BP Density EarthCrust DiscoveryYear Group IonEn))

Hopefully that would create a new table called elements, with the columns as shown. The next task would be to add some data. To add a row, you would expect to be able to do something like this:

(add-row 'elements '(1 1.0079 "Hydrogen" "H" -259 -253 0.09 0.14 1776 1 13.5984))

Although, in that form, the data has to be provided in full, in the right order. A refinement might be to be able to type this:

(new-row 'elements '((Name "Unobtainium") (Symbol "Ub")))    

where we can assume that any missing or unspecified columns are left undefined.

After using these and similar functions interactively, it would be useful to be able to see what's been done so far. Logically:

tables
;-> (elements)

should return a list of tables, and, to see the columns in a table:

(list-columns 'elements)
;-> (No AtomicWt Name Symbol MP BP Density EarthCrust DiscoveryYear Group IonEn)

To see the whole database, it would be useful to have a command like show:

(show)

for output such as this:

Contents of database db
 1 table: (elements)

 Table:  elements
 Columns (No AtomicWt Name Symbol MP BP Density EarthCrust DiscoveryYear Group IonEn)
 Rows:   
  (1 1.0079 "Hydrogen" "H" -259 -253 0.09 0.14 1776 1 13.5984)
  (2 4.0026 "Helium" "He" -272 -269 0 0 1895 18 24.5874)
  (3 6.941 "Lithium" "Li" 180 1347 0.53 0 1817 1 5.3917)
  ; and so on

When using a database, one of the primary tasks is to select rows in a table. So something like this would be a must-have function:

(select-rows 'elements)

which should return all rows. But you would also expect to be able to specify a filtering function, so that only certain rows would be selected. So, to find elements discovered since 1900:

(select-rows 'elements (where (>= 'DiscoveryYear 1900)))

which should return:

(((No 43) (AtomicWt 98) 
  (Name "Technetium") 
  (Symbol "Tc") 
  (MP 2200) 
  (BP 4877) 
  (Density 11.5) 
  (EarthCrust 0) 
  (DiscoveryYear 1937) 
  (Group 7) 
  (IonEn 7.28)) 
; and so on

and to find common elements (more than 5% of the earth's crust):

(select-rows 'elements (where (>= 'EarthCrust 5)))

which should return:

(((No 8) (AtomicWt 15.9994) 
  (Name "Oxygen") 
  (Symbol "O") 
  (MP -218) 
  (BP -183) 
  (Density 1.43) 
  (EarthCrust 46.71) 
  (DiscoveryYear 1774) 
  (Group 16) 
  (IonEn 13.6181)) 
; and so on

Ideally, the function used to select rows could be any newLISP function that returns true (select) or nil (don't select) based on the contents of each row's column:

(select-rows 'elements (where (find "ium" 'Name)))

or

(select-rows 'elements (where (starts-with 'Name "C" 0))) ; regular expression

and it should also be easy to specify more than one row-filter function:

(select-rows 'elements 
    (where (and (> 'EarthCrust 5))
                (< 'DiscoveryYear 1900)))

or do set intersections:

(intersect 
    (select-rows 'elements (where (> 'EarthCrust 5)))
    (select-rows 'elements (where (< 'DiscoveryYear 1900))))

It should be possible to display just a few columns of the results:

(select-columns '(Name EarthCrust) 
    (select-rows 'elements))

which should return:

(((Name "Hydrogen") (EarthCrust 0.14)) 
 ((Name "Helium") (EarthCrust 0)) 
 ((Name "Lithium") (EarthCrust 0)) 
 ((Name "Beryllium") (EarthCrust 0))
; and so on

and

(select-columns '(Name Symbol) 
    (select-rows 'elements (fn (row) (find "ium" (lookup 'Name row)))))

which should return:

(((Name "Helium") (Symbol "He")) 
 ((Name "Lithium") (Symbol "Li")) 
 ((Name "Beryllium") (Symbol "Be")) 
 ((Name "Sodium") (Symbol "Na")) 
; and so on

or even:

(map (curry lookup 'Name) 
    (select-rows 'elements (where (starts-with 'Name "A" 0))))

("Aluminum" "Argon" "Arsenic" "Antimony" "Astatine" "Actinium" "Americium")

Once this selection technique works, extending it to delete rows shouldn't be too difficult:

(delete-rows 'elements (where (> 'DiscoveryYear 1945)))

which should delete every element discovered after 1945.

So, how would we change the data in a row? It would make sense if you could pass a selection function to find some rows, and were then able to supply a column name and a modification function that could transform the data in those rows. Something like this, perhaps:

(change-rows 'elements (where (= 'Name "Hydrogen")) 'BP inc)

where the inc function would be applied to the contents of the BP (boiling point) column for all matching rows. The effect of this example would be to increase the value of Hydrogen's boiling point by 1 degree.

And the modification function should be, not just a newLISP primitive, but any function that modifies the current value:

(change-rows 'elements 
  (where (ends-with 'Name "ium")) 
  'DiscoveryYear 
  (fn (x) (setf x -2000)))

There, the anonymous function would set the value of x to -2000, and this function would be applied to the value of the column DiscoveryYear, for all matching (ie ending with "ium") elements.

A useful function to have would be a regular expression search. It could look like this:

(select-columns '(Name) 
  (find-text 'elements "pb" 1))

;-> (((Name "Lead")))

Here, a case-insensitive regex search for "pb", filtered to show only names, would return just a single element:

To sort a table, you'd want to specify the table, the column, and a function:

(sort-table 'elements 'BP >)

This returns the table, after sorting it by each element's boiling point. You should also be able to supply your own sorting function:

(sort-table 'elements 'Name (fn (x y) (< (length x) (length y))))
; returns the table, sorted with the shortest name first

To sort a row, it would be cool if you could apply a sort-rows function to a selection of columns:

(sort-rows 
  (select-columns '(Name Symbol) (select-rows 'elements)) 
  'Name 
  (fn (x y) (> (length x) (length y))))

;-> (((Name "Rutherfordium") (Symbol "Rf")) 
;     ((Name "Praseodymium") (Symbol "Pr")) 
;     ((Name "Protactinium") (Symbol "Pa")) 
; and so on     

This example is better, as the whole table isn't sorted, just the extracts are.

There's no reason why we shouldn't be able to add new columns whenever we like:

(add-columns 'elements '(Explosiveness) 0)

Finally, we'd obviously want to save the database:

(save-db "/Users/me/elements.nldb")

That would save just the tables, not the entire nldb context with its functions as well. This way, we could develop the two sides (code and data) separately, even though they both live together when they're active. Loading the database would be a two-stage process. First, load the database module:

(load "nldb.nl")

then load the database:

(load "/Users/me/elements.nldb")

So, what do you think? To me, this first sketch of the 'user interface' didn't look too bad. So I decided to have a go at writing a fully-functional version. In the early stages I was helped by some work done last year by kinghajj on the newLISP forum. I ended up using a bit of his code, but changing the model, making it a bit easier for me to understand. (If you're out there, kinghajj, thanks, and hope you're enjoying your current language!)

I'm going to use this simple database for storing the entries on this blog. Unfortunately I'll have to wait until Nearly Free Speech get round to updating their version of newLISP to version 10, since I've been using some 10-only constructs.

If you'd like to help test out the code, contact me!

0 Comments:

Post a Comment

Links to this post:

Create a Link

<< Home