I feel like I am coming to the end of a long journey. Yesterday I managed to get another part of the puzzle working. I might actually be able to finish (hah!) this personal project.
I do not like WordPress. We call it the security breach. If it hasn’t happened today, it will happen tomorrow. So, I’m going to talk about the process.
The goal
To be able to easily insert citations and bibliographies into WordPress articles.
Requirements:
- It has to be Easy
- It has to look good in the final results.
- It has to be easy to create citations
Item One requires a Graphical User Interface. In other words, I need to be able to get to a citation with just a few clicks. It also means that I can’t be clicking on multiple windows or carefully copying, so I can paste links. It really needs to be simple.
The current user interface looks like this:
You have all seen Item Two in practice. There is still work to be done, but it is getting better. What you don’t see is the code that produces this.
—Lewis Carroll & John Tenniel, Through the looking-glass, and what Alice found there 123 (Macmillan Children’s Books 2. [Dr.]; Repr ed. 1997)[xCite item="MUA4SWX7" p=123 pos=1]
Bibliography
[xCiteBib]
Those things in the square brackets are called shortcodes. They are a WordPress method of inserting content programmatically into your posts and pages without having to worry about giving the writers the ability to break the site. They are simple to use and pretty neat.
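To make the shape of a shortcode concrete, here is a minimal sketch in JavaScript that pulls the tag name and attributes out of the xCite shortcode shown above. WordPress does its own shortcode parsing in PHP; the parseShortcode helper is hypothetical and only handles simple key=value attributes.

```javascript
// Hypothetical helper, for illustration only: WordPress does its own
// shortcode parsing in PHP. Handles [tag key="value" key=value] forms.
function parseShortcode(text) {
  const match = /^\[(\w+)((?:\s+[^\]]*)?)\]$/.exec(text.trim());
  if (!match) return null;
  const attrs = {};
  // Each attribute is either key="quoted value" or key=bareword.
  const attrPattern = /(\w+)=(?:"([^"]*)"|(\S+))/g;
  let m;
  while ((m = attrPattern.exec(match[2])) !== null) {
    attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
  }
  return { tag: match[1], attrs };
}

const cite = parseShortcode('[xCite item="MUA4SWX7" p=123 pos=1]');
console.log(cite.tag, cite.attrs);
```

All attribute values come back as strings, so a page number like p=123 would still need converting before use.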
Of course, they are on the way out. They are part of the old school “classic editor”. The new editor for WordPress is block oriented and allows for prettier things but gets in the way of writing content. As an example, the bibliography shortcode would be a “block” in the block editor. You would select that block and put it where you want.
The hardest part, to begin with, was Item Three. I started by attempting to get my primary source for legal documents to give me citations I could “just use”. That got me part of the way there, but not all the way. It also didn’t help with all the other things I wanted to cite.
I was writing code to help get more and more into the CourtListener website and over and over again it didn’t do everything I wanted it to do, nor was it reasonable to expect it to do so.
You’ll notice that the cite above is to a book. There is no reason for CourtListener to ever provide me with a way to cite books.
This leads me to bibliography tools. I located a recommended one called “Juris-M”. It is based on Zotero.
When I started trying to learn about Juris-M, I found it in use at multiple law schools and pre-law schools. This looked like something to look into.
Browser Plugins/Add-ons
I’ve been using browser plugins and add-ons for a long time. I used to use “LastPass” but moved away from it because I didn’t like the free version and I no longer required it for client work. I currently use “Keeper”, which I can recommend. Furthermore, I paid for it. It can share passwords safely across Keeper Accounts.
I have my grammar checker up there now, “LanguageTool”. Again, I paid for it because I write articles with too many words.
But plugins are magic. They just work, until they don’t.
One of the plugins I use is “RECAP”. It is explicitly designed to interact with the government courts’ electronic filing system called “PACER”. When you are on a PACER website, the RECAP plugin watches for you to download files. When you do, RECAP uploads those files to RECAP on CourtListener, which helps populate the site.
It failed. I learned a little bit about debugging plugins.
Which takes me right back to Zotero/Juris-M. Zotero is well maintained, with updates happening at regular intervals. Juris-M is about two years behind Zotero. It is still compatible, but I’m not seeing much activity over the last two years.
Zotero/Juris-M has a thing they call a “connector”. This allows your browser to connect to a local or remote Zotero application. The connector analyzes the URL of the web page to discover the best “translator” to process the current page.
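A rough sketch of that URL matching, assuming each translator carries a target regular expression and a numeric priority. The two translator records are hypothetical, and the rule that the lowest priority number wins is an assumption of this sketch. The real connector does more than this, including running detection code against the page itself.

```javascript
// Hypothetical translator records; real Zotero translators carry more
// metadata and also run a detection step on the page itself.
const translators = [
  { label: "Generic Web Page", target: "", priority: 300 },
  { label: "CourtListener", target: "^https?://(www\\.)?courtlistener\\.com/", priority: 100 },
];

// Pick the matching translator with the lowest priority number
// (assumed here to mean "most specific").
function pickTranslator(url, candidates) {
  const matches = candidates.filter(
    (t) => t.target === "" || new RegExp(t.target).test(url)
  );
  matches.sort((a, b) => a.priority - b.priority);
  return matches.length > 0 ? matches[0] : null;
}

console.log(pickTranslator("https://www.courtlistener.com/opinion/1/example/", translators).label);
```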
When you click on the icon for the connector, it will scrape the page and create a new entry in your Zotero application. This makes it easy to add items to your Zotero application. Go to the page you want, click the button, and a new item shows up, ready for you to use.
The translator for CourtListener didn’t work. It threw an error and refused to work. Which meant I had to learn how to work with Zotero Translators, which are a plugin for the Zotero connector plugin. Yeah, that’s fun.
Translators are written in JavaScript. JavaScript was designed to allow simple modifications of websites, i.e., to be able to do something to the Document Object Model upon an event firing.
It was not meant for heavy programmatic work. Which is why such things as this editor, written entirely in JavaScript, are so astonishing. Or Google Docs, or Google Sheets.
I figured out what was wrong. Took a wild guess, and fixed it. I now had a working plugin.
The Path to a citation
Now I’m able to visit a website, decide I want to reference the contents, click the Zotero button and a citation item shows up in my Zotero application.
I need a way to get from A to B.
The Zotero application has the ability to drag and drop a citation. You click on an item then drag it over to a text window, drop it and you have a pretty citation.
You can also select a set of citations and create a bibliography from those citations to drop into a text window.
This sort of works. First, the citation format isn’t exactly what I want. Second, the citations do not use the standard legal format for Id, Ibid, Supra, Post and Ante references. Those have to be done manually.
This means I need to create a citation style specifically for the blog that is also compatible with the BlueBook citation reference.
Which means I have to have my own style.
Citation Style Language (CSL)
As always, I start with a nearly working piece of code and start “fixing” it. In this case, I started with the jm-indigobook.csl style. This is a style specific to Juris-M. It creates references that match the BlueBook standard, but it is not copyrighted by any particular entity, so it is free to use as a reference.
The first thing that I wanted was something that had the correct HTML in the link itself.
An example citation looks something like this in HTML:
—<cite>Lewis Carroll & John Tenniel, <i>Through the looking-glass, and what Alice found there</i> (Macmillan Children’s Books 2. [Dr.]; Repr ed. 1997)</cite>
Which means I need to add the &mdash; entity, the cite elements, and when there are URLs, I need to add an anchor element as well.
What does that mean? It means code like this:
<macro name="anchor-pre">
  <choose>
    <if variable="URL">
      <text value="&lt;a href=&quot;"/>
      <text variable="URL"/>
      <text value="&quot;&gt;"/>
    </if>
  </choose>
</macro>
<macro name="anchor-post">
  <choose>
    <if variable="URL">
      <text value="&lt;/a&gt;"/>
    </if>
  </choose>
</macro>
I understand XML, I know the syntax rules, what I don’t know is the elements and their meaning. So I learned.
Having created my own citation style sheet and verified that it works locally, it was time to attempt to make it work on WordPress. This failed at the time because my style was not known to Zotero. With no wish to fight the battle of getting a work in progress into Zotero, I looked for alternative solutions.
The Citation Server
In the process of trying to get things to work with Zotero and my CSL, I discovered that Zotero had a standalone citation server. Because this is an open-source project, I was able to hunt down that server and run it locally.
It didn’t work.
Docker, K8S and Node.js
The citation server I found was from Zotero, but my citations depended on Juris-M. This meant that I had to create a version of the server using the latest Juris-M CSL parser and the working Zotero citation server.
The Zotero citation server is not public facing. This means that all access to it is restricted to coming directly from the Zotero web application. This means that it doesn’t really matter how clean the server code is.
While the Zotero website is built in PHP or Python, I don’t remember, the citation server is written in JavaScript. JavaScript sucks as a server language. I rather hate it.
The source that I found did not have actual working Docker files. I put those together and built custom Docker images to allow all of this to work.
In the process of trying to make all of this work correctly, I discovered that Juris-M has more item types than Zotero and those item types have different fields. None of this is a big deal, you just have to change the different types as required.
This initially started as work in my WordPress plugin. This did not go well. There was just too much and it was spread all over the place.
In the end, I fell into the Node.js trap. Node.js has a reputation of expanding without end. I wanted to use 10 modules in total, at the end of my project. Just 10 modules. This generated a requirement for over 75 modules total. And I might need more still.
Somehow I found two modules that were extremely helpful. The first was a zotero-to-csl. Of course, it was years out of date, but it worked, sort of.
The code, as provided, had a requirement that the input be strictly formatted to match the CSL input data item. If your input didn’t exactly match what was expected, the converter would abort. Which sometimes caused the server to abort or lock up.
The issue was that the Zotero items had extra fields that the package didn’t know about. For example, Zotero now uses the word “key” to indicate the identifier for an item. Zotero-to-csl expects that to be “id”. The module fails if it sees key and there is no “id”. This had to be fixed.
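The kind of fix described here can be sketched as a small pre-processing step: drop fields the converter does not know about, and fall back to “key” when there is no “id”. This is not the actual patch to zotero-to-csl. The normalizeItem function and the KNOWN_FIELDS list are hypothetical, and the real set of fields is much larger.

```javascript
// Hypothetical list of fields the converter is assumed to accept.
const KNOWN_FIELDS = new Set(["id", "itemType", "title", "creators", "date", "url"]);

function normalizeItem(item) {
  const out = {};
  // Copy only the fields the converter knows, instead of aborting on extras.
  for (const [name, value] of Object.entries(item)) {
    if (KNOWN_FIELDS.has(name)) out[name] = value;
  }
  // Newer items identify themselves with "key"; use it when "id" is missing.
  if (out.id === undefined && item.key !== undefined) out.id = item.key;
  if (out.id === undefined) throw new Error("item has neither id nor key");
  return out;
}

console.log(normalizeItem({ key: "MUA4SWX7", itemType: "book", title: "Example", version: 12 }));
```

Throwing a catchable error for a truly unusable item, rather than letting the converter abort, is what keeps one bad record from taking the server down with it.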
Another issue was that there are more citation types in the current Zotero item and Juris-M item than when the converter was last updated.
I updated the software, then figured out how to have npm use my version instead of the Internet’s version.
This required more learning of JavaScript and Node and npm.
But I now have working docker images running in my K8S cluster.
The WordPress Plugin
Back at the home front, I needed to create an actual plugin. As is standard procedure, the example code for anything stopped just short of being useful. “Here is how you create a ‘Hello World’ shortcode.” Well, what now? How do I…?
I started creating xCite, my plugin. First, I had to learn how to use the WordPress ORM. It is not nearly as clean as Django, but it exists.
One of the most common tasks that you perform on a database is “UPDATE or INSERT”. This operation says “If this entity already exists, update the values to these new values. If the item does not exist, insert it into the database.”
Django does this automatically on the save() function. WordPress doesn’t have anything similar… Or does it?
I stumbled onto the magic function $wpdb->replace(). According to the documentation, if the entity does not currently exist in the database, it is inserted. If the entity already exists, it is replaced with the given entity. The only real requirement is that the complete set of columns needs to be in the call, not just the ones that are being updated.
Except that WordPress doesn’t do an UPDATE or INSERT. It does a DELETE if EXISTS, INSERT. This works for many situations, not all. In my case, it screwed me over.
In the end, I got it all working, mostly by rewriting the WordPress convenience routines with my own. The $wpdb->replace() calls were replaced with xCite_db_save(), which takes the same parameters as the replace function. It uses a database transaction to make it parallel safe. It checks to see if the entity exists. If it does exist, an update is performed. If it does not exist, the data is inserted. The transaction is then committed and life goes on.
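A toy sketch of the difference between the two behaviors, using an in-memory table in JavaScript rather than WordPress or MySQL. The ToyTable class and its rows are made up for illustration, and it does not claim to show which side effect caused the trouble here. It only shows two things that can differ: the row identifier and any columns left out of the call.

```javascript
// Toy in-memory table, for illustration only; not WordPress or MySQL code.
class ToyTable {
  constructor() {
    this.rows = new Map(); // unique key -> row
    this.nextId = 1;       // stands in for an auto-increment column
  }
  insert(key, data) {
    const row = { id: this.nextId++, key, ...data };
    this.rows.set(key, row);
    return row;
  }
  // REPLACE semantics: delete any existing row, then insert a fresh one.
  replace(key, data) {
    this.rows.delete(key);
    return this.insert(key, data);
  }
  // UPDATE-or-INSERT semantics: keep the existing row and merge new values.
  save(key, data) {
    const existing = this.rows.get(key);
    if (!existing) return this.insert(key, data);
    Object.assign(existing, data);
    return existing;
  }
}

const replacedTable = new ToyTable();
replacedTable.insert("MUA4SWX7", { title: "Looking-Glass", notes: "p. 123" });
const replaced = replacedTable.replace("MUA4SWX7", { title: "Looking-Glass" });
console.log(replaced.id, replaced.notes); // new row id, notes are gone

const savedTable = new ToyTable();
savedTable.insert("MUA4SWX7", { title: "Looking-Glass", notes: "p. 123" });
const saved = savedTable.save("MUA4SWX7", { title: "Looking-Glass" });
console.log(saved.id, saved.notes); // same row id, notes are kept
```

Anything else in the database that points at the old row id is left dangling after a replace, which is one reason a check-then-update inside a transaction can be worth the extra cost.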
This is an expensive operation, but it is working for me at this time.
I also decided it was better to have a local copy of the cited items rather than pull the items from the Zotero web application every time we needed something. This simplified some things, but required me to learn the Zotero API at all levels and create an actual syncing Zotero application.
Which seems to be working now.
Final actions
I’m not going to cover the GUI, I’m still working on it. Needless to say, there are complications, and I’m still working those complications out.
Nope, the final action is security.
As I said earlier, the citation server code that I originally started working from is designed to be protected from the mean world by not being accessible to the mean world. My version will be accessible, which means it has to be much more robust.
I’ve already put it behind an ingress with SSL involved. This means that all communications to the citation server are encrypted.
What I need is a method to track the load on the citation server and to control access to it. I decided to use HMAC. My first thought was to use a challenge/response method.
In challenge/response, the client connects to the server and says, “I AM JOHN”. The server then says back to the client, “Well John, prove it, encrypt this text with your secret.” The client says back, “Here is the cypher text of the plain text you asked me to encrypt.”
Since the client and the server both know the text, the secret, and the encryption method, they should get the same results. If they don’t, then the client isn’t authenticated.
Once the client has authenticated, the server sends a token back “Anytime you talk to me, use this token to prove you are who you say you are.”
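The exchange above can be sketched in a few functions. One deliberate swap: instead of encrypting the challenge text, this sketch uses a keyed hash (HMAC-SHA256) from Node's built-in crypto module, which proves knowledge of the secret in the same way. The secrets map and all function names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Hypothetical shared secret; both sides are assumed to hold a copy.
const secrets = new Map([["JOHN", "correct horse battery staple"]]);

// Server: issue a random challenge for the claimed identity.
function makeChallenge() {
  return crypto.randomBytes(16).toString("hex");
}

// Client: prove knowledge of the secret with a keyed hash of the challenge.
function answerChallenge(challenge, secret) {
  return crypto.createHmac("sha256", secret).update(challenge).digest("hex");
}

// Server: recompute the answer and compare in constant time.
function verifyAnswer(name, challenge, answer) {
  const secret = secrets.get(name);
  if (secret === undefined) return false;
  const expected = Buffer.from(answerChallenge(challenge, secret), "hex");
  const given = Buffer.from(answer, "hex");
  return expected.length === given.length && crypto.timingSafeEqual(expected, given);
}

const challenge = makeChallenge();
const answer = answerChallenge(challenge, "correct horse battery staple");
console.log(verifyAnswer("JOHN", challenge, answer));
```

Only after this check passes would the server hand out the long-lived token.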
Because that token is good for a long time, it is possible for a black hat to intercept or get that token, and take over the connection. This is bad.
All of this takes time and has numerous possible issues.
HMAC attempts to solve all the issues at once. First, a message sent with HMAC is self-authenticating. There are no more communications required to know the sender is a particular person and that the particular person did, indeed, send this message.
This is done by creating some filler data and creating a known text string that includes the filler data and the body of the message. The server has access to the filler data, from this it can create the same string to be signed, which it does with the shared secret.
If the signatures match, then the client has properly authenticated.
Part of the message sent is the user access key. This is a random number expressed as characters which is unique and identifies the actual user. The server uses this access key to look up the shared secret.
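A minimal sketch of that scheme using Node's built-in crypto module. It assumes the filler data is a timestamp plus a random nonce, and that the string to sign is simply those values joined with newlines. The key store, the access keys, and the function names are hypothetical, not the actual xCite code.

```javascript
const crypto = require("node:crypto");

// Hypothetical key store: access key -> shared secret.
const secretsByAccessKey = new Map([
  ["AK123", "s3cret-one"],
  ["AK456", "s3cret-two"],
]);

// Both sides build the same string from the filler data and the body.
function stringToSign(accessKey, timestamp, nonce, body) {
  return [accessKey, timestamp, nonce, body].join("\n");
}

// Client side: sign a request with the shared secret.
function signRequest(accessKey, secret, body) {
  const timestamp = String(Date.now());
  const nonce = crypto.randomBytes(8).toString("hex");
  const signature = crypto
    .createHmac("sha256", secret)
    .update(stringToSign(accessKey, timestamp, nonce, body))
    .digest("hex");
  return { accessKey, timestamp, nonce, body, signature };
}

// Server side: look up the secret by access key, re-sign, and compare.
function verifyRequest(req) {
  const secret = secretsByAccessKey.get(req.accessKey);
  if (secret === undefined) return false;
  const expected = crypto
    .createHmac("sha256", secret)
    .update(stringToSign(req.accessKey, req.timestamp, req.nonce, req.body))
    .digest();
  const given = Buffer.from(req.signature, "hex");
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

const signed = signRequest("AK123", "s3cret-one", '{"items":["MUA4SWX7"]}');
console.log(verifyRequest(signed));
```

A production version would also reject stale timestamps and reused nonces, so that a captured request cannot simply be replayed.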
At first go, my HMAC algorithm only worked for exactly one access key and one shared secret.
Did I mention that I don’t like JavaScript?
I needed some sort of persistent storage that would last from reboot to reboot that held user access keys and shared secrets.
The tool I chose was MongoDB. Which I’ve never used before.
Back to docker issues. Turns out that it is not easy to run MongoDB on the same image as I’m running my citation server. I ended up creating a second container that provides only the MongoDB service.
Which required learning how to build and configure a MongoDB server.
This is built and running. I have no idea how to access the database nor how to load it with data.
That’s a day of research. I’m able to load my credentials into the database and everything works, sort of.
Remember, I really don’t like JavaScript. Anytime you do any I/O, JavaScript wants you to release the thread, so it can do other work. Very nice. Unfortunately, that means you have to do something like
“request a connection to the database”
“wait for the connection”
“query the database for a particular user record”
“wait for the record to be found (or not)”
Continue processing with the data returned.
This is easy in most computer languages. In JavaScript, the software throws errors if you attempt to wait for an async function to return if you are not yourself an async function.
JavaScript uses callbacks and Promises to accomplish this. I have no idea how to get all of this to work with callbacks. I have not been able to wrap my soggy old brain around how to get the main thread to just wait until all the callbacks complete.
I did finally have some insight into how to work with Promises. This means that I was able to connect to the MongoDB, retrieve the correct user record, and then perform the HMAC and have it all just work.
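The pattern that finally clicks usually looks something like the sketch below. The database is faked with a timer so the example runs on its own; findUser stands in for a real MongoDB driver call, and every name and record here is hypothetical.

```javascript
// Stand-in for a database: resolves after a short delay, like real I/O.
// (Hypothetical; a real MongoDB driver call would go here.)
const fakeUsers = new Map([["AK123", { accessKey: "AK123", secret: "s3cret-one" }]]);

function findUser(accessKey) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(fakeUsers.get(accessKey) ?? null), 10);
  });
}

// An async function may "wait" with await; its callers get a Promise back.
async function lookupSecret(accessKey) {
  const user = await findUser(accessKey);
  if (user === null) throw new Error(`unknown access key: ${accessKey}`);
  return user.secret;
}

// Code outside an async function chains on the Promise instead of waiting.
lookupSecret("AK123")
  .then((secret) => console.log("found a secret of length", secret.length))
  .catch((err) => console.error(err.message));
```

The main thread never actually blocks. Everything that depends on the record simply moves inside the async function or the then() handler.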
Someday I’m going to look at a project, and it will be just as straightforward as I hope on initial evaluation. Someday I will work on a project where the path forward is not strewn with rabbit holes and booby-traps. Yeah, and the God Faere will drop off a M2 and 20,000 rounds of 50 cal.
Have a great day.
Well, I have to say, I can’t wait for the sequel. I picture you bent over the keyboard, in a dark room with only one light on, door locked, and a carafe full of high-octane coffee from Colombia.