Thursday, April 27, 2017

Quest for camp stove fuel

For those of you who aren't keeping up with my occasional Twitter/Facebook posts on the subject, I volunteer with a local search and rescue unit. This means that a few times a month I have to grab my gear and run out into the woods on zero notice to find an injured hiker, locate an elderly person with Alzheimer's, or whatever the emergency du jour is.

Since I don't have time to grab fresh food on my way out the door when duty calls, I keep my pack and load-bearing vest stocked with shelf-stable foods like energy bars and surplus military rations. Many missions are short and intense, leaving me no time to eat anything but finger-food items (Clif bars and First Strike Ration sandwiches are my favorites) kept in a vest pocket.

My SAR vest. Weighs about 17 pounds / 7.7 kg once the Camelbak bladder is added.
On the other hand, during longer missions there may be opportunities to make hot food while waiting for a medevac helicopter, ground team with stretcher, etc - and of course there's plenty of time to cook a hot dinner during training weekends. Besides being a convenience, hot food and drink helps us (and the subject) avoid hypothermia so it can be a literal life-saver.

I've been using MRE chemical heaters for this, because they're small, lightweight (20 g / 0.7 oz each), and not too pricey (about $1 each from surplus dealers). Their major flaw is that they don't get all that hot, so during cold weather it's hard to get your food more than lukewarm.

I've used many kinds of camp stoves (propane and white gas primarily) over the course of my camping, but didn't own one small enough to use for SAR. My full 48-hour gear loadout (including water) weighs around 45 pounds / 20 kg, and I really didn't want to add much more to this. The MSR Whisperlite, for example, weighs in at 430 g / 15.2 oz for the stove, fuel pump, and wind shield. Add to this 150 g / 5.25 oz for the fuel bottle, a pot to cook in, and the fuel itself and you're looking at close to 1 kg / 2 pounds all told.

I have an aluminum camp frying pan that, including lid, weighs 121 g / 4.3 oz. It seemed hard to get much lighter for something large enough that you could squeeze an MRE entree into, so I kept it.

After a bit of browsing in the local Wal-Mart, I found a tiny sheet metal folding stove that weighed 112 g / 3.98 oz empty. It's designed to burn pellets of hexamine fuel.

The stove. Ignore the aluminum foil, it was there from a previous experiment.
In my testing it worked pretty well. One pellet brought 250 ml of water from 10C to boiling in six minutes, and held it at a boil for a minute before burning out. The fuel burned fairly cleanly and didn't leave that much soot on the pot either, which was nice.

What's not so nice, however, was the fuel. According to the MSDS, hexamine decomposes upon heating or contact with skin into formaldehyde, which is toxic and carcinogenic. Combustion products include such tasty substances as hydrogen cyanide and ammonia. This really didn't seem like something that I wanted to handle, or burn, in close proximity to food! Thus began my quest for a safer alternative.

My first thought was to use tea light candles, since I already had a case of a hundred for use as fire starters. In my testing, one tea light was able to heat a pot of water from 10C to 30C in a whopping 21 minutes before starting to reach an equilibrium where the pot lost heat as fast as it gained it. I continued the test out to 34 minutes, at which point it was a toasty 36C.

The stove was big enough to fit more than one tea light, so the obvious next step was to put six of them in a 3x2 grid. This heated significantly more, at the 36-minute mark my water measured a respectable 78C.

I figured I was on the right track, but needed to burn more wax per unit time. Some rough calculations suggested that a brick of paraffin wax the size of the stove and about as thick as a tea light contained 1.5 kWh of energy, and would output about 35 W of heat per wick. Assuming 25% energy transfer efficiency, which seemed reasonable based on the temperature data I had measured earlier, I needed to put out around 675 W to bring my pot to a boil in ten minutes. This came out to approximately 20 candle wicks.

I started out by folding a tray out of heavy duty aluminum foil, and reinforcing it on the outside with aluminum foil duct tape. I then bought a pack of tea light wicks on Amazon and attached them to the tray with double-sided tape.
Giant 20-wicked candle before adding wax
I made a water bath on my hot plate and melted a bunch of tea lights in a beaker. I wasn't in the mood to get spattered with hot wax so I wore long-sleeved clothes and a face shield. I was pretty sure that the water bath wouldn't get anywhere near the ignition point of the wax but did the work outside on a concrete patio and had a CO2 fire extinguisher on standby just in case.

Melting wax. Safety first, everyone!
The resulting behemoth of a candle actually looked pretty nice!
20-wick, 700W thermal output candle with tea lights for scale
After I was done and the wax had solidified I put the candle in my stove and lit it off. It took a while to get started (a light breeze kept blowing out one wick or another and I used quite a few matches to get them all lit), but after a while I had a solid flame going. At the six-minute mark my water had reached 37C.

A few minutes later, disaster struck! The pool of molten wax reached the flash point and ignited across the whole surface. At this point I had a massive flame - my pot went from 48 to 82C in two minutes! This translates to 2.6 kW assuming 100% energy transfer efficiency, so actual power output was probably upwards of 5 kW.

I removed the pot (using welding gloves since the flames were licking up the handle) and grabbed a photo of the fireball before thinking about how to extinguish the fire.

Pretty sure this isn't what a stove is supposed to look like
Since I was outside on a non-flammable surface the fire wasn't an immediate safety hazard, but I wanted to put it out non-destructively to preserve evidence for failure analysis. I opted to smother it with a giant candle snuffer that I rapidly folded out of heavy-duty aluminum foil.

The carnage after the fire was extinguished. Note the discolored wax!
It took me a while to clean up the mess - the giant candle had turned tan from incomplete combustion. It had also sprung a leak at some point, spilling a bit of wax out onto my patio.

On top of that, my pot was coal-black from all of the soot the super-rich flame was putting out. My wife wouldn't let it anywhere near the sink so I scrubbed it as best I could in the bathtub, then spent probably 20 minutes scrubbing all of the gray stains off the tub itself.

In order to avoid the time-consuming casting of wax, my next test used a slug of wax from a tea light that I drilled holes in, then inserted four wicks. I covered the top of the candle with aluminum foil tape to reflect heat back up at the pot, in a bid to increase efficiency and keep the melt puddle below the flash point.

Quad-wick tea light
This performed pretty well in my test. It got my pot up to 35C at the 12-minute mark, which was right about where I expected based on the x1 and x6 candle tests, and didn't flash over.

The obvious next step was to make five of them and see if this would work any better. It ignited more easily than the "brick" candle, and reached 83C at the 6-minute mark. Before T+ 7 minutes, however, the glue on the tape had failed from the heat, and the wax flashed. By the time I got the pot out of harm's way the water was boiling and it was covered in soot (again).

This time, it was a little bit breezier and my snuffer failed to exclude enough air to extinguish the flames. I ended up having to blast it with the CO2 extinguisher I had ready for just this situation. It wasn't hard to put out and I only used about two of the ten pounds of gas. (Ironically, I had planned to take the extinguisher in to get serviced the next morning because it was almost due for annual preventive maintenance. I ended up needing a recharge too...)

After cleaning off my pot and stove, and scraping some of the spilled wax off my driveway, it was back to the drawing board. I thought about other potential fuels I had lying around, and several obvious options came to mind.

Testing booze for flammability
I'm not a big drinker but houseguests have resulted in me having a few bottles of liquor around so I tested it out. Jack didn't burn at all, Captain Morgan white rum burned fitfully and left a sugary residue without putting out much heat. 100-proof vodka left a bit of starchy residue and was tricky to light.

A tea light cup full of 99% isopropyl alcohol brought my pot to 75C in five minutes before burning out, but was filthy and left soot everywhere. Hand sanitizer (about 60% ethanol) burned cleanly, but slower and cooler due to the water content - peak temperature of 54C and 12 minute burn time.

Ethanol seemed like a viable fuel if I could get it up to a higher concentration. I wanted to avoid liquid fuels due to difficulty of handling and the risk of spills, but a thick gel that didn't spill easily looked like a good option.

After a bit of research I discovered that calcium acetate (a salt of acetic acid) was very soluble in water, but not in alcohols. When a saturated solution of it in water is added to an alcohol it forms a stiff gel, commonly referred to as a "California snowball" because it burns and has a consistency like wet snow. I don't have any photos of my test handy, but here's a video from somebody else that shows it off nicely.

Two tea light cups full of the stuff brought my pot of water to a boil in 8 minutes, and held it there until burning out just before the 13-minute mark. I also tried boiling a FSR sandwich packet in a half-inch or so of water, and it was deliciously warm by the end. This seemed like a pretty good fuel!

Testing the calcium acetate fuel. I put a lid on the pot after taking this pic.

I filled two film-canister type containers with the calcium acetate + ethanol gel fuel and left it in my SAR pack. As luck would have it, I spent the next day looking for a missing hiker so it spent quite a while bouncing around driving on dirt roads and hiking.

When I got home I was disappointed to see clear liquid inside the bag that my stove and fuel were stored in. I opened the canisters only to find a thin whitish liquid instead of a stiff gel.

It seemed that the calcium acetate gel was not very stable, and over time the calcium acetate particles would precipitate out and the solution would revert to a liquid state. This clearly would not do.

Hand sanitizer seemed like a pretty good fuel other than being underpowered and perfumed, so I went to the grocery store and started looking at ingredient lists. They all seemed pretty similar - ethanol, water, aloe and other moisturizers, perfumes, maybe colorants, and a thickener. The thickener was typically either hydroxyethyl cellulose or a carbomer.

A few minutes on Amazon turned up a bag of Carbomer 940, a polyvinyl carboxy polymer cross-linked with esters of pentaerythritol. It's supposed to produce a viscosity of 45,000 to 70,000 CPS when added to water at 0.5% by weight. I also ordered a second bottle of Reagent Alcohol (90% ethanol / 5% methanol / 5% isopropanol with no bittering agents, ketones, or non-volatile ingredients) since my other one was pretty low after the calcium acetate failure.

Carbomer 940 is fairly acidic (pH 2.7 - 3.3 at 0.5% concentration) in its pure form and gel when neutral or alkaline, so it needs to be neutralized. The recommended base for alcohol-based gels was triethanolamine, so I picked up a bottle of that too.

Preparing to make carbomer-alcohol fuel gel

I made a 50% alcohol-water solution and added an 0.5% mass of carbomer. It didn't seem to fully dissolve, leaving a bunch of goopy chunks in the beaker.

Incompletely dissolved Carbomer 940 in 50/50 water/alcohol
I left it overnight to dissolve, blended it more, and then filtered off any big clumps with a coffee filter. I then added a few drops of triethanolamine, at which point the solution immediately turned cloudy. Upon blending, a rubbery white substance preciptated out of solution and stuck to my stick blender and the sidewalls of the beaker. This was not supposed to happen!

Rubbery goop on the blender head
Precipitate at the bottom of the beaker

I tried everything I could think of - diluting the triethanolamine and adding it slowly to reduce sudden pH changes, lowering the alcohol concentration, and even letting the carbomer sit in solution for a few days before adding the triethanolamine. Nothing worked.

I went back to square one and started reading more papers and watching process demonstration videos from the manufacturer. Eventually I noticed one source that suggested increasing the pH of the water to about 8 *before* adding the carbomer. This worked and gave a beautiful clear gel!

After a bit of tinkering I found a good process: Starting with 100 ml of water, titrate to pH 8 with triethanolamine. Add 1 g of carbomer powder and blend until fully gelled. Add 300 ml of reagent alcohol a bit at a time, mixing thoroughly after each addition. About halfway through adding the alcohol the gel started to get pretty runny so I mixed in a few more drops of triethanolamine and another 500 mg of carbomer powder before mixing in the rest of the alcohol. I had only a little more alcohol left in the bottle (maybe 50 ml) so I stirred that in without bothering to measure.

The resulting gel was quite stiff and held its shape for a little while after pouring, but could still be transferred between containers without muich difficulty.

Tea light can full of my final fuel
I left the beaker of fuel in my garage for several days and shook it around a bit, but saw no evidence of degradation. Since it's basically just turbo-strength hand sanitizer (~78% instead of the usual 30-60%) without all of the perfumes and moisturizers, it should be pretty stable. I had no trouble igniting it down to 10C ambient temperatures, but may find it necessary to mix in some acetone or other low-flash-point fuel to light it reliably in the winter.

The final batch of fuel filled two polypropylene specimen jars perfectly with just a little bit left over for a cooking test.

One of my two fuel jars
One tea light canister held 10.7 g / 0.38 oz of fuel, and I typically use two at a time, so 21.4 / 0.76 oz. One jar thus holds enough fuel for about five cook sessions, which is more than I'd ever need for a SAR mission or weekend camping trip. The final weight of my entire cooking system (stove, one fuel jar, tea light cans, and pot) comes out to 408 g / 14.41 oz, or a bit less than an empty Whisperlite stove (not counting the pot, fuel tank, or fuel)!

The only thing left was to try cooking on it. I squeezed a bacon-cheddar FSR sandwich into my pot, added a bit of water, and put it on top of the stove with two candle cups of fuel.

Nice clean blue flame, barely visible
By the six-minute mark the water was boiling away merrily and a cloud of steam was coming up around the edge of the lid. I took the pot off around 8 minutes and removed my snack.

Munching on my sandwich. You can't tell in this lighting, but the stove is still burning.
For those of you who haven't eaten First Strike Rations, the sandwiches in them are kind of like Hot Pockets or Toaster Strudels, except with a very thick and dense bread rather than a fluffy, flaky one. The fats in the bread are solid at room temperature and liquefy once it gets warm. This significantly softens the texture of the bread and makes it taste a lot better, so reaching this point is generally the primary goal when cooking one.

My sandwich was firmly over that line and tasted very good (for Army food baked two years ago). The bacon could have been a bit warmer, but the stove kept on burning until a bit after the ten-minute mark so I could easily have left it in the boiling water for another two minutes and made it even hotter.

Once I was done eating it was time to clean up. The stove had no visible dirt (beyond what was there from my previous experiments), and the tea light canisters were clean and fairly free of soot except in one or two spots around the edges. Almost no goopy residue was left behind.

Stove after the cook test
The pot was quite clean as well, with no black soot and only a very thin film of discoloration that was thin enough to leave colored interference fringes. Some of this was left over from previous testing, so if this test had been run on a virgin pot there'd be even less residue.

Bottom of the pot after the cook test

Overall, it was a long journey with many false steps, but I now have the ability to cook for myself over a weekend trip in less than a pound of weight, so I'm pretty happy. 

EDIT: A few people have asked to see the raw data from my temperature-vs-time cook tests, so here it is.

Raw data (graph 1)
Raw data (graph 2)

Saturday, April 8, 2017

STARSHIPRAIDER: Preparing for high-speed I/O characterization

In my previous post, I characterized the STARSHIPRAIDER I/O circuit for high voltage fault transient performance, but was unable to adequately characterize the high speed data performance because my DSO (Rigol DS1102D) only has 100 MHz of bandwidth.

Although I did have some ideas on how to improve the performance of the current I/O circuit, it was already faster than I could measure so I had no way to know if my improvements were actually making it any better. Ideally I'd just buy an oscilloscope with several GHz of bandwidth, but I'm not made of money and those scopes tend to be in the "request a quote" price range.

The obvious solution was to build one. I already had a proven high-speed sampling architecture from my TDR project so all I had to do was repackage it as an oscilloscope and make it faster still.

The circuit was beautifully simple: an output from the FPGA drives a 50 ohm trace to a SMA connector, then a second SMA connector drives the positive input of an ADCMP572 through a 3 dB attenuator (to keep my signal within range). The negative input is driven by a cheap 12-bit I2C DAC. The comparator output is then converted from CML to LVDS and fed to the host FPGA board. Finally, a 3.3V CML output from the FPGA drives the latch enable input on the comparator.

The "ADC" algorithm is essentially the same as on my TDR. I like to think of it as an equivalent-time version of a flash ADC: rather than 256 comparators digitizing the signal once, I digitize the signal 256 times with one comparator (and of course 256 different reference voltages). The post-processing to turn the comparator outputs into 8-bit ADC codes is the same.

Unlike the TDR, however, I also do equivalent-time sampling in the time domain. The FPGA generates the sampling and PRBS clocks with different PLL outputs (at 250 MHz / 4 ns period), and sweeps the relative phase in 100 ps steps to produce an effective resolution of 10 Gsps / 100 ps timebase.

Without further ado here's a picture of the board. Total BOM cost including connectors and PCB was approximately $50.

Oscilloscope board (yes, it's PMOD form factor!)
After some initial firmware development I was able to get some preliminary eye renders off the board. They were, to say the least, not ideal.

250 Mbps: very bumpy rise
500 Mbps: significant eye closure even with increased drive strength

I spent quite a while tracking down other bugs before dealing with the signal integrity issues. For example, a low-frequency pulse train showed up with a very uneven duty cycle:

Duty cycle distortion
Someone suggested that I try a slow rise time pulse to show the distortion more clearly. Not having a proper arbitrary waveform generator, I made do with a squarewave and R-C lowpass filter.

Ever seen breadboarded passives interfacing to edge-launch SMA connectors before?
It appeared that I had jump discontinuities in my waveform every two blocks (color coding)
I don't have an EE degree, but I can tell this looks wrong!

Interestingly enough, two blocks (of 32 samples each) were concatenated into a single JTAG transfer. These two were read in one clock cycle and looked fine, but the junction to the next transfer seemed to be skipping samples.

As it turned out, I had forgotten to clear a flag which led to me reading the waveform data before it was done capturing. Since the circular buffer was rotating in between packets, some samples never got sent.

The next bug required zooming into the waveform a bit to see. The samples captured on the first few (the number seemed to vary across bitstream builds) of my 40 clock phases were showing up shifted by 4 ns (one capture clock).

Horizontally offset samples

I traced this issue to a synchronizer between clock domains having variable latency depending on the phase offset of the source and destination clocks. This is an inherent issue in clock domain crossing, so I think I'm just going to have to calibrate it out somehow. For the short term I'm manually measuring the number of offset phases each time I recompile the FPGA image, and then correcting the data in post-processing.

The final issue was a hardware bug. I was terminating the incoming signal with a 50Ω resistor to ground. Although this had good AC performance, at DC the current drawn from a high-level input was quite significant (66 mA at 3.3V). Since my I/O pins can't drive this much, the line was dragged down.

I decided to rework the input termination to replace the 50Ω terminator with split 100Ω resistors to 3.3V and ground. This should have about half the DC current draw, and is Thevenin equivalent to a 50Ω terminator to 1.65V. As a bonus, the mid-level termination will also allow me to AC-couple the incoming signal if that becomes necessary.

Mill out trace from ground via to on-die 50Ω termination resistor

Remove soldermask from ground via and signal trace

Add 100Ω 0402 low-side terminator
Add 100Ω 0402 high-side terminator, plus jumper trace to 3.3V bulk decoupling cap

Add 10 nF high speed decoupling cap to help compensate for inductance of long feeder trace
I cleaned off all of the flux residue and ran a second set of eye loopback tests at 250 and 500 Mbps. The results were dramatically improved:

Post-rework at 250 Mbps
Post-rework at 500 Mbps
While not perfect, the new eye openings are a lot cleaner. I hope to tweak my input stage further to reduce probing artifacts, but for the time being I think I have sufficient performance to compare multiple STARSHIPRAIDER test circuits and see how they stack up at relatively high speeds.

Next step: collect some baseline data for the current STARSHIPRAIDER characterization board, then use that to inform my v0.2 I/O circuit!

Sunday, February 5, 2017

STARSHIPRAIDER: Input buffer rev 0.1 design and characterization

Working as an embedded systems pentester is a lot of fun, but it comes with some annoying problems. There's so many tools that I can never seem to find the right one. Need to talk to a 3.3V UART? I almost invariably have an FTDI cable configured for 5 or 1.8V on my desk instead. Need to dump a 1.8V flash chip? Most of our flash dumpers won't run below 3.3. Need to sniff a high-speed bus? Most of the Saleae Logic analyzers floating around the lab are too slow to keep up with fast signals, and the nice oscilloscopes don't have a lot of channels. And everyone's favorite jack-of-all-trades tool, the Bus Pirate, is infamous for being slow.

As someone with no shortage of virtual razors, I decided that this yak needed to be shaved! The result was an ongoing project I call STARSHIPRAIDER. There will be more posts on the project in the coming months so stay tuned!

The first step was to decide on a series of requirements for the project:
  • 32 bidirectional I/O ports split into four 8-pin banks.
    This is enough to sniff any commonly encountered embedded bus other than DRAM. Multiple banks are needed to support multiple voltage levels in the same target.
  • Full support for 1.2 to 5V logic levels.This is supposed to be a "Swiss Army knife" embedded systems debug/testing tool. This voltage range encompasses pretty much any signalling voltage commonly encountered in embedded devices.
  • Tolerance to +/- 12V DC levels.Test equipment needs to handle some level of abuse. When you're reverse engineering a board it's easy to hook up ground to the wrong signal, probe a power rail, or even do both at once. The device doesn't have to function in this state (shutting down for protection is OK) but needs to not suffer permanent damage. It's also OK if the protection doesn't handle AC sources - the odds of accidentally connecting a piece of digital test equipment to a big RF power amplifier are low enough that I'm not worried.
  • 500 Mbps input/output rate for each pin.This was a somewhat arbitrary choice, but preliminary math indicated it was feasible. I wanted something significantly faster than existing tools in the class.
  • Ethernet-based interface to host PC.I've become a huge fan of Ethernet and IPv6 as communications interface for my projects. It doesn't require any royalties or license fees, scales from 10 Mbps to >10 Gbps and supports bridging between different link speeds, supports multi-master topologies, and can be bridged over a WAN or VPN. USB and PCIe, the two main alternatives, can do few if any of these.
  • Large data buffer.Most USB logic analyzers have very high peak capture rates, but the back-haul interface to the host PC can't keep up with extended captures at high speed. Commodity DRAM is so cheap that there's no reason to not stick a whole SODIMM of DDR3 in the instrument to provide an extremely deep capture buffer.
  • Multiple virtual instruments connected to a crossbar.Any nontrivial embedded device contains multiple buses of interest to a reverse engineer. STARSHIPRAIDER needs to be able to connect to several at once (on arbitrary pins), bridge them out to separate TCP ports, and allow multiple testers to send test vectors to them independently.
The brain of the system will be fairly straightforward high-speed digital. It will be a 6-8 layer PCB with an Artix-7 FPGA in FGG484 package, a SODIMM socket for 4GB of DDR3 800, a KSZ9031 Gigabit Ethernet PHY, a TLK10232 10gbit Ethernet PHY, and a SFP+ cage, plus some sort of connector (most likely a Samtec Q-strip) for talking to the I/O subsystem on a separate board.

The challenging part of the design, from an architectural perspective, seemed to be the I/O buffer and input protection circuit, so I decided to prototype it first.

STARSHIPRAIDER v0.1 I/O buffer design

A block diagram of the initial buffer design is shown above. The output buffer will be discussed in a separate post once I've had a chance to test it; today we'll be focusing on the input stage (the top half of the diagram).

During normal operation, the protection relay is closed. The series resistor has insignificant resistance compared to the input impedance of the comparator (an ADCMP607), so it can be largely ignored. The comparator checks the input signal against a threshold (chosen appropriately for the I/O standard in use) and sends a differential signal to the host board for processing. But what if something goes wrong?

If the user accidentally connects the probe to a signal outside the acceptable voltage range, a Schottky diode connected to the +5V or ground rail will conduct and shunt the excess voltage safely into the power rails. The series resistor limits fault current to a safe level (below the diode's peak power rating). After a short time (about 150 µs with my current relay driver), the protection relay opens and breaks the circuit.

The relay is controlled by a Silego GreenPAK4 mixed-signal FPGA, running a small design written in Verilog and compiled with my open-source toolchain. The code for the GreenPAK design is on Github.

All well and good in theory... but does it work? I built a characterization board containing a single I/O buffer and loaded with test points and probe connectors. You can grab the KiCAD files for this on Github as well. Here's a quick pic after assembly:

STARSHIPRAIDER I/O characterization board
Initial test results were not encouraging. Positive overvoltage spikes were clamped to +8V and negative spikes were clamped to -1V - well outside the -0.5 to +6V absolute max range of my comparator.
Positive transient response

Negative transient response

After a bit of review of the schematics, I found two errors. The "5V" ESD diode I was using to protect the high side had a poorly controlled Zener voltage and could clamp as high as 8V or 9V. The Schottky on the low side was able to survive my fault current but the forward voltage increased massively beyond the nominal value.

I reworked the board to replace the series resistor with a larger one (39 ohms) to reduce the maximum fault current, replaced the low-side Schottky with one that could handle more current, and replaced the Zener with an identical Schottky clamping to the +5V rail.

Testing this version gave much better results. There was still a small amount of ringing (less than five nanoseconds) a few hundred mV past the limit, but the comparator's ESD diodes should be able to safely dissipate this brief pulse.

Positive transient response, after rework
Negative transient response, after rework
Now it was time to test the actual signal path. My first iteration of the test involved cobbling together a signal path from an FPGA board through the test platform and to the oscilloscope without any termination. The source of the signal was a BNC-to-minigrabber flying lead test clip! Needless to say, results were less than stellar.

PRBS31 eye at 80 Mbps through protection circuit with flying leads and no terminator
After ordering some proper RF test supplies (like an inline 50 ohm BNC terminator), I got much better signal quality. The eye was very sharp and clear at 100 Mbps. It was visibly rounded at 200 Mbps, but rendering a squarewave at that rate requires bandwith much higher than the 100 MHz of my oscilloscope so results were inconclusive.

PRBS31 eye at 100 Mbps through protection circuit with proper cabling
PRBS31 eye at 200 Mbps, limited by oscilloscope bandwidth
I then hooked the protection circuit up to the comparator to test the entire inbound signal chain. While the eye looked pretty good at 100 Mbps (plotting one leg of the differential since my scope was out of channels), at 200 Mbps horrible jitter appeared.

PRBS31 eye at 100 Mbps through full input buffer
PRBS31 eye at 200 Mbps through full input buffer
After quite a bit of scratching my head and fumbling with datasheets, I realized my oscilloscope was the problem by plotting the clock reference I was triggering on. The jitter was visible in this clock as well, suggesting that it was inherent in the oscilloscope's trigger circuit. This isn't too surprising considering I'm really pushing the limits of this scope - I need a better one to do this kind of testing properly.

PRBS31 eye at 200 Mbps plus 200 MHz sync clock
 At this point I've done about all of the input stage testing I can do with this oscilloscope. I'm going to try and rig up a BER tester on the FPGA so I can do PRBS loopback through the protection stage and comparator at higher speeds, then repeat for the output buffer and the protection run in the opposite direction.

I still have more work to do on the protection circuit as well... while it's fine at 100 Mbps, the 2x 10pF Schottky diode parasitic capacitance is seriously degrading my rise times (I calculated an RC filter -3dB point of around 200 MHz, so higher harmonics are being chopped off). I have some ideas on how I can cut this down much less but that will require a board respin and another blog post!

Sunday, May 8, 2016

Open Verilog flow for Silego GreenPak4 programmable logic devices

I've written a couple of posts in the past few months but they were all for the blog at work so I figured I'm long overdue for one on Silicon Exposed.

So what's a GreenPak?

Silego Technology is a fabless semiconductor company located in the SF Bay area, which makes (among other things) a line of programmable logic devices known as GreenPak. Their 5th generation parts were just announced, but I started this project before that happened so I'm still targeting the 4th generation.

GreenPak devices are kind of like itty bitty PSoCs - they have a mixed signal fabric with an ADC, DACs, comparators, voltage references, plus a digital LUT/FF fabric and some typical digital MCU peripherals like counters and oscillators (but no CPU).

It's actually an interesting architecture - FPGAs (including some devices marketed as CPLDs) are a 2D array of LUTs connected via wires to adjacent cells, and true (product term) CPLDs are a star topology of AND-OR arrays connected by a crossbar. GreenPak, on the other hand, is a star topology of LUTs, flipflops, and analog/digital hard IP connected to a crossbar.

Without further ado, here's a block diagram showing all the cool stuff you get in the SLG46620V:

SLG46620V block diagram (from device datasheet)
They're also tiny (the SLG46620V is a 20-pin 0.4mm pitch STQFN measuring 2x3 mm, and the lower gate count SLG46140V is a mere 1.6x2 mm) and probably the cheapest programmable logic device on the market - $0.50 in low volume and less than $0.40 in larger quantities.

The Vdd range of GreenPak4 is huge, more like what you'd expect from an MCU than an FPGA! It can run on anything from 1.8 to 5V, although performance is only specified at 1.8, 3.3, and 5V nominal voltages. There's also a dual-rail version that trades one of the GPIO pins for a second power supply pin, allowing you to interface to logic at two different voltage levels.

To support low-cost/space-constrained applications, they even have the configuration memory on die. It's one-time programmable and needs external Vpp to program (presumably Silego didn't want to waste die area on charge pumps that would only be used once) but has a SRAM programming mode for prototyping.

The best part is that the development software (GreenPak Designer) is free of charge and provided for all major operating systems including Linux! Unfortunately, the only supported design entry method is schematic entry and there's no way to write your design in a HDL.

While schematics may be fine for quick tinkering on really simple designs, they quickly get unwieldy. The nightmare of a circuit shown below is just a bunch of counters hooked up to LEDs that blink at various rates.

Schematic from hell!
As if this wasn't enough of a problem, the largest GreenPak4 device (the SLG46620V) is split into two halves with limited routing between them, and the GUI doesn't help the user manage this complexity at all - you have to draw your schematic in two halves and add "cross connections" between them.

The icing on the cake is that schematics are a pain to diff and collaborate on. Although GreenPak schematics are XML based, which is a touch better than binary, who wants to read a giant XML diff and try to figure out what's going on in the circuit?

This isn't going to be a post on the quirks of Silego's software, though - that would be boring. As it turns out, there's one more exciting feature of these chips that I didn't mention earlier: the configuration bitstream is 100% documented in the device datasheet! This is unheard of in the programmable logic world. As Nick of Arachnid Labs says, the chip is "just dying for someone to write a VHDL or Verilog compiler for it". As you can probably guess by from the title of this post, I've been busy doing exactly that.

Great! How does it work?

Rather than wasting time writing a synthesizer, I decided to write a GreenPak technology library for Clifford Wolf's excellent open source synthesis tool, Yosys, and then make a place-and-route tool to turn that into a final netlist. The post-PAR netlist can then be loaded into GreenPak Designer in order to program the device.

The first step of the process is to run the "synth_greenpak4" Yosys flow on the Verilog source. This runs a generic RTL synthesis pass, then some coarse-grained extraction passes to infer shift register and counter cells from behavioral logic, and finally maps the remaining logic to LUT/FF cells and outputs a JSON-formatted netlist.

Once the design has been synthesized, my tool (named, surprisingly, gp4par) is then launched on the netlist. It begins by parsing the JSON and constructing a directed graph of cell objects in memory. A second graph, containing all of the primitives in the device and the legal connections between them, is then created based on the device specified on the command line. (As of now only the SLG46620V is supported; the SLG46621V can be added fairly easily but the SLG46140V has a slightly different microarchitecture which will require a bit more work to support.)

After the graphs are generated, each node in the netlist graph is assigned a numeric label identifying the type of cell and each node in the device graph is assigned a list of legal labels: for example, an I/O buffer site is legal for an input buffer, output buffer, or bidirectional buffer.

Example labeling for a subset of the netlist and device graphs
The labeled nodes now need to be placed. The initial placement uses a simple greedy algorithm to create a valid (although not necessarily optimal or even routable) placement:
  1. Loop over the cells in the netlist. If any cell has a LOC constraint, which locks the cell to a specific physical site, attempt to assign the node to the specified site. If the specified node is the wrong type, doesn't exist, or is already used by another constrained node, the constraint is invalid so fail with an error.
  2. Loop over all of the unconstrained cells in the netlist and assign them to the first unused site with the right label. If none are available, the design is too big for the device so fail with an error.
Once the design is placed, the placement optimizer then loops over the design and attempts to improve it. A simulated annealing algorithm is used, where changes to the design are accepted unconditionally if they make the placement better, and with a random, gradually decreasing probability if they make it worse. The optimizer terminates when the design receives a perfect score (indicating an optimal placement) or if it stops making progress for several iterations. Each iteration does the following:
  1. Compute a score for the current design based on the number of unroutable nets, the amount of routing congestion (number of nets crossing between halves of the device), and static timing analysis (not yet implemented, always zero).
  2. Make a list of nodes that contributed to this score in some way (having some attached nets unroutable, crossing to the other half of the device, or failing timing).
  3. Remove nodes from the list that are LOC'd to a specific location since we're not allowed to move them.
  4. Remove nodes from the list that have only one legal placement in the device (for example, oscillator hard IP) since there's nowhere else for them to go.
  5. Pick a node from the remainder of the list at random. Call this our pivot.
  6. Find a list of candidate placements for the pivot:
    1. Consider all routable placements in the other half of the device.
    2. If none were found, consider all routable placements anywhere in the device.
    3. If none were found, consider all placements anywhere in the device even if they're not routable.
  7. Pick one of the candidates at random and move the pivot to that location. If another cell in the netlist is already there, put it in the vacant site left by the pivot.
  8. Re-compute the score for the design. If it's better, accept this change and start the next iteration.
  9. If the score is worse, accept it with a random probability which decreases as the iteration number goes up. If the change is not accepted, restore the previous placement.
After optimization, the design is checked for routability. If any edges in the netlist graph don't correspond to edges in the device graph, the user probably asked for something impossible (for example, trying to hook a flipflop's output to a comparator's reference voltage input) so fail with an error.

The design is then routed. This is quite simple due to the crossbar structure of the device. For each edge in the netlist:
  1. If dedicated (non-fabric) routing is used for this path, configure the destination's input mux appropriately and stop.
  2. If the source and destination are in the same half of the device, configure the destination's input mux appropriately and stop.
  3. A cross-connection must be used. Check if we already used one to bring the source signal to the other half of the device. If found, configure the destination to route from that cross-connection and stop.
  4. Check if we have any cross-connections left going in this direction. If they're all used, the design is unroutable due to congestion so fail with an error.
  5. Pick the next unused cross-connection and configure it to route from the source. Configure the destination to route from the cross-connection and stop.
Once routing is finished, run a series of post-PAR design rule checks. These currently include the following:
  • If any node has no loads, generate a warning
  • If an I/O buffer is connected to analog hard IP, fail with an error if it's not configured in analog mode.
  • Some signals (such as comparator inputs and oscillator power-down controls) are generated by a shared mux and fed to many loads. If different loads require conflicting settings for the shared mux, fail with an error.
If DRC passes with no errors, configure all of the individual cells in the netlist based on the HDL parameters. Fail with an error if an invalid configuration was requested.

Finally, generate the bitstream from all of the per-cell configuration and write it to a file.

Great, let's get started!

If you don't already have one, you'll need to buy a GreenPak4 development kit. The kit includes samples of the SLG46620V (among other devices) and a programmer/emulation board. While you're waiting for it to arrive, install GreenPak Designer.

Download and install Yosys. Although Clifford is pretty good at merging my pull requests, only my fork on Github is guaranteed to have the most up-to-date support for GreenPak devices so don't be surprised if you can't use a bleeding-edge feature with mainline Yosys.

Download and install gp4par. You can get it from the Github repository.

Write your HDL, compile with Yosys, P&R with gp4par, and import the bitstream into GreenPak Designer to program the target device. The most current gp4par manual is included in LaTeX source form in the source tree and is automatically built as part of the compile process. If you're just browsing, there's a relatively recent PDF version on my web server.

If you'd like to see the Verilog that produced the nightmare of a schematic I showed above, here it is.

Be advised that this project is still very much a work in progress and there are still a number of SLG46620V features I don't support (see the manual for exact details).

I love it / it segfaulted / there's a problem in the manual!

Hop in our IRC channel (##openfpga on Freenode) and let me know. Feedback is great, pull requests are even better,

You're competing with Silego's IDE. Have they found out and sued you yet?

Nope. They're fully aware of what I'm doing and are rolling out the red carpet for me. They love the idea of a HDL flow as an alternative to schematic entry and are pretty amazed at how fast it's coming together.

After I reported a few bugs in their datasheets they decided to skip the middleman and give me direct access to the engineer who writes their documentation so that I can get faster responses. The last time I found a problem (two different parts of the datasheet contradicted each other) an updated datasheet was in my inbox and on their website by the next day. I only wish Xilinx gave me that kind of treatment!

They've even offered me free hardware to help me add support for their latest product family, although I plan to get GreenPak4 support to a more stable state before taking them up on the offer.

So what's next?

Better testing, for starters. I have to verify functionality by hand with a DMM and oscilloscope, which is time consuming.

My contact at Silego says they're going to be giving me documentation on the SRAM emulation interface soon, so I'm going to make a hardware-in-loop test platform that connects to my desktop and the Silego ZIF socket, and lets me load new bitstreams via a scriptable interface. It'll have FPGA-based digital I/O as well as an ADC and DAC on every device pin, plus an adjustable voltage regulator for power, so I can feed in arbitrary mixed-signal test waveforms and write PC-based unit tests to verify correct behavior.

Other than that, I want to finish support for the SLG46620V in the next month or two. The SLG46621V will be an easy addition since only one pin and the relevant configuration bits have changed from the 46620 (I suspect they're the same die, just bonded out differently).

Once that's done I'll have to do some more extensive work to add the SLG46140V since the architecture is a bit different (a lot of the combinatorial logic is merged into multi-function blocks). Luckily, the 46140 has a lot in common architecturally with the GreenPak5 family, so once that's done GreenPak5 will probably be a lot easier to add support for.

My thanks go out to Clifford Wolf, whitequark, the IRC users in ##openfpga, and everyone at Silego I've worked with to help make this possible. I hope that one day this project will become mature enough that Silego will ship it as an officially supported extension to GreenPak Designer, making history by becoming the first modern programmable logic vendor to ship a fully open source synthesis and P&R suite.