Monday, December 3, 2012

CMake, CTest, and CDash for Xilinx FPGAs, part 2

This is a follow-up to my post from yesterday. I've made major progress and, if I had known things would go this fast, I wouldn't have written that post until today :)

The current version of the script can compile HDL designs to both FPGA bitstreams and ISim test cases, and can run the simulation executable as a unit test. There's no direct support for CPLDs yet (adding it will pretty much involve refactoring the code that calls xst out into a function and adding some code to call cpldfit) but that will come soon.

Also on the to-do list:
  • Support for invoking PlanAhead in both pre-synthesis and post-PAR modes
  • Support for programming bitstreams to FPGAs and CPLDs using iMPACT via a "make program" type target
  • Support for indirect programming (need to generate ROM files etc)
  • Support for programming bitstreams to FPGAs and CPLDs using my JTAG toolchain (uses libftdi and the Digilent API as back ends, so I can integrate FT2232-based debug/program modules into my boards and not rely on the Xilinx platform cable)
  • Support for more command-line flags for the toolchain. Right now all of the ngdbuild/map/par/trce/bitgen flags are hard-coded and only about half of the default xst flags are changeable.
  • Support for mixed hardware/ISim/C++ cosimulation (using pipes and $fread/$fwrite to bridge to ISim and JTAG to bridge to real hardware)

Without further ado, here's a usage example for the major new feature:

########################################################################################################################
# Global synthesis flags

set(XILINX_FILTER_FILE ${CMAKE_CURRENT_SOURCE_DIR}/filter.filter)

set(XST_KEEP_HIERARCHY Soft)
set(XST_NETLIST_HIERARCHY Rebuilt)

########################################################################################################################
# Current top-level module
add_fpga_target(
 OUTPUT
  JtagTest
 TOP_LEVEL
  ${CMAKE_CURRENT_SOURCE_DIR}/JtagTest.v
 CONSTRAINTS
  ${CMAKE_SOURCE_DIR}/ucf/JtagTest.ucf
 DEVICE
  xc6slx45-3-csg324
 SOURCES
  ${CMAKE_CURRENT_SOURCE_DIR}/debug/JtagDebugController.v
  ${CMAKE_CURRENT_SOURCE_DIR}/noc/common/NOCArbiter.v
  ${CMAKE_CURRENT_SOURCE_DIR}/noc/common/NOCRouterCore.v
  ${CMAKE_CURRENT_SOURCE_DIR}/noc/common/NOCRouterMux.v
  ${CMAKE_CURRENT_SOURCE_DIR}/noc/rpc/RPCRouter.v
  ${CMAKE_CURRENT_SOURCE_DIR}/noc/rpc/RPCRouterExitQueue.v
  ${CMAKE_CURRENT_SOURCE_DIR}/peripherals/NetworkedButtonArray.v
  ${CMAKE_CURRENT_SOURCE_DIR}/peripherals/NetworkedLEDBank.v
  ${CMAKE_CURRENT_SOURCE_DIR}/util/MediumBlockRamFifo.v
  ${CMAKE_CURRENT_SOURCE_DIR}/util/SwitchDebouncer.v
  ${CMAKE_CURRENT_SOURCE_DIR}/util/SwitchDebouncerBlock.v
  ${CMAKE_CURRENT_SOURCE_DIR}/util/ThreeStageSynchronizer.v
 )

The add_fpga_target function uses the OUTPUT parameter as the base name for all of the temporary files created during compilation.

The TOP_LEVEL parameter specifies the top-level source file for the module. For now the base name of the TOP_LEVEL file is used as the top-level module name; in the future I may make the TOP_LEVEL parameter specify the module name and then add that file (along with all the others) to the SOURCES section.
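
To make the naming rule concrete: the module name is whatever get_filename_component(... NAME_WE) returns for the TOP_LEVEL path, i.e. the file name with the directory and everything from the first dot onward removed. Here is a rough shell equivalent; the function name top_module_name is made up for this illustration and is not part of the CMake module.

```shell
#!/bin/sh
# Hypothetical helper mirroring get_filename_component(... NAME_WE):
# strip the directory, then strip everything from the first dot onward.
top_module_name() {
    file=${1##*/}
    printf '%s\n' "${file%%.*}"
}
```

With this, top_module_name /some/dir/JtagTest.v prints JtagTest, which is why the file name and the module name have to match for now.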

CONSTRAINTS, DEVICE, and SOURCES should be self-explanatory. Note that the Xilinx toolchain expects part numbers in a specific format: there's a dash between the speed grade and the package (unlike in the actual part numbers), and the temperature range is not specified.
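
As a rough illustration of that format, here is a shell sketch that rewrites an ordering code into the form used in the example above. The function name to_ise_part is hypothetical, and the pattern assumes a purely numeric speed grade and an optional single-letter temperature suffix, so it won't cover every part out there. Anything that doesn't match the pattern is passed through unchanged (just lowercased).

```shell
#!/bin/sh
# Hypothetical helper, not part of the CMake module.
# Assumes ordering codes shaped like DEVICE-<speed><package>[temp], e.g. XC6SLX45-3CSG324C,
# with a numeric speed grade and an optional one-letter temperature range.
to_ise_part() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/^([a-z0-9]+)-([0-9]+)([a-z]+[0-9]+)[a-z]?$/\1-\2-\3/'
}
```

For example, to_ise_part XC6SLX45-3CSG324C prints xc6slx45-3-csg324, which is the string passed as DEVICE above.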

Full source for this monster is below. Now that it's reached the point of basic usability I won't be blogging on it anymore except to announce the stable release on Google Code once I've worked out the rest of the kinks and bugs.

########################################################################################################################
# @file FindXilinx.cmake
# @author Andrew D. Zonenberg
# @brief Xilinx ISE toolchain CMake module
########################################################################################################################

########################################################################################################################
# Autodetect Xilinx paths (very hacky for now)

# TODO: Print messages only when configuring

# Find /opt/Xilinx or similar
find_file(XILINX_PARENT NAMES Xilinx PATHS /opt)
if(XILINX_PARENT STREQUAL "XILINX_PARENT-NOTFOUND")
 message(FATAL_ERROR "No Xilinx toolchain installation found")
endif()

# Find /opt/Xilinx/VERSION
# TODO: Figure out a better way of doing this
find_file(XILINX NAMES 14.3 PATHS ${XILINX_PARENT})
if(XILINX STREQUAL "XILINX-NOTFOUND")
 message(FATAL_ERROR "No ISE 14.3 installation found")
endif()
#message(STATUS "Found Xilinx toolchain... ${XILINX}")

# Set current OS architecture (TODO: autodetect)
set(XILINX_ARCH lin64)

# Find fuse
find_program(FUSE NAMES fuse PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/" NO_DEFAULT_PATH)
if(FUSE STREQUAL "FUSE-NOTFOUND")
 message(FATAL_ERROR "No Xilinx fuse installation found")
endif()
#message(STATUS "Found Xilinx fuse... ${FUSE}")

# Find xst
find_file(XST NAMES xst PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(XST STREQUAL "XST-NOTFOUND")
 message(FATAL_ERROR "No Xilinx xst installation found")
endif()
#message(STATUS "Found Xilinx xst... ${XST}")

# Find ngdbuild
find_file(NGDBUILD NAMES ngdbuild PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(NGDBUILD STREQUAL "NGDBUILD-NOTFOUND")
 message(FATAL_ERROR "No Xilinx ngdbuild installation found")
endif()
#message(STATUS "Found Xilinx ngdbuild... ${NGDBUILD}")

# Find map
find_file(MAP NAMES map PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(MAP STREQUAL "MAP-NOTFOUND")
 message(FATAL_ERROR "No Xilinx map installation found")
endif()
#message(STATUS "Found Xilinx map... ${MAP}")

# Find par
find_file(PAR NAMES par PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(PAR STREQUAL "PAR-NOTFOUND")
 message(FATAL_ERROR "No Xilinx par installation found")
endif()
#message(STATUS "Found Xilinx par... ${PAR}")

# Find trce
find_file(TRCE NAMES trce PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(TRCE STREQUAL "TRCE-NOTFOUND")
 message(FATAL_ERROR "No Xilinx trce installation found")
endif()
#message(STATUS "Found Xilinx trce... ${TRCE}")

# Find bitgen
find_file(BITGEN NAMES bitgen PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(BITGEN STREQUAL "BITGEN-NOTFOUND")
 message(FATAL_ERROR "No Xilinx bitgen installation found")
endif()
#message(STATUS "Found Xilinx bitgen... ${BITGEN}")

########################################################################################################################
# Argument parsing helper

macro(xilinx_parse_args _output _top_level _ucf _device _sources)
 set(${_top_level} FALSE)
 set(${_output} FALSE)
 set(${_ucf} FALSE)
 set(${_device} FALSE)
 set(${_sources})
 # Remember only the most recent keyword so that keywords may appear in any order
 set(_xpa_mode "")
 foreach(arg ${ARGN})
  # The "x" prefix stops if() from dereferencing an argument that happens to name a variable (e.g. TOP_LEVEL)
  if("x${arg}" MATCHES "^x(TOP_LEVEL|SOURCES|CONSTRAINTS|DEVICE|OUTPUT)$")
   set(_xpa_mode "${arg}")
  elseif("x${_xpa_mode}" STREQUAL "xSOURCES")
   list(APPEND ${_sources} ${arg})
  elseif("x${_xpa_mode}" STREQUAL "xDEVICE")
   if(${_device})
    message(FATAL_ERROR "Multiple devices specified in xilinx_parse_args")
   else()
    set(${_device} ${arg})
   endif()
  elseif("x${_xpa_mode}" STREQUAL "xCONSTRAINTS")
   if(${_ucf})
    message(FATAL_ERROR "Multiple constraint files specified in xilinx_parse_args")
   else()
    set(${_ucf} ${arg})
   endif()
  elseif("x${_xpa_mode}" STREQUAL "xTOP_LEVEL")
   if(${_top_level})
    message(FATAL_ERROR "Multiple top-level files specified in xilinx_parse_args (${${_top_level}})")
   else()
    set(${_top_level} ${arg})
   endif()
  elseif("x${_xpa_mode}" STREQUAL "xOUTPUT")
   if(${_output})
    message(FATAL_ERROR "Multiple outputs specified in xilinx_parse_args")
   else()
    set(${_output} ${arg})
   endif()
  else()
   message(FATAL_ERROR "Unrecognized command ${arg} in xilinx_parse_args")
  endif()
 endforeach()
endmacro()

########################################################################################################################
# Default flags for fuse
set(FUSE_FLAGS "-intstyle ise -incremental -lib unisims_ver -lib unimacro_ver -lib xilinxcorelib_ver -lib secureip")

########################################################################################################################
# ISim executable generation

function(add_isim_executable OUTPUT_FILE )
  
 # Parse args
 xilinx_parse_args(OUTFNAME TOP_LEVEL UCF DEVICE SOURCES ${ARGN})
 
 # Get base name without extension of the top-level module
 get_filename_component(TOPLEVEL_BASENAME ${TOP_LEVEL} NAME_WE )
 
 # Write the .prj file
 set(PRJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}.prj")
 file(WRITE ${PRJ_FILE} "verilog work \"${TOP_LEVEL}\"\n")
 foreach(f ${SOURCES})
  file(APPEND ${PRJ_FILE} "verilog work \"${f}\"\n")
 endforeach()
 file(APPEND ${PRJ_FILE} "verilog work \"${XILINX}/ISE_DS/ISE/verilog/src/glbl.v\"\n")
 
 # Write the run-fuse wrapper script
 set(FUSE_ERR_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}_err.log")
 set(FUSE_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}_build.log")
 set(FUSE_WRAPPER "${CMAKE_CURRENT_BINARY_DIR}/runfuse${OUTPUT_FILE}.sh")
 file(WRITE ${FUSE_WRAPPER} "#!/bin/bash\n")
 file(APPEND ${FUSE_WRAPPER} "cd ${CMAKE_CURRENT_BINARY_DIR}\n")
 #file(APPEND ${FUSE_WRAPPER} "source ${XILINX}/ISE_DS/settings64.sh\n")
 file(APPEND ${FUSE_WRAPPER} "${FUSE} ${FUSE_FLAGS} -o ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE} -prj ${PRJ_FILE}")
 file(APPEND ${FUSE_WRAPPER} "   work.${TOPLEVEL_BASENAME} work.glbl > ${FUSE_LOG} 2> ${FUSE_ERR_LOG}\n")
 file(APPEND ${FUSE_WRAPPER} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${FUSE_WRAPPER} "    cat ${FUSE_ERR_LOG} | grep \"ERROR\"\n")
 file(APPEND ${FUSE_WRAPPER} "    exit 1;\n")
 file(APPEND ${FUSE_WRAPPER} "fi\n")
 file(APPEND ${FUSE_WRAPPER} "exit 0;\n")
 execute_process(COMMAND chmod +x ${FUSE_WRAPPER})
 
 # Main compile rule
 # TODO: tweak this
 add_custom_target(
  ${OUTPUT_FILE} ALL
  COMMAND ${FUSE_WRAPPER}
  DEPENDS ${SOURCES} ${TOP_LEVEL}
  COMMENT "Building ISim executable ${OUTPUT_FILE}..."
 )
 
 # Write the tcl script
 set(TCL_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}.tcl")
 file(WRITE ${TCL_FILE} "onerror {resume}\n")
 file(APPEND ${TCL_FILE} "wave add /\n")
 file(APPEND ${TCL_FILE} "run 1000 ns;\n")
 file(APPEND ${TCL_FILE} "exit;\n")
 
 # Write the run-test wrapper script
 set(TEST_WRAPPER "${CMAKE_CURRENT_BINARY_DIR}/run${OUTPUT_FILE}.sh")
 file(WRITE ${TEST_WRAPPER} "#!/bin/bash\n")
 file(APPEND ${TEST_WRAPPER} "cd ${CMAKE_CURRENT_BINARY_DIR}\n")
 file(APPEND ${TEST_WRAPPER} "source ${XILINX}/ISE_DS/settings64.sh\n")
 file(APPEND ${TEST_WRAPPER} "./${OUTPUT_FILE} -tclbatch ${TCL_FILE} -intstyle silent -vcdfile ${OUTPUT_FILE}.vcd -vcdunit ps || exit 1\n")
 file(APPEND ${TEST_WRAPPER} "cat isim.log | grep -q FAIL\n")
 file(APPEND ${TEST_WRAPPER} "if [ \"$?\" != \"1\" ]; then\n")
 file(APPEND ${TEST_WRAPPER} "    exit 1;\n")
 file(APPEND ${TEST_WRAPPER} "fi\n")
 execute_process(COMMAND chmod +x ${TEST_WRAPPER})
 
endfunction()

########################################################################################################################
# Test generation
#
# Usage:
# add_isim_test(NandGate
# TOP_LEVEL
#  ${CMAKE_CURRENT_SOURCE_DIR}/testNandGate.v
# SOURCES 
#  ${CMAKE_SOURCE_DIR}/hdl/NandGate.v
# )

function(add_isim_test TEST_NAME)

 # Parse args
 xilinx_parse_args(OUTPUT TOP_LEVEL UCF DEVICE SOURCES ${ARGN})

 # Add the sim executable
 add_isim_executable(test${TEST_NAME}
  TOP_LEVEL
   ${TOP_LEVEL}
  SOURCES 
   ${SOURCES}
  )

 add_test(${TEST_NAME}
  "${CMAKE_CURRENT_BINARY_DIR}/runtest${TEST_NAME}.sh")
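 # Note: the DEPENDS test property only orders this test after other *tests*; it does not
 # build the test${TEST_NAME} target, so run "make" before "make test"
 set_property(TEST ${TEST_NAME} APPEND PROPERTY DEPENDS test${TEST_NAME})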


endfunction()

########################################################################################################################
# Default flags for Xilinx toolchain

# Compiler flags
set(XST_MAX_FANOUT 100000)
set(XST_OPT_MODE Speed)
set(XST_OPT_LEVEL 1)
set(XST_KEEP_HIERARCHY No)
set(XST_NETLIST_HIERARCHY As_Optimized)
set(XST_RESOURCE_SHARING Yes)
set(XST_RAM_EXTRACT Yes)
set(XST_SHREG_MIN_SIZE 2)
set(XST_REGISTER_BALANCING No)

set(XILINX_FILTER_FILE FALSE)

########################################################################################################################
# Xilinx FPGA bitstream generation

function(add_fpga_target)
 
 # Parse args
 xilinx_parse_args(OUTFNAME TOP_LEVEL UCF DEVICE SOURCES ${ARGN})
 
 # Get base name without extension of the top-level module
 get_filename_component(TOPLEVEL_BASENAME ${TOP_LEVEL} NAME_WE )
 
 # Set the filter flag
 set(XILINX_FILTER_FLAG "")
 if(XILINX_FILTER_FILE)
  set(XILINX_FILTER_FLAG "-filter ${XILINX_FILTER_FILE}")
 endif()
 
 # Write the .prj file
 set(PRJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.prj")
 file(WRITE ${PRJ_FILE} "verilog work \"${TOP_LEVEL}\"\n")
 foreach(f ${SOURCES})
  file(APPEND ${PRJ_FILE} "verilog work \"${f}\"\n")
 endforeach()
 file(APPEND ${PRJ_FILE} "verilog work \"${XILINX}/ISE_DS/ISE/verilog/src/glbl.v\"\n")
 
 # Create the XST input script
 set(XST_DIR "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_xst")
 file(MAKE_DIRECTORY ${XST_DIR})
 set(XST_TMPDIR "${XST_DIR}/projnav.tmp")
 file(MAKE_DIRECTORY ${XST_TMPDIR})
 set(XST_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.xst")
 set(XST_SYR_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.syr")
 file(WRITE  ${XST_SCRIPT_FILE} "set -tmpdir \"${XST_TMPDIR}\"\n")
 file(APPEND ${XST_SCRIPT_FILE} "set -xsthdpdir ${XST_DIR}\n")
 file(APPEND ${XST_SCRIPT_FILE} "run\n")
 file(APPEND ${XST_SCRIPT_FILE} "-ifn ${PRJ_FILE}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-ofn ${OUTFNAME}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-ofmt NGC\n")
 file(APPEND ${XST_SCRIPT_FILE} "-p ${DEVICE}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-top ${TOPLEVEL_BASENAME}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-slice_utilization_ratio 100\n")
 file(APPEND ${XST_SCRIPT_FILE} "-bram_utilization_ratio 100\n")
 file(APPEND ${XST_SCRIPT_FILE} "-dsp_utilization_ratio 100\n")
 file(APPEND ${XST_SCRIPT_FILE} "-bufg 16\n")
 file(APPEND ${XST_SCRIPT_FILE} "-hierarchy_separator /\n")
 file(APPEND ${XST_SCRIPT_FILE} "-bus_delimiter <>\n")
 file(APPEND ${XST_SCRIPT_FILE} "-case Maintain\n")
 file(APPEND ${XST_SCRIPT_FILE} "-max_fanout ${XST_MAX_FANOUT}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-opt_mode ${XST_OPT_MODE}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-opt_level ${XST_OPT_LEVEL}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-keep_hierarchy ${XST_KEEP_HIERARCHY}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-netlist_hierarchy ${XST_NETLIST_HIERARCHY}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-resource_sharing ${XST_RESOURCE_SHARING}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-ram_extract ${XST_RAM_EXTRACT}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-shreg_min_size ${XST_SHREG_MIN_SIZE}\n")
 file(APPEND ${XST_SCRIPT_FILE} "-register_balancing ${XST_REGISTER_BALANCING}\n")
 
 #-power NO
 #-iuc NO
 #-rtlview Yes
 #-glob_opt AllClockNets
 #-read_cores YES
 #-write_timing_constraints NO
 #-cross_clock_analysis NO
 #-lc Auto
 #-reduce_control_sets Auto
 #-fsm_extract YES -fsm_encoding Auto
 #-safe_implementation No
 #-fsm_style LUT
 #-ram_style Auto
 #-rom_extract Yes
 #-shreg_extract YES
 #-rom_style Auto
 #-auto_bram_packing NO
 #-async_to_sync NO
 #-use_dsp48 Auto
 #-iobuf YES
 #-register_duplication YES
 #-optimize_primitives NO
 #-use_clock_enable Auto
 #-use_sync_set Auto
 #-use_sync_reset Auto
 #-iob Auto
 #-equivalent_register_removal YES
 #-slice_utilization_ratio_maxmargin 5

 # Create the run-XST script
 set(XST_BUILD_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_xst.log")
 set(XST_RUN_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/runXST_${OUTFNAME}.sh")
 file(WRITE  ${XST_RUN_SCRIPT} "#!/bin/bash\n")
 file(APPEND ${XST_RUN_SCRIPT} "${XST} -intstyle xflow ${XILINX_FILTER_FLAG} -ifn ${XST_SCRIPT_FILE} -ofn ${XST_SYR_FILE} > ${XST_BUILD_LOG}\n")
 file(APPEND ${XST_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${XST_RUN_SCRIPT} "    cat ${XST_BUILD_LOG} | grep \"ERROR\"\n")
 file(APPEND ${XST_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${XST_RUN_SCRIPT} "fi\n")
 file(APPEND ${XST_RUN_SCRIPT} "cat ${XST_SYR_FILE} | grep \"WARNING\"\n")
 file(APPEND ${XST_RUN_SCRIPT} "exit 0;\n")
 execute_process(COMMAND chmod +x ${XST_RUN_SCRIPT})
  
 # Synthesize
 set(NGC_FILE "${OUTFNAME}.ngc")
 add_custom_command(
  OUTPUT ${NGC_FILE}
  COMMAND ${XST_RUN_SCRIPT}
  DEPENDS ${SOURCES} ${TOP_LEVEL} ${UCF}
  COMMENT "Synthesizing NGC object ${NGC_FILE}"
 )
 
 # Create the run-NGDBUILD script
 # NGD_FILE and PCF_FILE are set first: the command line below references NGD_FILE,
 # which would otherwise expand to an empty string when the script is written
 set(NGD_FILE "${OUTFNAME}.ngd")
 set(PCF_FILE "${OUTFNAME}.pcf")
 set(NGDBUILD_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_ngdbuild.log")
 set(NGDBUILD_RUN_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/runNGDBUILD_${OUTFNAME}.sh")
 set(NGDBUILD_BLD_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.bld")
 file(WRITE  ${NGDBUILD_RUN_SCRIPT} "#!/bin/bash\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "${NGDBUILD} -intstyle ise ${XILINX_FILTER_FLAG} -dd _ngo -nt timestamp -uc ${UCF} -p ${DEVICE} ${NGC_FILE} ${NGD_FILE} > ${NGDBUILD_LOG}\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "    cat ${NGDBUILD_LOG} | grep \"ERROR\"\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "fi\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "cat ${NGDBUILD_BLD_FILE} | grep \"WARNING\"\n")
 file(APPEND ${NGDBUILD_RUN_SCRIPT} "exit 0;\n")
 execute_process(COMMAND chmod +x ${NGDBUILD_RUN_SCRIPT})
 
 # Translate
 add_custom_command(
  OUTPUT ${NGD_FILE}
  COMMAND ${NGDBUILD_RUN_SCRIPT}
  DEPENDS ${UCF} ${NGC_FILE}
  COMMENT "Translating NGD object ${NGD_FILE}"
 )
 
 # Create the run-MAP script
 set(MAP_NCD_FILE "${OUTFNAME}_map.ncd")
 set(MAP_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_map.log")
 set(MAP_MRP_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_map.mrp")
 set(MAP_RUN_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/runMAP_${OUTFNAME}.sh")
 file(WRITE  ${MAP_RUN_SCRIPT} "#!/bin/bash\n")
 file(APPEND ${MAP_RUN_SCRIPT} "${MAP} -intstyle ise -p ${DEVICE} -w ${XILINX_FILTER_FLAG} -logic_opt off -ol high -t 1 -xt 0 -register_duplication off -r 4 -global_opt off -mt 2 -ir off -pr off -lc off -power off -o ${MAP_NCD_FILE} ${NGD_FILE} ${PCF_FILE} > ${MAP_LOG}\n")
 file(APPEND ${MAP_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${MAP_RUN_SCRIPT} "    cat ${MAP_LOG} | grep \"ERROR\"\n")
 file(APPEND ${MAP_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${MAP_RUN_SCRIPT} "fi\n")
 file(APPEND ${MAP_RUN_SCRIPT} "cat ${MAP_MRP_FILE} | grep \"WARNING\"\n")
 file(APPEND ${MAP_RUN_SCRIPT} "exit 0;\n")
 execute_process(COMMAND chmod +x ${MAP_RUN_SCRIPT})
 
 # Map
 add_custom_command(
  OUTPUT ${MAP_NCD_FILE}
  COMMAND ${MAP_RUN_SCRIPT}
  DEPENDS ${UCF} ${NGD_FILE}
  COMMENT "Mapping native circuit description ${MAP_NCD_FILE}"
 )
 
 # Create the run-PAR script
 set(NCD_FILE "${OUTFNAME}.ncd")
 set(PAR_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_par.log")
 set(PAR_RUN_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/runPAR_${OUTFNAME}.sh")
 set(PAR_PAR_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.par")
 file(WRITE  ${PAR_RUN_SCRIPT} "#!/bin/bash\n")
 file(APPEND ${PAR_RUN_SCRIPT} "${PAR} -w -intstyle ise ${XILINX_FILTER_FLAG}  -ol high -mt 4 ${MAP_NCD_FILE} ${NCD_FILE} ${PCF_FILE} > ${PAR_LOG}\n")
 file(APPEND ${PAR_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${PAR_RUN_SCRIPT} "    cat ${PAR_LOG} | grep \"ERROR\"\n")
 file(APPEND ${PAR_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${PAR_RUN_SCRIPT} "fi\n")
 file(APPEND ${PAR_RUN_SCRIPT} "cat ${PAR_PAR_FILE} | grep \"WARNING\"\n")
 file(APPEND ${PAR_RUN_SCRIPT} "exit 0;\n")
 execute_process(COMMAND chmod +x ${PAR_RUN_SCRIPT})
 
 # PAR
 add_custom_command(
  OUTPUT ${NCD_FILE}
  COMMAND ${PAR_RUN_SCRIPT}
  DEPENDS ${UCF} ${MAP_NCD_FILE}
  COMMENT "Place and route native circuit description ${NCD_FILE}"
 )
 
 # Create the run-trce script
 set(TWX_FILE "${OUTFNAME}.twx")
 set(TWR_FILE "${OUTFNAME}.twr")
 set(TRCE_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_trce.log")
 set(TRCE_RUN_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/runTRCE_${OUTFNAME}.sh")
 file(WRITE  ${TRCE_RUN_SCRIPT} "#!/bin/bash\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "${TRCE} -intstyle ise -v 3 -s 2 -n 3 ${XILINX_FILTER_FLAG}  -fastpaths -xml ${TWX_FILE} ${NCD_FILE} -o ${TWR_FILE} ${PCF_FILE} -ucf ${UCF} > ${TRCE_LOG}\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    cat ${TRCE_LOG} | grep \"ERROR\"\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "fi\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "cat ${TWR_FILE} | grep \"0 timing errors detected\" > /dev/null\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    cat ${TWR_FILE} | grep \"paths analyzed\"\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    cat ${TWR_FILE} | grep \"timing errors detected\"\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    cat ${TWR_FILE} | grep \"Minimum period is\"\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    cat ${TWR_FILE} | grep \"Score\"\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${TRCE_RUN_SCRIPT} "fi\n")
 execute_process(COMMAND chmod +x ${TRCE_RUN_SCRIPT})
 
 # TRCE
 add_custom_command(
  OUTPUT ${TWR_FILE}
  COMMAND ${TRCE_RUN_SCRIPT}
  DEPENDS ${UCF} ${NCD_FILE}
  COMMENT "Generate static timing analysis ${TWR_FILE}"
 )

 # Create the bitgen input script
 set(BITGEN_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.ut")
 set(BIT_FILE "${OUTFNAME}.bit")
 file(WRITE  ${BITGEN_SCRIPT_FILE} "-w\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g DebugBitstream:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g Binary:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g CRC:Enable\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g Reset_on_err:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g ConfigRate:2\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g ProgPin:PullUp\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g TckPin:PullUp\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g TdiPin:PullUp\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g TdoPin:PullUp\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g TmsPin:PullUp\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g UnusedPin:PullDown\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g UserID:0xFFFFFFFF\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g ExtMasterCclk_en:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g SPI_buswidth:1\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g TIMER_CFG:0xFFFF\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g multipin_wakeup:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g StartUpClk:CClk\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g DONE_cycle:4\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g GTS_cycle:5\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g GWE_cycle:6\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g LCK_cycle:NoWait\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g Security:None\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g DonePipe:Yes\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g DriveDone:Yes\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g en_sw_gsr:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g drive_awake:No\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g sw_clk:Startupclk\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g sw_gwe_cycle:5\n")
 file(APPEND ${BITGEN_SCRIPT_FILE} "-g sw_gts_cycle:4\n")
 
 # Create the run-bitgen script
 set(BITGEN_LOG "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_bitgen.log")
 set(BITGEN_RUN_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/runBITGEN_${OUTFNAME}.sh")
 set(BITGEN_BGN_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.bgn")
 file(WRITE  ${BITGEN_RUN_SCRIPT} "#!/bin/bash\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "${BITGEN} -intstyle ise ${XILINX_FILTER_FLAG} -f ${BITGEN_SCRIPT_FILE} ${NCD_FILE} > ${BITGEN_LOG}\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "if [ \"$?\" != \"0\" ]; then\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "    cat ${BITGEN_LOG} | grep \"ERROR\"\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "    exit 1;\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "fi\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "cat ${BITGEN_BGN_FILE} | grep \"WARNING\"\n")
 file(APPEND ${BITGEN_RUN_SCRIPT} "exit 0;\n")
 execute_process(COMMAND chmod +x ${BITGEN_RUN_SCRIPT})
 
 # BITGEN
 # Must depend on trce in order for timing failure to prevent bitgen from running
 add_custom_target(
  ${OUTFNAME} ALL
  COMMAND ${BITGEN_RUN_SCRIPT}
  DEPENDS ${NCD_FILE} ${TWR_FILE}
  COMMENT "Generate FPGA bitstream ${BIT_FILE}"
  SOURCES ${NCD_FILE} ${TWR_FILE}
 )
 
 # Add additional make-clean files
 # Do not delete run scripts or toolchain input files, only outputs
 set_property(
  DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
  APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES
  ${XST_SYR_FILE}
  ${XST_BUILD_LOG}
  ${NGDBUILD_LOG}
  ${NGDBUILD_BLD_FILE}
  ${PCF_FILE}
  ${MAP_LOG}
  ${MAP_MRP_FILE}
  ${PAR_LOG}
  ${PAR_PAR_FILE}
  ${TWX_FILE}
  ${TRCE_LOG}
  ${BITGEN_LOG}
  ${BIT_FILE}
  ${BITGEN_BGN_FILE}
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.lso"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.map"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_map.map"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_map.ngm"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_map.xrpt"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_ngdbuild.xrpt"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.ngm"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.pad"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_pad.csv"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_pad.txt"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_par.xrpt"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.ptwx"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_summary.xml"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.unroutes"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_usage.xml"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.xpi"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_xst.xrpt"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}_bitgen.xwbt"
  "${CMAKE_CURRENT_BINARY_DIR}/${OUTFNAME}.drc"
  "${CMAKE_CURRENT_BINARY_DIR}/usage_statistics_webtalk.html"
  "${CMAKE_CURRENT_BINARY_DIR}/webtalk.log"
  "${CMAKE_CURRENT_BINARY_DIR}/par_usage_statistics.html"
  )

endfunction()

# TODO: planAhead
#planAhead -ise yes -m64 -log planAhead.log -journal planAhead.jou -source pa.fromNcd.tcl

#pa.fromHdl.tcl (pre-synthesis)
#create_project -name lx9-lvds-ioexpander -dir "/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander/planAhead_run_1" -part xc6slx9tqg144-2
#set_param project.pinAheadLayout yes
#set srcset [get_property srcset [current_run -impl]]
#set_property target_constrs_file "TopLevel.ucf" [current_fileset -constrset]
#set hdlfile [add_files [list {TopLevel.v}]]
#set_property file_type Verilog $hdlfile
#set_property library work $hdlfile
#set_property top TopLevel $srcset
#add_files [list {TopLevel.ucf}] -fileset [get_property constrset [current_run]]
#open_rtl_design -part xc6slx9tqg144-2

#pa.fromNcd.tcl contents (post-PAR)
#create_project -name lx9-lvds-ioexpander -dir "/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander/planAhead_run_1" -part xc6slx9tqg144-2
#set srcset [get_property srcset [current_run -impl]]
#set_property design_mode GateLvl $srcset
#set_property edif_top_file "/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander/TopLevel.ngc" [ get_property srcset [ current_run ] ]
#add_files -norecurse { {/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander} }
#set_property target_constrs_file "TopLevel.ucf" [current_fileset -constrset]
#add_files [list {TopLevel.ucf}] -fileset [get_property constrset [current_run]]
#link_design
#read_xdl -file "/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander/TopLevel.ncd"
#if {[catch {read_twx -name results_1 -file "/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander/TopLevel.twx"} eInfo]} {
#   puts "WARNING: there was a problem importing \"/home/azonenberg/native/programming/verilogpractice/lx9-lvds-ioexpander/TopLevel.twx\": $eInfo"
#}

Sunday, December 2, 2012

CMake, CTest, and CDash for Xilinx FPGAs

Despite most of my posts lately having been on hardware topics, my PhD work (as well as my undergraduate degree) is in computer science. As a result, one of my many interests is applying software development methodologies to hardware, soft hardware, and firmware.

This goes well beyond the obvious stuff like using version control for layout files or firmware source code. I'm talking about stuff like continuous integration and test-driven development. While I do commit frequently and have some unit tests (not as many as I'd like) there is currently no formal methodology for nightly builds, running all of the tests on each commit (right now they need to be run by hand in the simulator), or automatic regression testing.

I've also been getting increasingly fed up with Xilinx's IDE lately (and IDEs in general, but that's another story...) - the editor doesn't support regex search and replace, all of the toolbars and wizards make it way too complex to change one compiler flag, and generally it seems to make me less productive. Almost all of my "pure" software projects use CMake with makefile outputs; I develop in a standalone editor and then just "make" from a shell to compile the code.

This post documents my work to date on a CMake-based workflow for Xilinx devices. My hope is that I can eventually have everything completely integrated so that my firmware, FPGA bitstreams, and RTL simulation test cases can all be built with a single "make" command.

The first part of my script is still very hackish - there are way too many hard-coded paths and it assumes the 14.3 toolchain version on 64-bit Linux, but it works for now and, more importantly, provides a wrapper that all of my other CMake code can use without changing even if I improve the autodetection.


# Find /opt/Xilinx or similar
find_file(XILINX_PARENT NAMES Xilinx PATHS /opt)
if(XILINX_PARENT STREQUAL "XILINX_PARENT-NOTFOUND")
 message(FATAL_ERROR "No Xilinx toolchain installation found")
endif()

# Find /opt/Xilinx/VERSION
# TODO: Figure out a better way of doing this
find_file(XILINX NAMES 14.3 PATHS ${XILINX_PARENT})
if(XILINX STREQUAL "XILINX-NOTFOUND")
 message(FATAL_ERROR "No ISE 14.3 installation found")
endif()
message(STATUS "Found Xilinx toolchain... ${XILINX}")

# Set current OS architecture (TODO: autodetect)
set(XILINX_ARCH lin64)

# Find fuse
find_file(FUSE NAMES fuse PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(FUSE STREQUAL "FUSE-NOTFOUND")
 message(FATAL_ERROR "No Xilinx fuse installation found")
endif()
message(STATUS "Found Xilinx fuse... ${FUSE}")
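
The hard-coded 14.3 is the obvious first candidate for autodetection. One possible approach, sketched here in shell rather than CMake, is to pick the highest version-numbered directory under the install root. The function name latest_ise_version is made up for this example, and it assumes version directories are named MAJOR.MINOR (like 14.3) and nothing else in the directory looks like that.

```shell
#!/bin/sh
# Hypothetical helper: print the highest MAJOR.MINOR directory name under an install root.
# Assumes version directories are named like 13.4 or 14.3; prints nothing if none exist.
latest_ise_version() {
    ls "$1" 2>/dev/null \
        | grep -E '^[0-9]+\.[0-9]+$' \
        | sort -t . -k 1,1n -k 2,2n \
        | tail -n 1
}
```

The sort is numeric on each field, so 9.2 sorts below 13.4 rather than above it as a plain string sort would have it.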

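The next step was a little helper for argument parsing. In most languages supported by CMake there is no concept of a "top level" source file - the compiler finds your main() function but the build system doesn't need to know which file it's in. HDL tools do need to be told which module is the top level, so the helper splits its arguments into a single TOP_LEVEL file and a list of SOURCES.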

macro(xilinx_parse_args _top_level _sources)
 set(${_top_level} FALSE)
 set(${_sources})
 set(_found_sources FALSE)
 set(_found_top_level FALSE)
 foreach(arg ${ARGN})
  # The "x" prefix stops if() from dereferencing an argument that happens to name a variable
  if("x${arg}" STREQUAL "xTOP_LEVEL")
   set(_found_top_level TRUE)
  elseif("x${arg}" STREQUAL "xSOURCES")
   set(_found_sources TRUE)
  elseif(${_found_sources})
   list(APPEND ${_sources} ${arg})
  elseif(${_found_top_level})
   if(${_top_level})
    message(FATAL_ERROR "Multiple top-level files specified in xilinx_parse_args")
   else()
    set(${_top_level} ${arg})    
   endif()
  else()
   message(FATAL_ERROR "Unrecognized command ${arg} in xilinx_parse_args")
  endif()
 endforeach()
endmacro()

Once this was working it was time to actually create a simulation executable. This is a bit more involved than one might think for a couple of reasons:
  • The Xilinx tools ship their own custom C/C++ runtime libraries which are generally not the same version as that used by the host system. If you source /opt/Xilinx/[VERSION]/ISE_DS/settings[32|64].sh then all of the tools work (and are added to your $PATH) but any application depending on the host's glibc version won't start!
  • As a result, CMake and CTest require the host's glibc (so the sim executable segfaults if launched from their environment) and the sim executable requires the Xilinx glibc. This means that CTest cannot run the sim executable directly.
  • ISim does not seem to provide any way of setting the exit code for a simulation. CTest expects a test case to return 0 on success and nonzero on failure.
I started out by creating a tcl script (pretty much an exact copy of the one generated by the GUI toolchain except for the exit call) to run the simulation. Right now the 1000ns run time is hard coded so your simulation must finish sooner than that or not all the test cases will run. I'm going to make this parameterizable in the future.

onerror {resume}
wave add /
run 1000 ns;
exit;

This script is then launched by an automatically generated bash script which runs the simulation and then looks at the log file. If your simulation ever issues a $display with the text "FAIL" in it, the test case is considered a failure; otherwise it's marked a success. The intention is to have a bunch of test cases in the testbench printing out something like "SPI flash read test: [PASS|FAIL]".

#!/bin/bash
cd /home/azonenberg/native/programming/verilogpractice/UnitTest02/build/tests/testNandGate
source /opt/Xilinx/14.3/ISE_DS/settings64.sh
./testNandGate -tclbatch /home/azonenberg/native/programming/verilogpractice/UnitTest02/build/tests/testNandGate/testNandGate.tcl -intstyle silent -vcdfile testNandGate.vcd -vcdunit ps || exit 1
cat isim.log | grep -q FAIL
if [ "$?" != "1" ]; then
    exit 1;
fi
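The pass/fail decision in that wrapper boils down to one grep over the log. Here is a minimal sketch of the same check pulled out into a function; check_isim_log is a hypothetical helper name for illustration and is not part of FindXilinx.cmake.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): pass/fail from a log file.
# Succeeds only when grep reports "no match" for FAIL.
check_isim_log() {
    local log="$1"
    local status=0
    # grep exits 0 on a match, 1 on no match, 2 on error (e.g. missing log)
    grep -q FAIL "$log" 2>/dev/null || status=$?
    # Only a clean "no match" counts as a passing test
    [ "$status" -eq 1 ]
}
```

Something like `check_isim_log isim.log || exit 1` would then replace the last four lines of the wrapper. A missing log counts as a failure, same as in the original test against "1".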

Gluing all of the necessary parts and code generation together, we get this:
function(add_isim_executable OUTPUT_FILE )
  
 # Parse args
 xilinx_parse_args(TOP_LEVEL SOURCES ${ARGN})
 
 # Get base name without extension of the top-level module
 get_filename_component(TOPLEVEL_BASENAME ${TOP_LEVEL} NAME_WE )
 
 # Write the .prj file
 set(PRJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}.prj")
 file(WRITE ${PRJ_FILE} "verilog work \"${TOP_LEVEL}\"\n")
 foreach(f ${SOURCES})
  file(APPEND ${PRJ_FILE} "verilog work \"${f}\"\n")
 endforeach()
 file(APPEND ${PRJ_FILE} "verilog work \"${XILINX}/ISE_DS/ISE/verilog/src/glbl.v\"\n")
 
 # Main compile rule
 # TODO: tweak this
 add_custom_target(
  ${OUTPUT_FILE} ALL
  COMMAND ${FUSE} ${FUSE_FLAGS} -o ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE} -prj ${PRJ_FILE}
    work.${TOPLEVEL_BASENAME} work.glbl > ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}_build.log
    2> ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}_err.log
  DEPENDS ${SOURCES} ${TOP_LEVEL}
  COMMENT "Building ISim executable ${OUTPUT_FILE}..."
 )
 
 # Write the tcl script
 set(TCL_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}.tcl")
 file(WRITE ${TCL_FILE} "onerror {resume}\n")
 file(APPEND ${TCL_FILE} "wave add /\n")
 file(APPEND ${TCL_FILE} "run 1000 ns;\n")
 file(APPEND ${TCL_FILE} "exit;\n")
 
 # Write the run-test wrapper script
 set(TEST_WRAPPER "${CMAKE_CURRENT_BINARY_DIR}/run${OUTPUT_FILE}.sh")
 file(WRITE ${TEST_WRAPPER} "#!/bin/bash\n")
 file(APPEND ${TEST_WRAPPER} "cd ${CMAKE_CURRENT_BINARY_DIR}\n")
 file(APPEND ${TEST_WRAPPER} "source ${XILINX}/ISE_DS/settings64.sh\n")
 file(APPEND ${TEST_WRAPPER} "./${OUTPUT_FILE} -tclbatch ${TCL_FILE} -intstyle silent -vcdfile ${OUTPUT_FILE}.vcd -vcdunit ps || exit 1\n")
 file(APPEND ${TEST_WRAPPER} "cat isim.log | grep -q FAIL\n")
 file(APPEND ${TEST_WRAPPER} "if [ \"$?\" != \"1\" ]; then\n")
 file(APPEND ${TEST_WRAPPER} "    exit 1;\n")
 file(APPEND ${TEST_WRAPPER} "fi\n")
 add_custom_command(TARGET ${OUTPUT_FILE} POST_BUILD COMMAND chmod +x ${TEST_WRAPPER})
 
endfunction()

There are several issues with the system right now; the most notable is that the fuse command is run every build even if the source files haven't changed. I'm going to fix this in a future release; this is just a WIP.

The final piece of the puzzle is some glue to create the executable and a CTest test case that calls the bash script:

function(add_isim_test TEST_NAME)

 # Parse args
 xilinx_parse_args(TOP_LEVEL SOURCES ${ARGN})

 # Add the sim executable
 add_isim_executable(test${TEST_NAME}
  TOP_LEVEL
   ${TOP_LEVEL}
  SOURCES 
   ${SOURCES}
  )

 add_test(${TEST_NAME}
  "${CMAKE_CURRENT_BINARY_DIR}/runtest${TEST_NAME}.sh")
 set_property(TEST ${TEST_NAME} APPEND PROPERTY DEPENDS test${TEST_NAME})


endfunction()

This is the only external interface to the whole module for now; everything else is just internal helper routines. The intended use is as follows:
cmake_minimum_required(VERSION 2.8)
include(FindXilinx.cmake)
enable_testing()
include (CTest)
add_isim_test(NandGate
 TOP_LEVEL
  ${CMAKE_CURRENT_SOURCE_DIR}/testNandGate.v
 SOURCES 
  ${CMAKE_SOURCE_DIR}/hdl/NandGate.v
 )

The full code for FindXilinx.cmake as it stands is here for convenience:
########################################################################################################################
# @file FindXilinx.cmake
# @author Andrew D. Zonenberg
# @brief Xilinx ISE toolchain CMake module
########################################################################################################################

########################################################################################################################
# Autodetect Xilinx paths (very hacky for now)

# Find /opt/Xilinx or similar
find_file(XILINX_PARENT NAMES Xilinx PATHS /opt)
if(XILINX_PARENT STREQUAL "XILINX_PARENT-NOTFOUND")
 message(FATAL_ERROR "No Xilinx toolchain installation found")
endif()

# Find /opt/Xilinx/VERSION
# TODO: Figure out a better way of doing this
find_file(XILINX NAMES 14.3 PATHS ${XILINX_PARENT})
if(XILINX STREQUAL "XILINX-NOTFOUND")
 message(FATAL_ERROR "No ISE 14.3 installation found")
endif()
message(STATUS "Found Xilinx toolchain... ${XILINX}")

# Set current OS architecture (TODO: autodetect)
set(XILINX_ARCH lin64)

# Find fuse
find_file(FUSE NAMES fuse PATHS "${XILINX}/ISE_DS/ISE/bin/${XILINX_ARCH}/")
if(FUSE STREQUAL "FUSE-NOTFOUND")
 message(FATAL_ERROR "No Xilinx fuse installation found")
endif()
message(STATUS "Found Xilinx fuse... ${FUSE}")

########################################################################################################################
# Argument parsing helper

macro(xilinx_parse_args _top_level _sources)
 set(${_top_level} FALSE)
 set(${_sources})
 set(_found_sources FALSE)
 set(_found_top_level FALSE)
 foreach(arg ${ARGN})
  if(${arg} STREQUAL "TOP_LEVEL")
   set(_found_top_level TRUE)
  elseif(${arg} STREQUAL "SOURCES")
   set(_found_sources TRUE)
  elseif(${_found_sources})
   list(APPEND ${_sources} ${arg})
  elseif(${_found_top_level})
   if(${_top_level})
    message(FATAL_ERROR "Multiple top-level files specified in xilinx_parse_args")
   else()
    set(${_top_level} ${arg})    
   endif()
  else()
   message(FATAL_ERROR "Unrecognized command ${arg} in xilinx_parse_args")
  endif()
 endforeach()
endmacro()

########################################################################################################################
# ISim executable generation

function(add_isim_executable OUTPUT_FILE )
  
 # Parse args
 xilinx_parse_args(TOP_LEVEL SOURCES ${ARGN})
 
 # Get base name without extension of the top-level module
 get_filename_component(TOPLEVEL_BASENAME ${TOP_LEVEL} NAME_WE )
 
 # Write the .prj file
 set(PRJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}.prj")
 file(WRITE ${PRJ_FILE} "verilog work \"${TOP_LEVEL}\"\n")
 foreach(f ${SOURCES})
  file(APPEND ${PRJ_FILE} "verilog work \"${f}\"\n")
 endforeach()
 file(APPEND ${PRJ_FILE} "verilog work \"${XILINX}/ISE_DS/ISE/verilog/src/glbl.v\"\n")
 
 # Main compile rule
 # TODO: tweak this
 add_custom_target(
  ${OUTPUT_FILE} ALL
  COMMAND ${FUSE} ${FUSE_FLAGS} -o ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE} -prj ${PRJ_FILE}
    work.${TOPLEVEL_BASENAME} work.glbl > ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}_build.log
    2> ${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}_err.log
  DEPENDS ${SOURCES} ${TOP_LEVEL}
  COMMENT "Building ISim executable ${OUTPUT_FILE}..."
 )
 
 # Write the tcl script
 set(TCL_FILE "${CMAKE_CURRENT_BINARY_DIR}/${OUTPUT_FILE}.tcl")
 file(WRITE ${TCL_FILE} "onerror {resume}\n")
 file(APPEND ${TCL_FILE} "wave add /\n")
 file(APPEND ${TCL_FILE} "run 1000 ns;\n")
 file(APPEND ${TCL_FILE} "exit;\n")
 
 # Write the run-test wrapper script
 set(TEST_WRAPPER "${CMAKE_CURRENT_BINARY_DIR}/run${OUTPUT_FILE}.sh")
 file(WRITE ${TEST_WRAPPER} "#!/bin/bash\n")
 file(APPEND ${TEST_WRAPPER} "cd ${CMAKE_CURRENT_BINARY_DIR}\n")
 file(APPEND ${TEST_WRAPPER} "source ${XILINX}/ISE_DS/settings64.sh\n")
 file(APPEND ${TEST_WRAPPER} "./${OUTPUT_FILE} -tclbatch ${TCL_FILE} -intstyle silent -vcdfile ${OUTPUT_FILE}.vcd -vcdunit ps || exit 1\n")
 file(APPEND ${TEST_WRAPPER} "cat isim.log | grep -q FAIL\n")
 file(APPEND ${TEST_WRAPPER} "if [ \"$?\" != \"1\" ]; then\n")
 file(APPEND ${TEST_WRAPPER} "    exit 1;\n")
 file(APPEND ${TEST_WRAPPER} "fi\n")
 add_custom_command(TARGET ${OUTPUT_FILE} POST_BUILD COMMAND chmod +x ${TEST_WRAPPER})
 
endfunction()

########################################################################################################################
# Test generation
#
# Usage:
# add_isim_test(NandGate
# TOP_LEVEL
#  ${CMAKE_CURRENT_SOURCE_DIR}/testNandGate.v
# SOURCES 
#  ${CMAKE_SOURCE_DIR}/hdl/NandGate.v
# )

function(add_isim_test TEST_NAME)

 # Parse args
 xilinx_parse_args(TOP_LEVEL SOURCES ${ARGN})

 # Add the sim executable
 add_isim_executable(test${TEST_NAME}
  TOP_LEVEL
   ${TOP_LEVEL}
  SOURCES 
   ${SOURCES}
  )

 add_test(${TEST_NAME}
  "${CMAKE_CURRENT_BINARY_DIR}/runtest${TEST_NAME}.sh")
 set_property(TEST ${TEST_NAME} APPEND PROPERTY DEPENDS test${TEST_NAME})


endfunction()
Once I get it to a more stable state I'll probably set up a Google Code page but for now this is good enough. The code can be used under the same 3-clause BSD license as almost all of my other open source code.

Wednesday, October 17, 2012

Cross-sectioning setup


My roommate Rob has a Unimat multipurpose machine tool which had been sitting around for a long time not being used for anything since we already have a full sized mill and lathe in our apartment shop.

We came up with the bright idea a few months ago of using it as a cross-sectioning saw by setting it up in the lathe configuration and putting a Dremel abrasive cut-off disk in the chuck. We then put the milling table on the carriage and clamped the workpiece to that. The end result is a tiny little abrasive-bladed cutoff saw, as used in most of my previous PCB/BGA cross section photos.

Unimat tool set up for cross-sectioning
The PVC plumbing visible in the background is part of our workbench dust-collection setup. A shop vac (off the left side of the frame on the floor) sucks through a 2.5" pipe with a bunch of T connectors on it. Each T is then necked down to 1.25" and has a ball valve before going out to an overhanging arm which is press-fit together (rather than solvent-welded like the permanent parts) and connected with screw unions in critical spots.

The end result is that we can position each arm directly over the location of the cut and turn on suction for that intake only.

Sunday, October 14, 2012

Dummy BGAs and failure analysis

I'm back...  it's been a while since my last post so I figured I'd write something. I spent most of the summer working on my PhD research and don't have anything ready to publish on that, but I'm now starting design work on a new development board that will tentatively use an Artix-7 FPGA in an FGG484 package.

The chip is going to be quite expensive and will be on a six-layer board (also not cheap) so I decided to do some research to characterize my BGA process a bit better and to improve yields once failure sources can be identified.

I began by designing two mating PCBs in FT[G]256 form factor (pictured below) and buying a jar of 250,000 SAC305 solder balls. The contact-chain pattern was structured such that every ball was electrically isolated from the ones immediately up, down, left, and right, and connected to those diagonally opposite via a leapfrog-zigzag pattern. The end result is two chains of 128 balls in series, which should be electrically isolated from each other. Any open circuit shows up as a break in one of the chains, and all possible horizontal or vertical shorts would be detectable as a short between the two chains.
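One way to picture the pattern is as a checkerboard: diagonal links never leave a color, and every up/down/left/right neighbor is the opposite color. The sketch below is a quick sanity check of that model, assuming a full 16x16 grid; chain_of and count_chains are hypothetical names, and the actual routing order within each chain is not modeled.

```shell
#!/bin/bash
# Sketch only: model the two daisy chains as the two colors of a checkerboard.
# Assumes a full 16x16 ball grid (256 balls).
N=16

# Chain index (0 or 1) for the ball at row $1, column $2
chain_of() {
    echo $(( ($1 + $2) % 2 ))
}

# Print: balls in chain 0, balls in chain 1, and the number of
# orthogonal neighbor pairs that land in the same chain.
count_chains() {
    local r c here count0=0 count1=0 same=0
    for ((r = 0; r < N; r++)); do
        for ((c = 0; c < N; c++)); do
            here=$(chain_of "$r" "$c")
            if [ "$here" -eq 0 ]; then
                count0=$((count0 + 1))
            else
                count1=$((count1 + 1))
            fi
            # Right-hand neighbor
            if (( c + 1 < N )) && [ "$here" -eq "$(chain_of "$r" $((c + 1)))" ]; then
                same=$((same + 1))
            fi
            # Neighbor one row down
            if (( r + 1 < N )) && [ "$here" -eq "$(chain_of $((r + 1)) "$c")" ]; then
                same=$((same + 1))
            fi
        done
    done
    echo "$count0 $count1 $same"
}
```

Running count_chains prints "128 128 0": two groups of 128 balls, with no orthogonal neighbors sharing a chain. That is why any horizontal or vertical bridge has to show up as a chain-to-chain short.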

Dummy FTG256 component

Dummy FTG256 carrier board (probe pads at top and bottom cropped)

The next step once the boards came back from fab was to take one of the dummy components and ball it.

I began by smearing the board with sticky flux using a microfiber swab. I need to come up with a good way of depositing thin films of sticky flux uniformly over a board (a stencil of some sort maybe? thinning it and spin coating?)... the amount pictured turned out to be too much.

Fluxing the board
The next step was to begin placing solder balls. Lacking a stencil I just used tweezers to place them one at a time. It took a while but as long as I won't be doing this very often I can't justify the cost of one.

Beginning to place solder balls
Close-up of placed solder balls before reflow
Since this was just a test I decided to reflow the first half of the board to see how it turned out.

A minute or so into the reflow profile it was obvious something was wrong - the solder balls were moving all over the place.

Drifting solder balls
Close-up of drifting solder balls
It appeared that as the gel-based flux liquefied, thickness variations caused it to flow and take solder balls with it. Surface tension resulted in balls trying to cling to one another.

I removed some of the excess flux, repositioned the misaligned balls, and reflowed, then repeated for the rest of the balls. A few of the balls moved again and bridged together so I removed them with solder braid, re-fluxed, and reflowed again with new balls.

Another defect visible post-reflow. For some reason this ball never quite made contact with the pad. It seemed to be fine after reflow.
The entire dummy component after reflow (whitish residue was left by flux after cleaning)
After reflow I took a quick look at the board and everything seemed fine: there was a ball on each pad and nothing was shorting.

I then treated the resulting board as an FTG256 component and reflowed it to the carrier board using my standard profile.

The resulting assembly passed the "no shorts" test and the "continuity of chain 1" test but the other chain showed an open circuit. After sanding the soldermask off the back of the dummy component (in retrospect I should have left the vias open for easy probing) a binary search quickly determined that pads F8 and F10, which should have been connected, were not. At this point it wasn't known which of the two connections was open.
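The binary search here is just repeated halving of the chain with a continuity probe. Below is a rough sketch of the idea using a simulated chain; has_continuity stands in for a multimeter measurement from the start of the chain, and OPEN_AT, the pad numbering, and both function names are made up for illustration.

```shell
#!/bin/bash
# Simulated chain: pads 0..OPEN_AT-1 still connect back to pad 0.
# On the bench this check is a meter probe; this is a hypothetical stand-in.
OPEN_AT=${OPEN_AT:-77}

has_continuity() {
    [ "$1" -lt "$OPEN_AT" ]
}

# Binary search for the broken link. $1 is the index of the last pad.
# Assumes exactly one open and that the last pad has already lost continuity.
# Prints the two adjacent pad indices that straddle the open.
find_open() {
    local lo=0 hi="$1" mid
    while (( hi - lo > 1 )); do
        mid=$(( (lo + hi) / 2 ))
        if has_continuity "$mid"; then
            lo=$mid   # still connected: the open is further along
        else
            hi=$mid   # already open: the break is at or before mid
        fi
    done
    echo "$lo $hi"
}
```

With the default OPEN_AT of 77, `find_open 127` prints "76 77". For a 128-ball chain that is about seven probes rather than over a hundred, which is why narrowing it down to F8/F10 went quickly.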

I then cross-sectioned the board several rows back from the F row to get a general look at how the reflow had gone. I made the cut slightly off parallel so that I could get a slice through some of the balls as well as seeing the dog-bones and vias.

After making the cut I de-fluxed with a high-pressure stream of 50% v/v acetone/IPA from a syringe.

Close-up of two balls showing saw marks (very quickly polished). The apparent void on the left-hand ball is actually diamond abrasive paste on top of the ball, not a solder defect.

Cross section of the board. Note that the cut is slightly off parallel to the balls; each ball is cut slightly higher than the one to its left and the right-hand two are not cut at all.
Everything looked good - the balls had clearly flowed around the sides of the NSMD pads and were showing good adhesion, none of them were distorted or anywhere near shorting, and there were no visible cracks or other defects.

I then made another cut just before the F row in hopes of locating the actual defect. I didn't even need to use the microscope to see something was wrong - there was no ball in the F10 position whatsoever!

Missing solder ball!
At this point I was quite confused because I knew that I had put a ball on every pad. I decided to polish a tiny bit closer and get some more images.

The pad on the carrier board (bottom) can be observed to still have the gold plating on it. There is no evidence of tinning whatsoever.
After seeing that the pad on the board was completely un-tinned and gold plated, it seemed that the ball had not adhered to the pad at all.

I then went back and looked at the post-balling picture of the board. What had originally escaped my notice was that ball F10 (origin at lower right, up six, left ten) was a lot smaller than the others. It's not clear what happened but I'm guessing that while removing shorted balls with braid I accidentally sucked some of the solder off that pad.

I'm not certain that this is the correct explanation yet but it fits the data well and is simple. I'll be doing several more dummy BGAs over the coming weeks to see how things turn out.

Monday, July 23, 2012

I've joined the CMOSfold team :)

I've been added as a contributing author to CMOSfold, the "weekly centerfold" of nude semiconductors run by my friend and colleague John McMaster. It contains a large number of brief overviews and top-metal photos of various chips, without much analysis.

I'll try and do a post every week or two there on a random chip I have in my personal collection. They may later be followed by more in-depth analysis here.

Saturday, July 21, 2012

Lab Tour, part 2 - Electronics Assembly/Test

This is the second post in my "Lab Tour" series. If you haven't read the first one, it's here.

As with last time I'll begin with an overview of the work area. It consists of two back-to-back workbenches that are typically used in tandem.

Assembly and test bench
The first bench is used for component placement prior to reflow soldering, as well as testing boards after assembly. A dedicated lab computer is located here for reading schematics and datasheets while working. Unfortunately it's not fast enough to run an FPGA toolchain but I intend to replace it with something that can do so in the future.

I have a grounded mat plus a wrist strap for working at this bench, grounded through the earth terminal of my benchtop power supply.

Close-up of test equipment
My only other piece of test equipment at the moment is my Rigol DS1102D 100MHz mixed-signal oscilloscope. I'm looking into getting a function generator at some point but much of my disposable income lately has been going into FPGAs and board fab so it'll have to wait a while!

Just off the right side of the frame is my cheap 10x/30x stereo inspection microscope from Premiere. It's proved invaluable for checking the quality of component placement and looking for shorts, as well as just providing a close-up view when manually applying solder paste or placing components.

Through-hole component inventory
I keep all of my SMT components in drawers of this bench, but the through-hole parts are too big so they have to go on top. This is the oldest part of my lab by far - I've had these organizers since I was 10 or 11 years old and many of the passive components date almost that far back.

Soldering bench
The second bench is used for hand soldering through-hole components, rework, and cleaning of boards.

Soldering equipment
All of my soldering equipment is made by Aoyue, a Chinese manufacturer of cheap clones (right down to the model numbers!) of Hakko designs. They've worked fine for me so far.

If you look closely at the full resolution frame you can see that the left-hand iron is labeled "SAC305 ONLY" and the right hand is labeled "LEAD ALLOYS ONLY". I try to avoid mixing solder alloys when I can, and rather than swapping tips it's easier to have two identical irons. The majority of my work is lead-free but occasionally I find it necessary to rework an older board using 63/37.

The hot air pencil is absolutely indispensable for SMT soldering. It allows me to reflow a single component during rework without putting the entire board in the oven, getting much nicer looking joints than I would if I used an iron as well as taking advantage of the self-aligning properties of the reflow process. It's also about the only way to remove a large QFP intact.

Just visible at the right side of the frame is an activated-charcoal solder fume extractor. While I do work in a large room with good ventilation, it's a lot harder to replace your lungs than a TSSOP so I prefer to err on the side of caution ;)

Solvent tray
I almost always use no-clean rosin fluxes but it's still handy to have a way of removing excess from a board, or cleaning dirt off a board that's been sitting around for a while. For this I have a selection of various solvents ranging from distilled water to acetone to alcohols, as well as water-based detergents from Alconox. I don't have an ultrasonic cleaner at the moment but I plan to add one to the wet bench in a few months.

Sunday, July 15, 2012

MEMS pressure sensor teardown - part 2

The sensor I studied in my last post was delivered to me in a partially disassembled state. After returning to the e-waste dumpster we were able to find a fully intact unit.

SiliconPr0n wiki page: http://siliconpr0n.org/archive/doku.php?id=honeywell_awm2100v

The part number is clearly visible: it's a Honeywell AWM2100V airflow sensor. Some of my analysis from earlier was a bit off - it turns out that there are two ports on the device and some of the resistors on the membrane are heaters. One of the resistive elements is driven with a constant power and the resistance of the other one is measured to determine the membrane's temperature. Given the power input and the temperature increase above ambient (compared to unheated regions of the die and board) one can compute the airflow rate.

I tore this one down to the bare board but no further; die/board photos from the other unit are in part 1 of the post.

Sensor on the PCB
The original unit was a multi-board Honeywell process control module containing this sensor, a solenoid valve, and a large number of through-hole ICs including a Z88 family microprocessor (which may be covered in a future post - I want to get it decapped but haven't had time to do so yet).

I removed the sensor from the board using hot air. It's a six-pin SIL package with a plastic case snapped around the sensor board.



Packaging of sensor after removing from board

After removing the snapped-on casing we're left with the ceramic sensor board and a hose fitting on top. Under the hose fitting is the actual sensor die, studied in detail in the previous post.

Fully disassembled

Saturday, July 14, 2012

MEMS pressure sensor teardown - part 1


While dumpster diving the e-waste bins on campus, one of my roommates found a control board of some sort that had an unusual sensor on it. We decided to take a closer look.

Sensor board
The board substrate is ceramic, most likely alumina. There are three electrically conductive layers visible on the board - gray (first level metalization), black (thick-film resistors), and gold (second level metalization and bond pads). A bluish dielectric separates M1 and M2 at crossing points but is not present over the remainder of the board.

The brownish residue on the top of the board is adhesive left over from the encapsulation over the sensor die.
Sensor die

The sensor die is approximately 1600μm along each side and appears to be made from a <100> oriented silicon wafer. Two conductive layers are visible - one of a resistive material (most likely polysilicon) and one of gold (used for bond pads). For the sake of discussion I will define the upper center pin to be pin 1.

The die consists of six resistors and is entirely passive, with no transistors whatsoever.

The active sensing element consists of two membranes made out of what appears to be silicon nitride. The membranes are suspended over a cavity defined by an anisotropic wet etch using a KOH or related chemistry.

There are a total of three resistors between the membranes, whose values presumably change as the membrane is stressed. Pins 1 and 2, as well as 3 and 4, are connected to thin zigzag resistors on the left and right membranes respectively. Pins 7 and 8 connect to another resistor which starts on the lower left of the upper membrane, loops around at the far side, and goes back to the lower left on the lower membrane.

In addition, there are three resistors on the silicon substrate - between pins 8 and 9, 7 and 6, and 6 and 5. While they are all bonded out to pads and would be easy to measure, I have not yet attempted to get resistance readings.

The resistor between pins 5 and 6 has very unusual geometry and is not the typical zigzag I would expect. There is no obvious reason for this pattern.

Saturday, July 7, 2012

Lab Tour, part 1 - Metrology Bench

Several people, upon seeing some of the photos taken during my work, have asked for more information about my lab setup. I've been posting so many photos taken through my microscopes lately that I seem to have forgotten to post any photos of the microscopes themselves!

This is the first post in a series of several. My lab is divided up into a series of distinct work areas and I'll be doing a post or two about each.

First off, an overview of the space:
Overview of metrology bench
The 19-inch rack at the right of the bench holds the image capture workstation (a 2U server recently removed from my GPU cluster), a Cisco 2950 switch, and a 24-port patch panel.

Moving to the left, the next notable piece of equipment is the Wentworth Labs probing station.

Probing station
The probe station is equipped with a 4-inch vacuum chuck, but I have to tape samples down at the moment due to lack of a vacuum distribution system. (This is on my longer term projects list).

The microscope is a B&L Stereozoom 4, with magnifications adjustable from 7x to 120x in full stereo. Although the images are not quite as sharp as most of my other scopes at high magnification, the addition of depth perception is extremely useful for probing and other manipulative tasks. It lacks an epi-illuminator so a fiber optic lamp is positioned to the right side of it.

I currently have three Micromanipulator 110/210 micropositioners. They're the same except one is meant to go on the left side of the chuck and one goes on the right. They're intended for large targets (20μm range, I think) such as bond pads, and are not suitable for smaller structures.

For probes, I use pieces of tungsten wire electrochemically etched to fine points. I'm still working on optimizing this process and will likely do a post on it once I get something working better.

Left side of metrology bench
The left-hand instrument in this view is an AmScope metallurgical microscope equipped with 4, 10, 40, and 100x (oil) objectives. It was my first high-power microscope and was OK but not great; the stage flexes when panning and the focuser seems to drift slightly. Some chromatic aberration is visible at higher magnifications.

The AmScope has been my main lithography tool so far; I will probably be removing it from service once my 2-inch contact aligner is finished.

The right-hand tool is an Olympus metallurgical microscope equipped with 5, 10, 20, and 40x objectives and is capable of both brightfield and darkfield illumination. It's a mix of BH and BH2 series parts scavenged from ebay.

The Olympus is my primary imaging system; its one notable deficiency is that at the moment it does not have a 100x objective. I purchased a used NeoSPlan 100x objective recently but this is infinity corrected (unlike the Neo objectives currently on the turret) so some modifications to the scope will be necessary to use it.

Off to the left side of the bench are various slides, coverslips, and sample preparation supplies.

I have several other measuring instruments that are portable and go wherever in the lab they're needed, but decided to cover them here in keeping with the general theme of metrology:

Scales and calipers
The left-hand scale has a capacity of 200g and is graduated in tens of mg; the right hand one has a capacity of 20g and is graduated in mg.

The digital caliper in the upper left is graduated in 0.001 inch increments but has a fairly large range of measurement and can do both inside and outside dimensions.

The final instrument is the Mitutoyo digital micrometer in the upper right. It only has a range of 0-1 inch but is graduated in μm. In one test I was able to easily measure the thickness of the photoresist film on top of a printed circuit board.

Tuesday, July 3, 2012

Photobit PB-0100-5 teardown

Earlier today I was cleaning out a drawer in my lab and found a broken USB webcam. Before throwing it out I decided to desolder the sensor chip and have a look.

For those of you who aren't familiar with it, my friend John and I are the driving forces behind the Silicon Pr0n project - a wiki dedicated to amassing knowledge about all things related to semiconductor RE. I haven't been doing as much work on it recently due to academic obligations but figured it was about time to post some more die photos!

Wiki page: http://siliconpr0n.org/archive/doku.php?id=azonenberg:photobit:pb0100

Map: http://siliconpr0n.org/map/photobit/pb-0100-5/neo5x/ 

Package shots after removing from the board:

Top view of sensor
Bottom view
The package is a ceramic LGA using gold ball bonding. I have so far made no attempt to remove the die from the package or delayer; all images were taken through the window on the front of the package.

Without even resorting to the microscope some structure is obvious:
  • The red and green area at the upper left of the die is the pixel array.
  • The remainder of the die is covered with a transparent blue material (which upon closer inspection looks exactly like the blue color filter in the pixels) to prevent photocurrents from messing up the control logic.
  • The area below the sensor has a lot of fine detail and is irregular. It's probably an array of standard logic cells controlling the sensor readout.
  • The area to the right of the sensor looks very regular and is probably addressing logic, buffers, and the ADCs.
  • Several of the pins along the top and left edge have three bond wires instead of one. They're probably power/ground.
  • Not all bond pads are broken out to pins.
I made an imaging pass over the entire die with a 5x objective. Since I haven't had time to do a CNC mod on my microscope stage like John has, I have to move the stage and snap photos by hand. This makes high-magnification full-die imaging very time consuming so if I need that I'll usually send the chip to him for processing.

Without further ado here's the full-die image. Note that this is rotated 90 degrees clockwise from the package overview image so that the vendor logo is right side up.

Full-die image
Closer inspection reveals that the standard cell area at left has large spaces between rows of logic for interconnect, suggesting that this is a 2-metal design. As is typical for 1999-era technology, the metal layers are not planarized. Sub-pixels look to be about 5 μm across.

Random portion of the subpixel array
The bottom right of the die has the Photobit logo and copyright notice:

Vendor logo and copyright. Note probe scrub mark from wafer test on the upper right pad.
Right above the logo there were a bunch of ID markings from the individual masks. It's immediately obvious that several masks are not visible as there are gaps in the array. These are probably the implants.

Several metal layers are visible, along with at least one polysilicon and several whose purpose is not immediately obvious.

Mask ID markings
The most interesting feature observed was at the bottom left of the die - a little doodle of a panda bear snuck in by the layout engineer.

Mask art!

Sunday, July 1, 2012

BGA process notes

I've gotten a lot of requests recently to share some details on my BGA assembly process, so without further ado here it is!

The board in this example is a test vehicle with an 11x11 0.8mm XBGA footprint on it, being mounted with a PIC32MX engineering sample chip. This is the same board I used in my 0201 process test.

I deliberately put several unfilled vias in the pads to demonstrate why this is a bad idea. Keep reading for details!

0.8mm XBGA test vehicle. Black marker lines highlight the row of balls that will be used for the cross section.
Since I don't have in-house stencil capabilities and haven't gotten around to ordering professionally made ones, I do all of my BGAs with flux only. My preferred flux for this purpose is ChipQuik SMD291NL no-clean rosin tack flux.

BGA pads covered in flux
The next step is to position the BGA on top of the footprint. Well-made footprints (such as the 256-FTBGA that I use on most of my FPGA boards) have the silkscreen outline slightly larger than the chip. Unfortunately this one is the same size as the chip so it was very difficult to align properly. I tried my best but it was still a little off.

BGA on footprint
I then ran the board through the standard reflow profile in my toaster oven. It's a cheap Proctor-Silex oven purchased at WalMart for something like $25. There is no thermocouple or feedback circuit in it (I have a 120VAC rated relay and a thermocouple but have not hooked them up yet).
  • Set to 90C for 3 minutes to preheat
  • Set to 150C for 1 minute for thermal soak.
  • Set to 210C for 1 minute for reflow. This results in a time above liquidus (TAL) of about 15 seconds.
  • Turn off oven, open door, and cool to ambient with room air
Note that these numbers are not intended to describe the actual temperatures reached by the board or the oven - they're just the numbers on the dial of my specific oven. I know for a fact that the peak temperature reached at the 210C setting is in excess of 220C, because the SAC305 solder melts and that's its melting point.

I've also heard of people using oven thermometers to calibrate their reflow ovens. One word of caution for those doing this - if your sensor has a significantly higher thermal mass than your board (such as a big metal oven thermometer) its temperature will lag behind that of the less-massive PCB by a significant amount. I know of at least one hobbyist who reached the thermal decomposition point of FR4 Tg170 (somewhere around 300C) when his thermometer showed only 260C!
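To get a feel for how large that lag can be, here is a minimal sketch that treats the board and the thermometer as first-order lags following the oven air. The time constants (20 s for the PCB, 60 s for the thermometer) and the linear air ramp are made-up illustrative values, not measurements from any real oven.

```python
def simulate_lag(air_temps, tau_s, dt_s=1.0, start_c=25.0):
    """First-order lag dT/dt = (T_air - T) / tau, integrated with forward Euler."""
    temp = start_c
    history = []
    for air in air_temps:
        temp += (air - temp) * dt_s / tau_s
        history.append(temp)
    return history

# Hypothetical oven air profile: linear ramp from 25 C to 300 C over 300 s (1 s steps).
air = [25.0 + (300.0 - 25.0) * t / 300.0 for t in range(301)]

board = simulate_lag(air, tau_s=20.0)        # light PCB (assumed time constant)
thermometer = simulate_lag(air, tau_s=60.0)  # heavy metal thermometer (assumed)

# Find the first moment the thermometer reads 210 C and see where the board is.
idx = next(i for i, t in enumerate(thermometer) if t >= 210.0)
print(f"t={idx} s: thermometer {thermometer[idx]:.0f} C, board {board[idx]:.0f} C")
```

With these assumed numbers the board is already tens of degrees hotter than the thermometer reading; the exact gap depends entirely on the real thermal masses and ramp rate.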

The best way to tell when reflow is complete on an un-calibrated oven like mine is to watch the solder melt. My paste changes from a glossy gray (full of volatile flux compounds) to matte gray (once most of the flux has boiled off) to shiny silver (after the solder melts); BGA balls turn from a dull metallic color to shiny silver at melting; the chip also sinks slightly as the balls flatten from the weight of the IC. This YouTube video (not from my lab) shows what a properly reflowing BGA looks like.

The test vehicle in the oven. Note scrap-grade 4-inch silicon wafer being used as "cookie sheet".
Since this was a test board with no actual circuitry on it, the next step was to prepare to cross-section it and look at how well the joints turned out.

Although the flux I used is no-clean (and I normally leave it in place on most of my boards) cross sections look nicer if there isn't *too* much flux in the way. Since I don't have an ultrasonic cleaner yet (I do plan to buy one in the near future) I just let the board soak in a beaker of 70% isopropyl alcohol for a few minutes, shook it around a bit, and wiped it dry.

PCB sitting in beaker of IPA in my fume hood. Although IPA isn't particularly dangerous as solvents go, I have a general policy of keeping all open solvent containers in the hood whenever possible.
Once the board was dry I cut it in half between the black lines with a Dremel and a cut-off wheel. My ShopVac-based dust control system works reasonably well, but I want to get a HEPA vac for this in the future.

After the rough cut I polished with 1200 grit sandpaper and wiped away the dust with a wet cloth. Upon looking under the microscope I saw that the failure I was hoping to demonstrate had indeed occurred - one of the balls had been sucked down into an uncapped via by capillary action, resulting in a complete lack of electrical contact. The ball at far right had been partially sucked into the via but the solder mask dam was big enough to keep it from going in all the way.

Cross section of PCB and BGA. Note solder-filled via in center and missing ball. The far-right via annular ring seems to have snagged on something during the cutting process and been ripped up off the board.

Looking to one side of the board it was clear that the balls without vias under them had reflowed properly and were reasonably well aligned.

The black material between the balls is not underfill; it's a paste-like material made of residual flux, FR4/molding compound dust, and little slivers of copper that were ground off by the sanding process. It looks like my defluxing process didn't work as well as I had hoped; I'm going to need ultrasound to do the job properly.

One very interesting and unexpected result was visible in this cross section - the next row of vias were visible through the FR4 laminate.

Three well-reflowed balls. Note vias in next row visible through laminate.
Before closing up the lab for the night I decided to take one last picture to show what an ENIG-finished via looks like. Since this was a higher magnification image I followed the sandpaper polish with 3μm diamond paste to get a better finish.

The layers visible in this image from bottom to top are FR4 (grayish), 1oz/35μm copper foil (copper) and what looks like about 10μm of nickel (yellow-gray). The gold plating is too thin to see at this magnification.

Cross section of ENIG-finished via.

Friday, June 29, 2012

ISim bugs and introducing the RED TIN internal logic analyzer

Earlier this month I was working on a Spartan-6 based design using the on-chip DDR SDRAM controller (MCB) and ran into some problems.

For those of you not familiar with the MCB, it's a pipelined DDR/DDR2/DDR3 controller that exposes several distinct ports (up to 2 full-duplex and 4 half-duplex, or 4 full-duplex) to your RTL. Each port can be independently clocked, so the order of operations between ports may be unpredictable (a round-robin arbitration scheme is used under the hood after synchronization), but operations on a single port execute in FIFO order. My design only uses port 0 at the moment; ports 1/2/3 will probably be used for DMA at some point in the future.

Each port consists of three FIFOs, which are capable of independent clocking but in practice typically share the same clock:
  • Command FIFO, which specifies the address, opcode (read, write, read with auto precharge, write with auto precharge), and burst length (number of words to transfer)
  • Write data FIFO, which stores data to be written plus a byte write-enable mask
  • Read data FIFO, which stores data read from RAM
To issue a write, push one or more data words and masks onto the write data FIFO and then push a write command onto the command FIFO. To read, push a read command and then read from the read data FIFO as it becomes available.

The read and write FIFOs are each 128 words deep, enough to accommodate most needs. Unfortunately, the command FIFO is much shallower - 4 commands. 
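As a rough illustration of the port interface described above, here is a toy behavioral model in Python. It is only a sketch under stated assumptions: the class and method names are hypothetical, the FIFO depths are the ones quoted in this post, the byte mask is accepted but ignored, and none of the real MCB timing or calibration behavior is modeled.

```python
from collections import deque

class PortModel:
    """Toy model of one port: command FIFO, write data FIFO, read data FIFO."""

    def __init__(self, cmd_depth=4, data_depth=128):
        self.cmd_depth = cmd_depth
        self.data_depth = data_depth
        self.cmd_fifo = deque()
        self.wr_fifo = deque()
        self.rd_fifo = deque()
        self.ram = {}  # word address -> data word

    def push_write_data(self, data, mask=0):
        if len(self.wr_fifo) >= self.data_depth:
            raise OverflowError("write data FIFO full")
        self.wr_fifo.append((data, mask))

    def push_command(self, op, addr, burst_len):
        if len(self.cmd_fifo) >= self.cmd_depth:
            raise OverflowError("command FIFO full")
        self.cmd_fifo.append((op, addr, burst_len))

    def step(self):
        """Execute the oldest queued command, if any (FIFO order within a port)."""
        if not self.cmd_fifo:
            return
        op, addr, burst_len = self.cmd_fifo.popleft()
        for i in range(burst_len):
            if op == "write":
                data, _mask = self.wr_fifo.popleft()  # mask ignored in this sketch
                self.ram[addr + i] = data
            else:
                self.rd_fifo.append(self.ram.get(addr + i))

# Write one word, then read it back: data first, then the write command.
port = PortModel()
port.push_write_data(0xFEEDFACE)
port.push_command("write", addr=0x10, burst_len=1)
port.push_command("read", addr=0x10, burst_len=1)
port.step()
port.step()
print(hex(port.rd_fifo[0]))  # 0xfeedface
```

Pushing a fifth command before calling step() raises OverflowError in this model, which is the software analogue of the shallow command FIFO filling up.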

In deeply pipelined designs that do a lot of random-access operations, such as my softcore CPU, this isn't nearly enough! It's only exacerbated by the poor performance of the quick-and-dirty direct-mapped write-through L1 cache I'm using (I plan to replace it with a 2-way set-associative write-back cache at some point).

The obvious fix was to put a bigger FIFO in front of the MCB. This didn't work for some reason and I found myself reading garbage from memory. It wasn't clear where the bug was from reading the code, so I tried to simulate my design in ISim and take a closer look.

Being a hobbyist operating on a grad student's stipend doesn't leave me a lot of cash to spend on software so I try to stick with open source, or at least freeware, tools to the extent possible. The >$3K-per-seat ISE Design Suite Logic Edition is obviously not affordable so I use the freeware ISE WebPack for all of my FPGA designs.

WebPack includes the "Lite Edition" of ISim instead of the full version. According to the ISim FAQ page:
There is only one limitation. When the user design + testbench exceeds 50,000 lines of HDL code, the simulator will start to derate the performance of the simulator for that invocation... The line count includes both testbench and source code lines. The Xilinx libraries are not included in the line count.

I figured I was fine... my line-count script showed 10,129 lines of Verilog in my design directory (not all of which was being used in the simulation) plus another ~15k in the generated MCB code - well below 50k. As soon as I fired up the simulator, though, it slowed to a crawl and displayed a nag screen in the console:
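For reference, a line-count check of this kind can be sketched in a few lines of Python. This is not my actual script: the function name and the file extensions are assumptions, it counts raw lines (blank lines and comments included), and how ISim itself counts lines is not documented in the FAQ text quoted above.

```python
import os

ISIM_LITE_LIMIT = 50_000  # line limit quoted in the ISim FAQ

def count_hdl_lines(root, extensions=(".v", ".vh")):
    """Count raw lines in all files under root whose names end in extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as f:
                    total += sum(1 for _ in f)
    return total

def under_limit(design_lines, generated_lines=0):
    """True if the combined line count is below the Lite Edition limit."""
    return design_lines + generated_lines < ISIM_LITE_LIMIT

# The numbers from this post: 10,129 design lines plus roughly 15k of generated MCB code.
print(under_limit(10_129, 15_000))  # True
```

Calling count_hdl_lines on a design directory gives the first number; the generated code can be counted the same way and passed as the second argument.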
This is a limited version of the ISE Simulator. The current design has exceeded the design size limit for this version and the performance of the simulation will be derated. Please contact your nearest sales office at www.xilinx.com/company/contact.htm or visit the Xilinx on-line store at www.xilinx.com/onlinestore/design_resources.htm if interested in purchasing the full, unlimited version of this simulator.
A binary search of my RTL (commenting out half of the file at a time) revealed that the MCB primitive was causing the line limit to be hit. Despite the FAQ claiming that the Xilinx libraries are not included in the count, the MCB's behavioral model clearly is included.

Since the derated simulation was unusably slow (the millisecond or two required for calibration to complete would have taken several days of wall-clock time, before even starting to simulate my test suite) this was clearly not a viable option.

I started to look around for alternatives. The DDR chip was being clocked at 80 MHz DDR'd to 160 MT/s, too fast for my 100 MHz oscilloscope/LA to be of any help. Since the RAM was dynamic I couldn't slow it down. I only had 20 GPIO channels on the board, not nearly enough to bring out all of the signals of interest for probing (not that this was really a problem as the LA only had 16 channels).

The best option at this point looked to be Xilinx's ChipScope internal logic analyzer, which supports programmable triggers and can store a large number (their documentation isn't clear on exactly how many) of channels of data to on-chip memory, then stream to a PC over JTAG or Ethernet (or possibly other IO standards too). Unfortunately ChipScope is not included in the WebPack license. My options were to spend $695 on a node-locked license, $850 on a floating license, or $3000+ on a full Logic Edition ISE license.

Not having that kind of cash, I realized I was out of easy options - I was going to have to write my own tool. I sat down and started coding on an internal logic analyzer of my own, dubbed RED TIN (two words picked randomly from a dictionary - I'm terrible at coming up with names!).

A week and 2,063 lines of code later I had an alpha version ready for testing at http://code.google.com/p/red-tin-logic-analyzer/. It consists of four parts:
  • RedTinLogicAnalyzer, a Verilog core which handles capture and buffering (currently hard-coded to 128 channels and 512 samples, will be parameterizable later on).
  • RedTinUARTWrapper, a wrapper around RedTinLogicAnalyzer that implements the board-to-PC protocol (the intention is to eventually support JTAG and other interfaces as well).
  • The redtin application, a C++/gtkmm GUI application which functions as the control panel of the logic analyzer
  • A third-party waveform viewer that can read standard vcd-format files. gtkwave is hard-coded in the alpha build but I'll provide an interface for using other viewers later on.
The capture code is portable Verilog and should work on any FPGA, though I've only tested on Spartan-6. The PC-side application is cross-platform except for the UART code, which is Linux-specific (I haven't yet had time to write a portable wrapper around that).
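To show what handing captured data to a VCD viewer involves, here is a minimal sketch that turns a list of samples into VCD text. It is not the redtin application's code: the function name, the scope name "capture", and the sample values are hypothetical, and the single-character identifiers limit it to 94 signals.

```python
def samples_to_vcd(signals, samples, timescale="1ns"):
    """signals: list of (name, width); samples: list of tuples of ints, one per sample."""
    ids = [chr(33 + i) for i in range(len(signals))]  # printable ASCII identifiers
    lines = [f"$timescale {timescale} $end", "$scope module capture $end"]
    for ident, (name, width) in zip(ids, signals):
        lines.append(f"$var wire {width} {ident} {name} $end")
    lines += ["$upscope $end", "$enddefinitions $end"]

    previous = [None] * len(signals)
    for t, sample in enumerate(samples):
        changes = []
        for i, value in enumerate(sample):
            if value != previous[i]:  # VCD only records value changes
                if signals[i][1] == 1:
                    changes.append(f"{value}{ids[i]}")
                else:
                    changes.append(f"b{value:b} {ids[i]}")
                previous[i] = value
        if changes:
            lines.append(f"#{t}")
            lines.extend(changes)
    return "\n".join(lines) + "\n"

# Two signals, three samples (values are made up for illustration).
vcd = samples_to_vcd([("wr", 1), ("p0_cmd_instr", 3)], [(0, 0), (1, 0), (1, 1)])
print(vcd)
```

Writing the returned string to a .vcd file gives something a standard waveform viewer such as gtkwave can open.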

Using the core is pretty simple - just insert it into the design, feed up to 128 inputs into the "din" input, supply a clock, and connect the UART pins up to top-level pins on the FPGA. Note the 50 bits of zeros at the end to pad the total width up to 128 bits.
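The padding arithmetic can be sketched as follows. This is a Python illustration rather than the Verilog instantiation itself, and the field layout is hypothetical: any set of probe signals totalling 78 bits leaves 50 bits of zero padding in a 128-bit word.

```python
def pack_din(fields, total_width=128):
    """Pack (value, width) fields MSB-first, like a Verilog concatenation,
    then zero-pad the low end up to total_width. Returns (word, pad_bits)."""
    word = 0
    used = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in its field")
        word = (word << width) | value
        used += width
    pad = total_width - used
    if pad < 0:
        raise ValueError("too many signal bits for the capture width")
    return word << pad, pad

# Hypothetical probe layout: 1 + 3 + 32 + 30 + 12 = 78 signal bits.
fields = [(1, 1), (0b001, 3), (0xFEEDFACE, 32), (0x100, 30), (0x5, 12)]
word, pad = pack_din(fields)
print(pad)  # 50
```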



Without further ado, here's a screenshot of it in action!


It's a little hard to see in the small view (I have a quad-monitor setup on my desk, this is the left half) but the left side shows the control panel application (with iMPACT in the background) and the right side is data captured from one of my boards.

After a bit of poking around I found my bug!

While I was pushing write commands onto the extended FIFO, I had somehow forgotten to do so with reads. This resulted in a race condition where reads could either be stepped on by writes or, even worse, occur before the write had committed!

The above screenshot shows a write (p0_cmd_instr = 3'b000) executing after a read (p0_cmd_instr = 3'b001) even though the write was issued first (compare rising edge of wr to rising edge of read_active). As a result the read data was complete garbage (32'hDFBDBFEF instead of 32'hFEEDFACE) because nothing had been written to that address yet.
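The ordering bug is easy to reproduce in a toy model. The sketch below is an illustration only, not the actual RTL: the function and variable names and the address are hypothetical, and the garbage value simply reuses the one seen in the capture to stand in for uninitialized DRAM contents.

```python
from collections import deque

GARBAGE = 0xDFBDBFEF  # stand-in for whatever uninitialized DRAM happens to hold

def run(reads_use_fifo):
    """Issue a write then a read to the same address; return the read results."""
    ram = {}
    ext_fifo = deque()  # the extra FIFO placed in front of the controller
    results = []

    def execute(cmd):
        if cmd[0] == "write":
            ram[cmd[1]] = cmd[2]
        else:
            results.append(ram.get(cmd[1], GARBAGE))

    def issue(cmd):
        # Buggy variant: only writes are queued, so reads jump ahead of them.
        if cmd[0] == "write" or reads_use_fifo:
            ext_fifo.append(cmd)
        else:
            execute(cmd)

    issue(("write", 0x100, 0xFEEDFACE))
    issue(("read", 0x100))
    while ext_fifo:  # drain the queued commands in order
        execute(ext_fifo.popleft())
    return results

print([hex(v) for v in run(reads_use_fifo=False)])  # ['0xdfbdbfef']
print([hex(v) for v in run(reads_use_fifo=True)])   # ['0xfeedface']
```

Routing reads through the same queue as writes restores program order, which is the behavior the single-port FIFO guarantee was supposed to provide in the first place.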