Hello, and welcome to my website! I currently research ways to safely integrate commodity multicore processors and GPUs into real-time systems, with a focus on autonomous vehicles. I'm presently interested in industry opportunities for summer 2020. I have substantial experience in real-time and embedded systems (C, C++, CUDA, MIPS, ARM, and x86 Assembly, Verilog, etc.) and full stack web development (HTML, JS, CSS, Angular, Java, Spring, C#, SQL Server, MySQL, etc.). Recently I've helped develop CNN pipelining approaches to double CV throughput, helped develop methods to certify SMT-using platforms per an FAA need, and developed changes to Linux's memory allocation systems to reduce 99th percentile job runtimes by over 20%. Outside of research, I also serve on UNC's Renewable Energy Special Projects Committee, am involved with FOCUS, and regularly compete in local hackathons. Check out my resume if you want more detail and feel free to reach out. My email is the single word 'hacker' at the domain 'unc.edu'.
Chosen as the best teaching assistant by faculty and students in the Computer Science Department at UNC. Setting a new record, nearly 30% of my 150-person class nominated me for this award. On an anonymous feedback form, one student says "You are a wonderful person who cares about his students! Thank you so much for adding some sunlight into the spring of my senior year!" And during the awards ceremony, the instructor opened her remarks by saying that "Josh saved the course for the students." Another student comments, "He was super understanding, helpful, knowledgable, and very kind. He always went out of his way to help students and his calm and helpful responses are always reliable and good. He shows a lot of great enthusiasm for the material and is a great person."
Built a platform to share vehicles using the security of the Ethereum blockchain as the trust basis instead of a physical entity. Also uses the APIs from Smartcar to allow unattended unlocking and ignition for shared vehicles to eliminate most overhead. Prototype implementation won awards from Coinbase, Smartcar, Contrary Capital, and HackDuke at HackDuke 2017.
Built a website to display energy usage data for every building at the University of North Carolina at Chapel Hill. Features include:
Prototype implementation won an award from Microsoft at HackNC 2014GitHub Repo Homepage (Note: changes to HTML shadow DOM have broken current deployment)
Managed a small team to build an online realtime multiplayer matchmaking lobby and ranking system. Features include:
Built a Node.js app using Websockets for realtime multi-user control of the Sphero robot
Built an application stack to gather long periods of continuous accelerometer readings
Abstract: "Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading(SMT), which has been shown to improve performance in many areas of computing, but has seen little applicationto real-time systems. The SMART task model, which allows for combining SMT and real time by accounting forthe variable task execution costs caused by SMT, is introduced, along with methods and conditions for schedulingSMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated througha large-scale schedulability study in which we show that task systems with utilizations up to 30% larger thanwhat would be schedulable without SMT can be correctly scheduled."
Abstract: "Vision-based perception systems are crucial for profitable autonomous-driving vehicle products. High accuracy in such perception systems is being enabled by rapidly evolving convolution neural networks (CNNs). To achieve a better understanding of its surrounding environment, a vehicle must be provided with full coverage via multiple cameras. However, when processing multiple video streams, existing CNN frameworks often fail to provide enough inference performance, particularly on embedded hardware constrained by size, weight, and power limits. This paper presents the results of an industrial case study that was conducted to re-think the design of CNN software to better utilize available hardware resources. In this study, techniques such as parallelism, pipelining, and the merging of per-camera images into a single composite image were considered in the context of a Drive PX2 embedded hardware platform. The study identifies a combination of techniques that can be applied to increase throughput (number of simultaneous camera streams) without significantly increasing per-frame latency (camera to CNN output) or reducing per-stream accuracy."
Abstract: "NVIDIA's CUDA API has enabled GPUs to be used as computing accelerators across a wide range of applications. This has resulted in performance gains in many application domains, but the underlying GPU hardware and software are subject to many non-obvious pitfalls. This is particularly problematic for safety-critical systems, where worst-case behaviors must be taken into account. While such behaviors were not a key concern for earlier CUDA users, the usage of GPUs in autonomous vehicles has taken CUDA programs out of the sole domain of computer-vision and machine- learning experts and into safety-critical processing pipelines. Certification is necessary in this new domain, which is problematic because GPU software may have been developed without any regard for worst-case behaviors. Pitfalls when using CUDA in real-time autonomous systems can result from the lack of specifics in official documentation, and developers of GPU software not being aware of the implications of their design choices with regards to real-time requirements. This paper focuses on the particular challenges facing the real-time community when utilizing CUDA-enabled GPUs for autonomous applications, and best practices for applying real-time safety-critical principles."
Abstract: "Embedded systems augmented with graphics pro- cessing units (GPUs) are seeing increased use in safety-critical real-time systems such as autonomous vehicles. The current black-box and proprietary nature of these GPUs has made it difficult to determine their behavior in worst-case scenarios, threatening the safety of autonomous systems. In this work, we introduce a new automated validation framework to analyze GPU execution traces and determine if behavioral assumptions inferred from black-box experiments consistently match behav- ior of real-world devices. We find that the behaviors observed in prior work are consistent on a small scale, but the rules do not stretch to significantly older GPUs and struggle with complex GPU workloads."