Personal Website

Let's exchange ideas here...

Unlimited possibilities are not suited to man; if they existed, his life would only dissolve in the boundless.

I Ching

TECH STACK

React

GraphQL

Docker

Redshift

Figma

If you enjoy intellectual discourse, let's have a friendly conversation. Please feel free to schedule a time that works best for you through my Calendly.

ABOUT

Hey, thanks for visiting my page. Most of my work experience has been in product engineering. In my opinion, the most rewarding aspect of my work is the journey of gathering evidence, occasionally getting lost, piecing together the clues, and learning something new along the way. When I'm not working, you will likely find me conducting experiments in my kitchen, exploring the science of longevity. Let's connect and discuss how I can help bring your vision to life.

I implemented cross-platform SDUI components in BizApp.

2021 - Present

Local Biz

Mobile, SDUI, backend

I implemented message queue-based job management and endpoint logging for an EDR web application.

2019 - 2021

Security

DevOps, security

I implemented a custom shapes selector and location search for a geospatial marketing platform.

2017 - 2018

Atom

Fullstack, data

Notes and reflections from "How AI decodes Human Health"

January 11, 2025

10 min read time

Below are my notes from three presentations at "How AI decodes Human Health", held at the University of Toronto.

1. Research vs Deployment

The research demonstrated the limitations of applying a cardiac arrest classifier, based on 8 biosignals, in a clinical setting. RLHF was infeasible in the ER environment due to temporal constraints on clinical staff. Additionally, the model's accuracy degraded for two main reasons:

  • Feedback amplification: biases were amplified as clinicians discarded their own judgement when results diverged.

  • Poor data quality: The model was trained on a dataset from 1999-present, introducing potential demographic skew.

2. General applications in Healthcare

  • EDRN optimization: Optimizing schedules of emergency department registered nurses.

  • Cancer patient prioritization: Identify and prioritize the most urgent cancer cases.

  • AI Medical scribes: Clinicians often can't type fast enough to capture all patient details, and missing key facts can lead to misdiagnosis.

  • Guidance for MIS (minimally invasive surgical) procedures: In laparoscopic cholecystectomy procedures, real-time AI visualization is used to delineate surgical margins, with green regions indicating viable excision zones and red indicating critical structures. Clinical outcomes demonstrate enhanced procedural success rates.

The speaker also mentioned that physicians and AI make distinctly different types of errors. His theory suggests that when diagnoses differ, a hybrid approach yields optimal outcomes: if either the clinician or the AI concludes True while the other concludes False, the final diagnosis should be True. He reported that this approach led to an overall 26% reduction in ER mortality rates.
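In code, the combination rule as I understood it amounts to a simple logical OR (a sketch of my own paraphrase, not the speaker's actual implementation):

def combined_diagnosis(clinician_positive: bool, ai_positive: bool) -> bool:
    # Flag the case as positive if either party flags it, since clinicians
    # and models tend to make different kinds of errors.
    return clinician_positive or ai_positive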

3. Implementation of AI in Pathology

The speaker focused on the pathology of prostate cancer. He defined a false negative as wrongly diagnosing a malignant sample as benign, and a false positive as misdiagnosing a benign sample as malignant. He believes false negatives are far more dangerous than false positives.

He described complacency bias, where inexperienced clinicians might discard their own judgment in favor of the AI's, even when the AI is wrong, swayed by its perceived superiority. He believes we're in the Gartner hype cycle's "Trough of Disillusionment" and, like the previous speaker, preferred the term "augmented intelligence" over "artificial intelligence."

Follow-up

During one of the talks, an audience member asked "Will doctors eventually be replaceable?" to which the speaker responded: "AI won't replace doctors, but doctors who use AI will replace those who don't."

So how exactly do AI's diagnostic errors differ from those of human clinicians?

I found answers in a paper recommended by one of the speakers (Raciti P, Sue J, Retamero JA, et al., 2023). The study shows that pathologists' performance can be affected by fatigue, confirmation bias, and visual acuity.

Types of Errors Observed Without AI (Pathologist Alone):

  • Missing small tumors: failing to detect small cancerous foci and well-differentiated tumors.

  • Misinterpreting benign mimics: misclassifying benign tissue (e.g., seminal vesicles, HGPIN, benign prostatic glands around a nerve) as cancerous.

Types of Errors Observed with AI (Pathologist + PaPr):

  • PaPr-driven inaccuracies: while PaPr helped correct all previously incorrect diagnoses, 85% of initially correct diagnoses became incorrect.

  • Incorrect area of interest: In one case, while PaPr correctly classified a slide as cancerous, it misidentified the area of interest.

  • False positives: Although specificity improved, a few WSIs that were truly benign were still incorrectly flagged by both the AI and pathologist as being cancerous.

The key takeaway is that PaPr AI enhances pathologists' detection capabilities by highlighting easily missed tissue features. Sensitivity improved with a 70% reduction in detection errors, helping pathologists identify tumors they might have otherwise overlooked.

Reflections

  1. It's evident that there is a lack of shareable data across clinics. As we know, models require large amounts of high-quality data to perform well. While this limitation stems from regulatory concerns, even with anonymized datasets available, data processing remains challenging.

    Can we train agents to use existing medical knowledge to simulate training data for specific diagnoses?

    Can at home IoT devices securely collect large amounts of personalized data?

  2. The integration of domain knowledge with statistical modeling through LLMs presents an interesting challenge. Experienced pathologists and PaPr interpret slides differently because of their distinct learning approaches: PaPr learns from examples, while pathologists rely on systematic knowledge to deduce an outcome. Therefore:

    Can AI be trained to develop diagnostic CoT reasoning using medical school curriculum?

    If clinicians are willing to share their observations and reasoning, can we add multimodal human feedback to the training data?

    Would probabilistic diagnostic outcomes enhance clinical decision-making accuracy compared to purely binary classification methods?

  3. AI assistants could help patients communicate their symptoms more precisely using medical terminology.

  4. Optimization of resources in the ER.

In conclusion, I'm left with a lingering thought from House:

"It is in the nature of medicine that you're going to screw up!" — Dr. House

References

Raciti P, Sue J, Retamero JA, et al. Clinical Validation of Artificial Intelligence-Augmented Pathology Diagnosis Demonstrates Significant Gains in Diagnostic Accuracy in Prostate Cancer Detection. Arch Pathol Lab Med. 2023;147(10):1178-1185. doi:10.5858/arpa.2022-0066-OA

The prospect of loss

December 19, 2024

5 min read time

I encountered the phrase "What's possible isn't always right" (or something like it) while listening to David and Goliath, and it led to some pondering. Although it sounds intuitive, too often we spend countless hours down a rabbit hole pursuing a possible solution, when the right solution takes a simpler but different path.

Take, for example, scraping Wikipedia pages with BeautifulSoup. While handling numerous tag edge cases is possible, and fixing each one is oddly satisfying, is it really the right approach? Had I known about SPARQL earlier, I wouldn't have wasted time trying to scrape Wikipedia.

So why do we persist with familiar but suboptimal solutions? Most literature points to cognitive biases like loss aversion: we, as irrational beings, feel losses more strongly than gains of the same magnitude. I was comfortable with Python and saw a clear path forward, while SPARQL presented uncertainties, such as whether it could handle my use case. So it felt like I was simply following my human instinct to avoid uncertainty. But if we reconsider the situation from the loss perspective, it turns out to be quite the contrary. Here are the two options:

  1. A 100% chance of spending t hours learning SPARQL

  2. A p% chance of spending d extra hours parsing HTML tags traditionally instead of using SPARQL

where d > t and p < 100%

I actually chose option 2 because I was, strangely, risk seeking when facing losses. I preferred the gamble of possibly spending d extra hours because p < 100% left open the possibility of losing nothing at all. This shows how appearing risk averse can lead to taking larger risks and a lower expected utility.
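To make this concrete, here is a minimal sketch with made-up numbers (t, d, and p here are illustrative, not from any real measurement):

# Hypothetical values: a sure 4 hours to learn SPARQL vs. a 70% chance
# of 10 extra hours fighting HTML edge cases.
t = 4            # option 1: certain loss (hours)
d, p = 10, 0.7   # option 2: possible loss and its probability

expected_option_1 = t        # 4 hours, guaranteed
expected_option_2 = p * d    # 7 hours on average

# The expected loss of option 2 is larger, yet the 30% chance of losing
# nothing makes the gamble feel more attractive: risk seeking in losses.
print(f"sure loss: {expected_option_1}h, gamble: {expected_option_2}h on average")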

Two issues from building an interactive visualization via AI code generation

December 16, 2024

5 min read time

Lately, one of the most repetitive tasks I encounter at work is creating and maintaining project Gantt charts in spreadsheets. If only we could just upload a CSV containing JIRA tickets and have it automatically create a Gantt chart for us...

A secondary objective of this expedition was to evaluate AI's code generation capabilities. I used Perplexity Pro for this exercise. It was great at generating boilerplate code, but it often suggested simpler, brute-force workarounds. For example, to fix axis misalignment, it would translate the x-axis by a hardcoded number of pixels. For subtler bugs, the proposed solutions often created a vicious cycle. The experience reminded me a bit of the Mr. Bean painting restoration scene (start at 3:45):

Digressions aside, here are two memorable issues I encountered:

Issue 1: Axis labels don't automatically adjust when zooming in or out, which causes them to either overlap or leave empty spaces on the canvas.

AI's solution: add a style transformation to scale the width of the component, like so:

style={{ width: `${100 * zoom}%` }}

But the problem is that the axis labels shrink as the user zooms out, leaving empty spaces around the chart area. I first considered using overflowX: 'auto' to automatically show extra labels. However, I realized it's impractical to have an infinitely sized axis that would populate with data no matter how far you zoom out. A more efficient solution is to dynamically compute the axis labels based on the zoom level.

If we can establish a base date range at a base zoom level, we can then adjust the date range based on the zoom-adjusted distance to the center date:

// dates are numeric timestamps; centerDate is the midpoint of the full range
const centerDate = start + (end - start) / 2;
// baseRadius is the visible range width at zoom = 1, so zooming in shrinks it
const adjRadius = baseRadius / zoom;
setDateRange({
  start: centerDate - adjRadius / 2,
  end: centerDate + adjRadius / 2
});

But how can we compute the axis label count?

I sought inspiration from AI; I must admit it is an amazing research companion. A commonly proposed idea was establishing a tick count and a step size (call it mps) to build an array. It couldn't tell me exactly what step size to use, but a bit of intuition filled the gap:

start,
start + mps,
start + 2 * mps,
...
end

start + mps * (tickCount - 1) = end
(then solve for mps; never knew algebra could be applied to real life)
mps = (end - start) / (tickCount - 1)

Issue 2: Taskbar start and end dates don't align with the axis labels, and this misalignment becomes more pronounced with zooming.

This one was trickier because I could not figure out how to get useful help from AI. Here are its suggestions:

The misalignment of the chart axis with the chart body when using absolute positioning with percentage width and left position can occur due to several reasons:

1. Rounding errors: Browsers may round percentage values differently for the axis and chart body, causing slight misalignments[6].

2. Parent container issues: If the parent container's dimensions are not explicitly set or are affected by other layout factors, it can lead to inconsistent positioning of child elements[6].

3. Inconsistent box-sizing: Different box-sizing values for the axis and chart body can cause misalignment due to how padding and borders are calculated[6].

4. Scrollable plot area: When using features like scrollablePlotView, the y-axis may render incorrectly, especially on smaller screens[1].

5. Default axis behavior: Some charting libraries have default behaviors that can cause misalignment, such as adding offsets to make labels more readable[2].

To address this issue:

1. Use relative positioning instead of absolute positioning for better responsiveness and alignment[6].

2. Implement a grid or flexbox layout system for more predictable positioning[6].

3. Ensure consistent box-sizing across all chart elements[6].

4. Check for any library-specific settings that might affect axis alignment, such as `scrollPositionX` for right-aligned y-axes[1].

5. Consider using built-in chart layout options provided by the library instead of custom positioning[5]

Back to old-fashioned debugging. I noticed that the misalignment grew larger as I zoomed in or out, which suggested a small initial error being scaled by the zoom factor. Inspecting elements at zoom = 1 (the base case) revealed an extra gap after the end date, even though the start and end dates themselves were correct. This is why the taskbar computations were off: the previous axis label count calculation treated end as the last tick-occupying element, when end should actually be invisible. The corrected formula is:

mps = (end - start) / tickCount
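Putting it together, here is a minimal sketch of the corrected tick computation (written in Python for brevity, with made-up values):

def compute_ticks(start, end, tick_count):
    # Corrected step size: divide by tick_count, not tick_count - 1,
    # because `end` is an invisible boundary rather than a drawn tick.
    mps = (end - start) / tick_count
    return [start + i * mps for i in range(tick_count)]

# A 100-unit range with 5 ticks yields [0, 20, 40, 60, 80];
# 100 itself is never drawn.
print(compute_ticks(0, 100, 5))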

One final observation is AI's bias toward generating lengthy code, which can be hard to read at times. Nevertheless, it excels at handling tedious tasks while providing detailed documentation. Perhaps this will let us focus on the more specific problems.

defaultdict in Python

January 7, 2020

10 min read time

Last week, while writing Python code, I became deeply perplexed when my code repeatedly failed to produce the desired results. After tracing through it multiple times, I discovered an interesting behaviour of defaultdict.

defaultdict is a Python collections type that offers default values for dictionaries. These are supplied through the default_factory, which generates the default value. From the docs:

dict subclass that calls a factory function to supply missing values

class collections.defaultdict(default_factory=None, /[, ...])

Without it, we can accomplish the same thing using setdefault(); however, the docs mention defaultdict as the faster approach:

d.setdefault(k, 0)

Here's how I set up my defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(int)

Now if I access a random key k that does not exist in d using get(), it returns None (the REPL echoes nothing):

>>> d.get(3)

But what if I try accessing it through the [] operator?

>>> d[3]
0

This is where the magic of defaultdict begins, because now when I call get(), it returns the default value. The key now exists in the dictionary:

>>> d.get(3)
0

That's because if I inspect d, 3 and its default value have been inserted into the dict:

>>> d
defaultdict(<class 'int'>, {3: 0})

So what happened? Looking deeper into the __getitem__ call for regular dictionaries:

d[key]

Return the item of d with key key. Raises a KeyError if key is not in the map.

If a subclass of dict defines a method __missing__() and key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable:

And looking back at the defaultdict documentation, __missing__() uses the default_factory provided through defaultdict's constructor:

If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned.
This method is called by the __getitem__() method of the dict class when the requested key is not found; whatever it returns or raises is then returned or raised by __getitem__().

But it also mentions:

Note that __missing__() is not called for any operations besides __getitem__(). This means that get() will, like normal dictionaries, return None as a default rather than using default_factory.

And that explains it: defaultdict.get() will not invoke __missing__() (which invokes default_factory), whereas defaultdict.__getitem__() (also known as d[k]) will invoke __missing__() to fill in the key. This explains the initial discrepancy between d.get(k) and d[k]. However, it does not explain why, after invoking d[k], d is mutated with k and the result of __missing__(k). So I ventured into Python's source code and found this in __missing__:

https://github.com/python/cpython/blob/3.7/Modules/_collectionsmodule.c#L2004
if (PyObject_SetItem((PyObject *)dd, key, value) < 0) {

So what happens is that this method calls __setitem__ with the default value (supplied by the default_factory):

https://github.com/python/cpython/blob/3.7/Modules/_collectionsmodule.c#L2001
value = PyEval_CallObject(factory, NULL);

As we know from the docs, __setitem__ is called when an assignment occurs, which actually mutates the dictionary. That is why, when we access d[k], the underlying d is mutated with the default value for k.
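To tie it all together, here is a minimal sketch of how a plain dict subclass can reproduce this behaviour via __missing__ (a simplified stand-in for CPython's C implementation, not the real thing):

class MyDefaultDict(dict):
    def __init__(self, default_factory=None):
        super().__init__()
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value  # the mutation we observed above
        return value

d = MyDefaultDict(int)
print(d.get(3))  # None: get() never calls __missing__()
print(d[3])      # 0: __getitem__() calls __missing__(), inserting the key
print(d)         # {3: 0}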

Class attributes in Python

January 26, 2019

10 min read time

Lately I have encountered some mysterious bugs while interacting with class attributes, and I thought it was worthwhile to look into them further. Let's start with some background on classes:

A class contains data field descriptions (or properties, fields, data members, or attributes). These are usually field types and names that will be associated with state variables at program run time; these state variables either belong to the class or specific instances of the class.

Class (computer programming), Wikipedia

So a class attribute is essentially an attribute belonging to a class. This means any change made to the class attribute should be reflected in instances of the class. For example, operating_system is the class attribute in the following class:

class Minion:
    operating_system = 'RHEL'

>>> # create two instances of Minion
>>> m1 = Minion()
>>> m2 = Minion()
>>>
>>> # reassign the class attribute
>>> Minion.operating_system = 'Centos'
>>>
>>> m1.operating_system
'Centos'
>>> m2.operating_system
'Centos'

Here the change to the class attribute is propagated to its instances. But in Python, we can also reassign the class attribute from a specific instance:

>>> m1.operating_system = 'Centos'
>>>
>>> m1.operating_system
'Centos'
>>> m2.operating_system
'RHEL'

Notice how m2's value has not changed. This is because when a class attribute is assigned through an instance, Python adds it to that instance's namespace, which "overrides" the value from the class namespace. m2, however, still reads the attribute from the class namespace, because no such value exists in its instance namespace.
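We can verify this by inspecting the namespaces directly (a quick sketch, assuming the fresh Minion state from the example above):

>>> Minion.__dict__['operating_system']  # class namespace
'RHEL'
>>> m1.__dict__                          # instance namespace holds the override
{'operating_system': 'Centos'}
>>> m2.__dict__                          # empty, so lookup falls back to the class
{}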

This gets more interesting with mutable objects. Suppose we wish to keep track of the priority of images on the minion:

class Minion:
    images = {}
    
    def add_os_image(self, os, priority):
        self.images[os] = priority

>>> # create two instances of Minion
>>> m1 = Minion()
>>> m2 = Minion()
>>>
>>> m1.add_os_image('Centos', 3)
>>>
>>> m1.images
{'Centos': 3}
>>> m2.images
{'Centos': 3}
>>> m2.add_os_image('RHEL', 1)
>>>
>>> m3 = Minion()
>>> m3.images
{'Centos': 3, 'RHEL': 1}

What's happened is that a class attribute change made from an instance is propagated to ALL instances of the class!

This is because self.images[os] = priority does not assign a new instance attribute; it mutates the dict object that the class attribute references, so every instance is effectively reading and writing the same location in memory. It is better to use an instance attribute here instead, as shown below.
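A minimal fix (my sketch, not from the original example) is to create images inside __init__, so each instance gets its own dict:

class Minion:
    def __init__(self):
        self.images = {}  # one dict per instance, no longer shared

    def add_os_image(self, os, priority):
        self.images[os] = priority

>>> m1 = Minion()
>>> m2 = Minion()
>>> m1.add_os_image('Centos', 3)
>>> m1.images
{'Centos': 3}
>>> m2.images
{}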

Does that mean mutable class attributes should never be used? IMHO, there are situations where one wishes to maintain information collected across class instances, for example, tracking all OS images ever created on Minion instances to monitor duplicates or just for bookkeeping purposes.


PROJECTS

Auto Gantt Chart

Upload a CSV containing JIRA tickets and generate an interactive Gantt chart that supports zoom controls, intuitive navigation, smart task grouping, and export options.

Peer to Peer Dispute Resolution

A civil litigation dispute platform. It enables a plaintiff to create a case, assign it to a defendant, state arguments through real-time messaging, and view the jury's decision live.

Cooking Experience

This app attempts to convert rudimentary culinary instructions into immersive Airbnb-style cooking experiences featuring a virtual persona within an AR kitchen. The audio visualizer synchronizes with AI-synthesized speech.

[Coming soon] Nula

Reflections on the latest breakthroughs in longevity research along with AI applications in healthcare.
