Haakon's weblog

I got inspired by Simon Willison's concept of Things I've Learnt, so I started collecting things myself. Here I try to publish at least some of them.

Recent Things I've Learned

Using launchd and launchctl to Schedule Jobs on macOS

I've been using a Mac since 2013 for several reasons, but I never needed to schedule anything that couldn't be done by a simple cron job. However, I have slowly been moving towards tech independence. The first thing I did was to move away from Gmail for important e-mails after having read one too many horror stories on Hacker News about someone who suddenly lost access to their Google account.

The second step was to start moving away from Dropbox. I now have a mix of physical hard drives and a storage block. The first task there was to start backing up writings, pictures and documents that don't live in Git repositories. Anyway, Kagi led me to this article: https://www.maketecheasier.com/use-launchd-run-scripts-on-schedule-macos/. Let's see how well this works.

I ended up creating this job:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.user.rsyncbackup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/haakon/scripts/rsync-backup.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>20</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>StartMissedJobs</key>
    <true/>
</dict>
</plist>

I think it's the first time I have written XML from scratch rather than edited an existing piece of XML (which gives me flashbacks to parsing PowerPoint presentation XML in one of my first internships). After writing it, I loaded the job into launchd with launchctl load ~/Library/LaunchAgents/com.user.mybackupscript.plist.
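
If hand-writing XML feels error-prone, the same job definition can be generated and sanity-checked from Python. This is only a minimal sketch using the standard library's plistlib. The dictionary mirrors the keys and values of the plist above, and the variable names (job, xml_bytes, parsed) are just for illustration. It does not write anything to ~/Library/LaunchAgents or call launchctl.

```python
import plistlib

# Job definition copied key by key from the hand-written plist above.
job = {
    "Label": "com.user.rsyncbackup",
    "ProgramArguments": ["/Users/haakon/scripts/rsync-backup.sh"],
    "StartCalendarInterval": {"Hour": 20, "Minute": 0},
    "StartMissedJobs": True,
}

# Serialize to the XML property-list format, keeping the key order as written.
xml_bytes = plistlib.dumps(job, fmt=plistlib.FMT_XML, sort_keys=False)

# Parse it back to confirm the XML is well-formed and nothing was lost.
parsed = plistlib.loads(xml_bytes)
if parsed != job:
    raise ValueError("plist did not survive the round trip")

print(xml_bytes.decode("utf-8"))
```

The round trip only proves that the file is a valid property list. Whether launchd accepts every key is a separate question that this sketch does not answer.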

Adding Pandas DataFrames with Different Multiindices

I ran into a strange bug after forgetting some subtle name differences between two data frames. The example below shows what happens. Instead of failing, result2 ends up with three levels in its index, while result has only two because the index values of df1 and df2 happen to be identical.

import pandas as pd  # DataFrame.to_markdown() also needs the tabulate package

# Create the first DataFrame with a MultiIndex
index1 = pd.MultiIndex.from_tuples([('A', 1)], names=['letter', 'number'])
df1 = pd.DataFrame({'value': [1]}, index=index1)

# Create the second DataFrame: same index values, but the second level has a different name
index2 = pd.MultiIndex.from_tuples([('A', 1)], names=['letter', 'decimal'])
df2 = pd.DataFrame({'value': [4]}, index=index2)

# Create the third DataFrame: different level name and a different value in that level
index3 = pd.MultiIndex.from_tuples([('A', 2)], names=['letter', 'decimal'])
df3 = pd.DataFrame({'value': [4]}, index=index3)

# Add df1 to each of the other two DataFrames
result = df1 + df2
result2 = df1 + df3

print("DataFrame 1:")
print(df1.to_markdown())
print("Index levels:", df1.index.names)
print("\nDataFrame 2:")
print(df2.to_markdown())
print("Index levels:", df2.index.names)
print("\nDataFrame 3:")
print(df3.to_markdown())
print("Index levels:", df3.index.names)
print("\nResult of adding DataFrame 1 and DataFrame 2:")
print(result.to_markdown())
print("Index levels:", result.index.names)
print("\nResult of adding DataFrame 1 and DataFrame 3:")
print(result2.to_markdown())
print("Index levels:", result2.index.names)

Output (abridged):

DataFrame 1:
|          |   value |
|:---------|--------:|
| ('A', 1) |       1 |

DataFrame 2:
|          |   value |
|:---------|--------:|
| ('A', 1) |       4 |

Result of adding DataFrame 1 and DataFrame 2:
|          |   value |
|:---------|--------:|
| ('A', 1) |       5 |
Index levels: ['letter', 'number']

Result of adding DataFrame 1 and DataFrame 3:
|             |   value |
|:------------|--------:|
| ('A', 1, 2) |       5 |
Index levels: ['letter', 'number', 'decimal']

So what happens is that the decimal level disappears in the first case, because it happens to hold the same values as the number level in df1. In the second case the values differ, so the decimal level is added to the index. As far as I can tell, the reason is that pandas first compares the two indices by value and ignores the level names. Only when the values differ does it join the indices, and then it matches on the level name they share (letter) and keeps number and decimal as separate levels. This tripped me up a bit, but I can see that both choices make sense. Aligning on values means less renaming is necessary, which makes pandas simpler to use, with less need to worry about index names. The flip side is that you can end up with subtle bugs when a third index level is silently introduced.
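
One way to guard against this kind of surprise is to compare the index level names before adding. The sketch below is only an illustration: add_strict is a hypothetical helper of my own, not something pandas provides, and the frames are the same df1 and df3 as in the example above.

```python
import pandas as pd


def add_strict(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Add two frames, but refuse when their index level names differ.

    Hypothetical helper for illustration; not part of pandas.
    """
    left_names = list(left.index.names)
    right_names = list(right.index.names)
    if left_names != right_names:
        raise ValueError(
            f"index level names differ: {left_names} vs {right_names}"
        )
    return left + right


index1 = pd.MultiIndex.from_tuples([("A", 1)], names=["letter", "number"])
index3 = pd.MultiIndex.from_tuples([("A", 2)], names=["letter", "decimal"])
df1 = pd.DataFrame({"value": [1]}, index=index1)
df3 = pd.DataFrame({"value": [4]}, index=index3)

# The mismatched names are now reported instead of producing a third level.
try:
    add_strict(df1, df3)
    outcome = "added"
except ValueError as exc:
    outcome = f"rejected: {exc}"
print(outcome)

# Renaming the level makes the intent explicit. The rows ('A', 1) and
# ('A', 2) then simply do not match, so both sums come out as NaN.
df3_renamed = df3.rename_axis(index={"decimal": "number"})
aligned = add_strict(df1, df3_renamed)
print(aligned)
```

Failing loudly is a design choice. If the differing names are intentional, renaming one side first keeps the addition explicit about which levels are meant to line up.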

Today I Learned: Deming's Red Bead Experiment

I just learned about Deming's Red Bead Experiment. The experiment simulates a business that produces white beads using a somewhat convoluted system (the linked page above describes it well). The idea is that an individual worker's performance has no impact on the results that worker produces. The results are down to randomness and other factors outside the worker's control, which simulates a modern business environment. Even so, good results are rewarded with pay raises while bad results are punished, for example by putting the worker on probation. The point is that this is how many businesses are run: employees are rewarded or punished because of the workings of the system they are placed in, not because of their performance.
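
To get a feel for how much spread pure chance produces, here is a rough simulation sketch. All the numbers are my own assumptions rather than taken from the description of the experiment: a 20% share of red beads, 50 beads per scoop, six workers and four days. Each bead is drawn independently, which simplifies scooping from a real bin. Every worker follows exactly the same procedure, so any difference between them is noise.

```python
import random

RED_SHARE = 0.2       # assumed share of red (defective) beads in the bin
BEADS_PER_SCOOP = 50  # assumed number of beads a worker scoops per day
DAYS = 4              # assumed length of the experiment
WORKERS = [f"worker_{i}" for i in range(1, 7)]  # hypothetical worker names


def simulate(seed: int = 0) -> dict[str, list[int]]:
    """Return the number of red beads each worker scoops on each day."""
    rng = random.Random(seed)
    results = {}
    for worker in WORKERS:
        # Same procedure for everyone: skill plays no role in the count.
        results[worker] = [
            sum(rng.random() < RED_SHARE for _ in range(BEADS_PER_SCOOP))
            for _ in range(DAYS)
        ]
    return results


results = simulate()
for worker, reds in sorted(results.items(), key=lambda item: sum(item[1])):
    print(worker, reds, "total red beads:", sum(reds))
```

Ranking the workers by total red beads gives a "best" and a "worst" performer, even though the ranking says nothing about the workers and would change with a different seed.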

For more:

All TILs