Sunday, March 15, 2015

What is Information?

This is a fundamental question that promises insight into all our intellectual activities and beyond.

The word information is ubiquitous, and of course we know what it means. We might describe it as: something that reduces our uncertainty. But are we happy with such descriptions? They use words, which again need clarification, and which again can be decomposed into smaller parts. As long as you can still abstract away, you are not done.

Note

First you abstract until you can't abstract any more.
Then you synthesize.

Still, we need to start with the natural usage of the word. We talk about information in newspapers, in books, in sentences, in words.

What is involved?

  • a sender: a person, an animal, our environment,...

  • a message: written or spoken text, signs, ...

  • a receiver: again a person, a computer, a radio, ...

    This is communication. Is Communication = Information? No, it's communication of information.

We need to decompose more: A person has a brain with a map of our world, with concepts. A message consists of words of a natural language, but it can also consist of signs, Chinese pictographic characters, hieroglyphs, bits or bytes....

These and other things we can think of are all quite different. Still, they must have something in common.

They are a multitude
A multitude of concepts in the brain
A multitude of brain cells, of synapses
A multitude of phonemes
A multitude of photoreceptors in the eye
A multitude of colors
A multitude of voltages in the radio receiver
A multitude of frequencies in the radio signal
A multitude of words or letters or signs in a message

A person as a sender consists of many parts, and these parts all communicate as well.

What is the smallest part of a communication process?

Selection from a multitude

Note

First a multitude, then the selection. That's evolution. Evolution is how all dynamic systems work.

  • This is how biological evolution works. Selection by environment.
  • Economy: supply/demand = creation of multitude; publicity, offers, comparison = selection
  • The brain: ideas = creating multitude = creativity; selection via comparison with sensory input and experience
  • Sciences: papers = multitude; research, studying = selection
  • Memetics: ideas communicated to others
  • Culture in societies
  • ...

We have abstracted everything away and are left with a multitude. In mathematics the foundation is set theory. The set is what we have so far called a multitude.

Note

Where are we now? We have

  • a set of elements
  • a way to choose or select

One instance of choosing means to choose exactly one element. By selecting one element we de-select the others. The elements are exclusive.

Regarding choosing or selecting in relation to set theory, one might want to check out the axiom of choice.

With this exclusiveness added, the best words I came up with are variable and value.

Note

set → variable
element → value

Value and variable turn up everywhere: in programming, in physics, in statistics,..., because they are the building blocks of everything: They are information.

Note

building block for information = variable = set of exclusive values

Value and variable are quite ubiquitous and thus appropriate for something as fundamental as information.

Let's get quantitative: what is the smallest number of values we can choose from? The answer: 2. Quantitatively the only important aspect is the size of the variable, also called cardinality. For a variable $V$ I shall denote the size with $|V|$. It makes sense to use the smallest variable, the bit ($B$ of size $|B| = 2$), as the unit of measurement for information. Values of bigger variables can be mapped to combinations of bits, which are elements of the Cartesian product. $n$ bits can create $|B^n| = 2^n$ combinations.

Note

The information of V is how many bits we need to select a value of V.

Because $2^{\log_2 |V|} = |V|$ we get

$I_V = \log_2 |V|$
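A quick numeric check of this formula in Python (the variable sizes are arbitrary examples):

import math

for size in (2, 4, 256):
    # bits needed to select one value from a variable of this size
    print(size, math.log2(size))
# 2 -> 1.0 (the bit), 4 -> 2.0, 256 -> 8.0 (the byte)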

Note

Information as a measure is a property of a variable, not of its values.

The occurrences of values of a variable need to be distinguished from the variable itself. They make up another variable; when the focus is on the occurrences, one can use the word variate. Let's denote this variable with $\mathbf{V}$. To select an occurrence of any value of $\mathbf{V}$ we need $\log|\mathbf{V}|$ bits.

The frequency of occurrence is yet another variable. Here we look at $p_i = |\mathbf{V}_i| / |\mathbf{V}|$, with $\sum_i p_i = 1$. Something like this is called a probability. The exclusiveness is essential to it. It is a measure.

Note

Add probabilities for or, but only for exclusive values of the same variable; else in general $p(a \lor b) = p(a) + p(b) - p(a \land b)$ holds. Note that $p(ab) = p(a \land b)$.

Multiply probabilities for and, but only for values belonging to independent variables; else $p(a \land b) = p(a)\,p(b|a)$ holds. See below.

One can select values of $V$, i.e. $V_i$, by first selecting an occurrence of any value ($\log|\mathbf{V}|$ bits) and then subtracting the excess selection done ($\log|\mathbf{V}_i|$ bits): $\log|\mathbf{V}| - \log|\mathbf{V}_i| = -\log p_i$. Via this selection path one can do with fewer bits to reference the total amount of occurrences: $-\sum_i |\mathbf{V}_i| \log p_i \le |\mathbf{V}| \log|V|$. Dividing both sides by $|\mathbf{V}|$ we get the average number of bits needed to select a value of $V$. It is called entropy ($S$).

$S_V = -\sum_i p_i \log p_i \qquad S_V \le I_V$

$I_V$ is the upper limit of $S_V$. $I_V$ can also be expressed with the $S_V$ formula by using a uniform $p_i = 1/|V|$. Considering that a variable is always embedded in a context of other variables, this limit case can vaguely be interpreted as loss of structure.

Note

The entropy is the average number of bits needed to select a value of the variable.
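A minimal Python sketch of both measures, with made-up occurrence counts (function names are mine):

import math

def information(n_values):
    # bits needed to select one value from a variable of this size
    return math.log2(n_values)

def entropy(counts):
    # average bits to select a value, from occurrence counts |V_i|
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

counts = [50, 25, 25]              # occurrences of three values
print(entropy(counts))             # 1.5 bits
print(information(len(counts)))    # log2(3) ≈ 1.585 bits, the upper limit I_V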

The occurrences of values of a variable are due to the variable's environment (= context), i.e. its relation to other variables. This context is often unknown or too complex (= too much information) to describe. It might also be known and used to derive the frequency of occurrences of values (a priori probability).

Physical systems can be regarded as a huge set of variables.

There is an even deeper relation between energy and information; see the physics of information, statistical physics, and statistical thermodynamics.


So far information was regarded as inherent to a variable (= a priori).

I continue with the perspective of a learning system, like humans and other animals or learning machines (computers).

A learning system needs to provide as much memory as there is information in the observed system in order to fix one of its states.

But often the learning system is confronted with a system too complex (= too much information) to create a complete map of it. It doesn't see all the variables. Most of them stay hidden (latent variables). In this situation including the frequency of occurrences of values, i.e. the probability, is often the best thing one can do. The probability distribution is a result of hidden dependencies from hidden variables.

Or it is due to the limited information content of nature itself, i.e. due to quantum mechanics. With probability distributions quantum mechanics can continue to use the well-known continuum mathematics (real numbers, ...), although the major statement of quantum mechanics is that there is no continuum.

In such a complex system it is normally not possible to select a single value of a variable. Then one resorts to a partial selection via a probability distribution for the variable. This method also includes the case of selecting one value, via a Dirac distribution, whose entropy is 0. A larger entropy expresses a more imprecise selection or, when talking about predictions, more uncertainty about what to expect.

For an exactly fixated value of one variable we often can predict the value of another variable via a functional dependency.

Note

Functions are important because no information is needed to fix the value of the dependent variable.

With an imprecise selection and an imprecise relation, analytical expressions for dependencies are often not feasible. Then one resorts to conditional probability: $p(a|b)$ (probability of $a$ given $b$), $a \in A$ and $b \in B$.

It is

$p(a \land b) = p(a)\,p(b|a) = p(b)\,p(a|b)$

Note

p(a)p(a|b) does not make sense.

This is Bayes' theorem. It allows one to derive $p(b|a)$ from $p(a|b)$, i.e. the inverse dependency (corresponding to the inverse function in exact mathematics). The starting variable's distribution is the prior and the dependent variable's distribution is the posterior.
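A tiny numeric sketch in Python (the probabilities are toy values of my own choosing): given a prior $p(a)$ and the likelihoods $p(b|a)$ and $p(b|\neg a)$, the posterior follows from the product rule above.

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    # p(a ∧ b) = p(a) p(b|a) = p(b) p(a|b)  =>  p(a|b) = p(a) p(b|a) / p(b)
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a  # total probability
    return p_a * p_b_given_a / p_b

print(bayes(0.1, 0.9, 0.2))   # posterior ≈ 0.333 from a prior of 0.1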

There is a probability distribution for $B$ for every value of $A$ ($p(b|a) \in p(B|a)$): $\sum_b p(b|a) = 1$. But for $p(b|a)$ as a function of $a$ (the likelihood function) there is no such normalization. In general $p(B|a_i)$ and $p(B|a_j)$ can be completely unrelated.

Based on theoretical reasoning and assumptions, i.e. by creating a model (= finding variables and their dependency), one normally ends up using a predefined distribution, like the normal distribution. The parameters of such a distribution often need to be estimated based on measurements. There are two related methods to do this:

  • There is the fundamental principle of maximum entropy (ME). By maximizing the entropy one does not assume more, or make a more precise selection, than what is actually known.

    Via a Bayesian inference one gets $p(a|\{b_i\}) = p(a)\,\frac{p(\{b_i\}|a)}{p(\{b_i\})}$ with $p(\{b_i\}|a) = \prod_i p(b_i|a)$ for the (parameter) variable $A$; then one can maximize the entropy of this distribution and take the $a \in A$ with maximum $p(a|\{b_i\})$. For a uniform $p(a)$ this yields the same result as ...

  • Maximum likelihood (ML). It selects the $a \in A$ for which $\prod_i p(b_i|a)$ becomes a maximum.

The product ($\prod$) is an assumption that the observed $b_i$ values come from independent and identical distributions (i.i.d.).
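As an illustration, a brute-force ML sketch in Python for a Bernoulli (coin-flip) model; the observations and the parameter grid are invented for the example:

import math

def log_likelihood(p, observations):
    # log of the product of p(b_i|a): the i.i.d. product becomes a sum of logs
    return sum(math.log(p if b else 1 - p) for b in observations)

obs = [1, 1, 1, 0, 1, 1, 0, 1]                 # 6 ones, 2 zeros
candidates = [i / 100 for i in range(1, 100)]  # grid of parameter values a in A
best = max(candidates, key=lambda p: log_likelihood(p, obs))
print(best)                                    # 0.75 = 6/8, the ML estimate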

Whether values are exclusive in the first place, i.e. form a variable, depends on the context, i.e. on the values of other variables. An intelligent system can find out the exclusiveness by observation. Only then can it observe the frequency of values with respect to other values of the same variable.

Monday, February 23, 2015

Lenovo E145 with Archlinux

Because of the purportedly long battery life I recently purchased the Lenovo E145 notebook as an occasional travel companion. I replaced the 500 GB HDD with a Crucial M550 SSD.

Archlinux Installation

Download the Archlinux ISO.

Copy it to a USB stick

sudo dd bs=4M if=/media/sw/linux/archlinux-2015.02.01-dual.iso of=/dev/sdc && sync

Boot with the USB stick.

Then create the partitions:

gdisk /dev/sda
  • 34-2047 bios boot (ef02)
  • +4G linux swap (8200)
  • rest linux filesystem (8300)

Make file system.

mkswap /dev/sda2
swapon /dev/sda2
mkfs.ext4 /dev/sda3

Note

Check for TRIM capability.

hdparm -I /dev/sda | grep TRIM

Positive. One should later on regularly do

fstrim /

The archlinux base system

mount /dev/sda3 /mnt
pacstrap /mnt base base-devel
genfstab -p /mnt >> /mnt/etc/fstab
arch-chroot /mnt

with basic configuration

echo E145 > /etc/hostname
hwclock --systohc --utc
ln -sf /usr/share/zoneinfo/Europe/Vienna /etc/localtime
#Uncomment the needed locales in /etc/locale.gen
locale-gen
echo LANG=en_US.UTF-8 > /etc/locale.conf
#
#this has escape where capslock was
#/usr/share/kbd/keymaps/i386/qwerty/myus.map.gz
cat >> /etc/vconsole.conf << "EOF"
KEYMAP=myus
FONT=lat2-14
EOF

Replacing keyboard with sd-vconsole in the HOOKS list produced strange glyphs in XTerm. I had to revert it later.

mkinitcpio -p linux #ignore the warnings
passwd

Install grub and reboot

pacman -S grub
#in /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet resume=swap_partition"
grub-install --target=i386-pc --recheck /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
exit
umount -R /mnt
reboot

Add user

useradd -m -g users -G audio,lp,optical,storage,video,wheel,games,power,scanner -s /bin/bash <user>
passwd <user>

Check hardware and kernel modules:

lspci
lspci -k
lsmod

E145 lspci:

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Root Complex
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kabini [Radeon HD 8240 / R3 Series]
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 0
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:02.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 01)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 39)
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 39)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 3a)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller (rev 02)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 5
01:00.0 Network controller: Broadcom Corporation BCM43142 802.11b/g/n (rev 01)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader (rev 01)

Install yaourt from AUR using the Archlinux Build System (ABS):

pacman -S abs
#download yaourt 1.5 from AUR
cd ~/abs/yaourt
makepkg -s
pacman -U yaourt-1.5-1-any.pkg.tar.xz

Other utilities:

pacman -S gvim sudo git

Network

Ethernet works straight away.

Wireless

I had read that there are problems with broadcom-wl-dkms, but so far it worked for me:

yaourt -S broadcom-wl-dkms
depmod -a
modprobe wl

/etc/modprobe.d/broadcom-wl-dkms.conf:

blacklist b43
blacklist b43legacy
blacklist ssb
blacklist bcm43xx
blacklist brcm80211
blacklist brcmfmac
blacklist brcmsmac
blacklist bcma

/etc/systemd/system/network.service:

[Unit]
Description=Network Connectivity
Wants=network.target
Before=network.target
BindsTo=sys-subsystem-net-devices-<nic>.device
After=sys-subsystem-net-devices-<nic>.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/ip link set dev <nic> up
# the address is given in CIDR notation
ExecStart=/usr/bin/ip addr add <ip address>/24 dev <nic>
ExecStart=/usr/bin/ip route add default via 192.168.1.1
ExecStop=/usr/bin/ip addr flush dev <nic>
ExecStop=/usr/bin/ip link set dev <nic> down

[Install]
WantedBy=multi-user.target

I use netctl and made a static setup for ethernet and a dynamic for wireless.

/etc/netctl/sen:

Description='static ethernet connection'
Interface=enp3s0
Connection=ethernet
IP=static
Address=('192.168.1.107/24')
Gateway='192.168.1.1'
DNS=('192.168.1.1')
ExcludeAuto=no

/etc/netctl/dwl:

Description='dhcp wireless home'
Interface=wlp1s0
Connection=wireless
Security=wpa
IP=dhcp
ESSID='whateveryouressid'
Key='whateveryourpassphrase'

wifi-menu -o can be used to scan for wifi and to generate a profile.

Audio

pacman -S alsa-utils
pacman -S mplayer
pacman -S pavucontrol

Alsa configuration files: /usr/share/alsa/alsa.conf includes /etc/asound.conf and ~/.asoundrc.

Commands for testing

#check loaded kernel modules
lsmod | grep snd
#check indices
cat /proc/asound/modules
#cards, devices, ...
cat /proc/asound/cards
aplay -l
aplay -L
arecord -l
arecord -L

#testing
arecord -r 48000 test.wav
aplay test.wav

and the alsa configuration file:

/etc/asound.conf:

pcm.dmixed {
    ipc_key 1025
    type dmix
    slave.pcm "hw:1,0"
}

#one called "dsnooped" for capturing
pcm.dsnooped {
    ipc_key 1027
    type dsnoop
    slave.pcm "hw:1,0"
}

#and this is the real magic
pcm.asymed {
    type asym
    playback.pcm "dmixed"
    capture.pcm "dsnooped"
}

#a quick plug plugin for above device to do the converting magic
pcm.pasymed {
    type plug
    slave.pcm "asymed"
}

pcm.!default {
    type plug
    slave.pcm "asymed"
}

defaults.ctl.!card 1;

#a ctl device to keep xmms happy
ctl.pasymed {
    type hw
    card 1
    device 0
}

#for aoss:
pcm.dsp0 {
    type plug
    slave.pcm "asymed"
}

ctl.mixer0 {
    type hw
    card 1
    device 0
}

pcm.jackplug {
    type plug
    slave { pcm "jack" }
}

pcm.jack {
    type jack
    playback_ports {
        0 alsa_pcm:playback_1
        1 alsa_pcm:playback_2
    }
    capture_ports {
        0 alsa_pcm:capture_1
        1 alsa_pcm:capture_2
    }
}

Somewhere I had read that position_fix would help with the crackling microphone.

/etc/modprobe.d/audio_powersave.conf:

options snd_hda_intel model=thinkpad position_fix=3 power_save=1

It maybe helps a little, but the actual way to tackle this is via alsamixer: press F4, set capture to maximum and turn the other two bars down to zero.

Video

pacman -S xorg-server
pacman -S xorg-xinit
pacman -S xorg-utils
pacman -S xorg-xmodmap #because I have .Xmodmap file

#video
pacman -S xf86-video-ati

Mouse

#mouse pad
pacman -S xf86-input-synaptics

Window Manager

I use xmonad, but the choices are many ...

pacman -S xmonad
pacman -S xmonad-contrib

Special Keys

The special keys of the E145 are turned on by default, so you have to press Fn to get the Fx keys. Enter the BIOS at boot and set them to legacy to have F1 through F12 directly.

The special keys are then available via Fn+Fx. They don't work out of the box, but they are already mapped to X events. These X events can be mapped to commands via xbindkeys.

pacman -S xbindkeys

~/.xbindkeysrc:

"amixer -D pulse sset Master 5%+"
    XF86AudioRaiseVolume

"amixer -D pulse sset Master 5%-"
    XF86AudioLowerVolume

"amixer -D pulse set Master toggle"
    XF86AudioMute

"amixer -D pulse set Capture toggle"
    XF86AudioMicMute

"echo `cat /sys/class/backlight/radeon_bl0/brightness` -10 | bc | sudo tee /sys/class/backlight/radeon_bl0/brightness > /dev/null"
    XF86MonBrightnessDown

"echo `cat /sys/class/backlight/radeon_bl0/brightness` +10 | bc | sudo tee /sys/class/backlight/radeon_bl0/brightness > /dev/null"
    XF86MonBrightnessUp

#not mapped
#XF86Display , XF86AudioNext , XF86AudioPlay , XF86AudioPrev , XF86WebCam , XF86WLAN

Use xbindkeys -v for testing. xbindkeys is started via ~/.xinitrc:

if [ -d /etc/X11/xinit/xinitrc.d ]; then
  for f in /etc/X11/xinit/xinitrc.d/*; do
    [ -x "$f" ] && . "$f"
  done
  unset f
fi
setxkbmap -option terminate:ctrl_alt_bksp
xsetroot -cursor_name left_ptr
xmodmap ~/.Xmodmap
xbindkeys
exec xmonad

Note

Xbacklight didn't work, and trying all the other tools would have meant learning about each of them and investigating their problems, so in the above .xbindkeysrc I went for directly writing to /sys/.../brightness. To overcome the permission denied error I did

visudo
#add line: <myuser> ALL=NOPASSWD: /usr/bin/tee /sys/class/backlight/radeon_bl0/brightness

Fan

When I turned on the E145 for the first time, I was disappointed by the fan noise. The fan started very frequently. Luckily this can be solved using the fancontrol service.

pacman -S lm_sensors
sensors
cat /sys/class/hwmon/hwmon0/device/temp1_input

/etc/modprobe.d/fancontrol.conf:

options thinkpad_acpi fan_control=1

The fancontrol configuration file can be generated by pwmconfig. I edited it afterwards to these values:

/etc/fancontrol:

INTERVAL=10
DEVPATH=hwmon0=devices/platform/thinkpad_hwmon
DEVNAME=hwmon0=thinkpad
FCTEMPS=hwmon0/device/pwm1=hwmon0/device/temp1_input
FCFANS= hwmon0/device/pwm1=hwmon0/device/fan1_input
MINTEMP=hwmon0/device/pwm1=70
MAXTEMP=hwmon0/device/pwm1=100
MINPWM=hwmon0/device/pwm1=30
MAXPWM=hwmon0/device/pwm1=255
MINSTART=hwmon0/device/pwm1=32
MINSTOP=hwmon0/device/pwm1=30

Enable and start the service:

systemctl enable fancontrol
systemctl start fancontrol

Fancontrol stops after returning from sleep, i.e. after the lid has been closed. The following helps:

/etc/systemd/system/sleep.target.wants/onsleep.service:

[Unit]
Description=sleep hook
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=-/usr/bin/systemctl restart fancontrol

[Install]
WantedBy=sleep.target

Battery status on the command line:

pacman -S acpi
acpi

I use Oh-my-zsh and there I tweaked the kiwi theme.

~/.oh-my-zsh/themes/my.zsh-theme:

PROMPT='%{$fg_bold[green]%}┌[%{$fg_bold[cyan]%}%D %T%{$fg_bold[green]%}]-(%{$fg_bold[white]%}%2~%{$fg_bold[green]%})-$(git_prompt_info)$(svn_prompt_info)$(battery_pct_prompt)
└> % %{$reset_color%}'

ZSH_THEME_GIT_PROMPT_PREFIX="[%{$reset_color%}%{$fg[white]%}git:%{$fg_bold[white]%}"
ZSH_THEME_GIT_PROMPT_SUFFIX="%{$fg_bold[green]%}]-"

ZSH_THEME_SVN_PROMPT_PREFIX="[%{$reset_color%}%{$fg[white]%}svn:%{$fg_bold[white]%}/"
ZSH_THEME_SVN_PROMPT_SUFFIX="%{$fg_bold[green]%}]-"

~/.zshrc:

export ZSH=$HOME/.oh-my-zsh
export EDITOR=vim
test -r ~/.profile && source ~/.profile
bindkey -v

#my = kiwish changed to %D %T
export ZSH_THEME="my"
eval `dircolors ~/.dircolors`
plugins=(battery git mercurial vi-mode)
source $ZSH/oh-my-zsh.sh

Note

The battery installed is not supported by this system and will not charge. Please replace the battery with the correct lenovo battery for this system. Press the ESC key to continue.

On my model this message appeared on startup after a few boots. First I thought it was a false warning, but it turned out the battery was completely discharged: the notebook went black. I removed and reattached the battery, plugged in the notebook, and luckily the message didn't appear any more.

Suspend

Suspend to mem works (echo mem > /sys/power/state). When closing the lid this state is entered automatically.

Hibernate did not work for me.

The resume kernel option

GRUB_CMDLINE_LINUX_DEFAULT="quiet resume=70d8aac3-344d-434d-b607-e48447692734"

did not help here. The UUID is that of my swap partition.

I stopped investigating this, because the default suspend is just fine for me.

Sunday, January 18, 2015

Vectors, Tensors and Matrices in the Kitchen

This is an example I made as a teacher. The idea was to take the vector and matrix away from their usual application in space to something more everyday: the kitchen. It is a humorous application of mathematical thinking to baking cakes. It touches the essentials of linear algebra without getting too formal.

It can also be found on mamchecker, my automatic exercise-correcting, course-building site based on Google App Engine. But here I have extended the idea a little bit.


Vectors

Let's talk about the ingredients of a cake, like eggs.

Eggs is a variable. We can take one egg, or two eggs, or three eggs,... The values are exclusive. We cannot take two eggs and three eggs for the same cake.

variable = set of exclusive values in a context

Three eggs. Three is the number. Egg gives meaning to the number. Egg is the unit. Eggs is a coordinate system (CS) that gives meaning to the numbers.

Three is the contravariant part. Egg, the unit, is the covariant part, of eggs. Note that we could take another unit, like sixpacks of eggs. Then the covariant part is bigger, but the contravariant part becomes smaller. This is already a coordinate transformation.

0.5 kg of flour. Another ingredient, another variable.

One variable is also called one-dimensional (1D) space. Two variables, from which we can choose independently, make up a two-dimensional space (2D), and so on.

Why introduce another word dimension and not just continue with variables? Actually, it's just to stick a little with conventional usage. Dimensions and variables are the same here.

A set of values from one or more variables is called a vector

  • if these values can be added independently, i.e. eggs with eggs, ...
  • and if they can be multiplied with numbers. If we add the same vector v twice, we should be able to say 2v. Then all components of the vector are multiplied by 2.

The cake ingredients are a vector space over the real numbers ℝ.

Eggs, flour, butter, sugar, ... are not exclusive when baking a cake. Then they are not values of a variable; they are just a set. Let $i$ be any of them and $C$ be a cake. $C^i$ is the contravariant number belonging to the covariant unit $C_i$. $i$ is the index that indicates which ingredient.

$C = \sum_i C^i C_i = C^i C_i$

$\sum$: if the same index occurs twice, it is summed over, by Einstein's convention.

This is like writing 'ten eggs and one unit of flour and ...'.

If we do that a lot, it becomes tiresome. We tend to make a table then. If we always make the same table, we remember the positions of eggs and flour, ... (position coding) and only write the numbers. A vector, though, is not just the numbers. It is the real thing, the cake recipe, in this example.
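To make the position coding concrete, here is a small Python sketch (ingredient names and amounts are invented):

units = ["eggs", "flour_kg", "butter_kg", "sugar_kg"]   # agreed order = position coding
cake = [10, 1.0, 0.5, 0.3]                              # contravariant numbers only

# the vector is the real thing: number times unit
print(" + ".join(f"{n} {u}" for n, u in zip(cake, units)))
# 10 eggs + 1.0 flour_kg + 0.5 butter_kg + 0.3 sugar_kg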

dot product

Eggs have nothing in common with flour, well, at least if we don't dig too deep, because they surely have carbon atoms in common. But for our kitchen thinking they have nothing in common. In mathematics this is orthogonal.

There is an operation between two vectors that tells how much they have in common: the inner product or dot product or scalar product. Its result is a scalar, i.e. a number. One denotes it by a dot in between, or by $\langle Egg|Flour \rangle$. $\langle Egg|Egg \rangle = 1$, because they have everything in common. $\langle Egg|Flour \rangle = 0$, because they have nothing in common (orthogonal).

In a cake recipe the units of ingredients are all orthogonal to each other: $C_i \cdot C_j = 0$ if $i \neq j$. I used the dot notation here. They are also of unit 1, which makes them orthonormal.

The ingredients are the basis of the cake. The unit ingredients are the orthonormal basis vectors.

Though convenient it is not a necessity that the basis vectors have nothing in common, i.e. are orthogonal.

For an example let's move from the ingredients of cakes (ingredient vector space) to the cake vector space. Every cake is a unit to sell or bring to a party. And every cake is an ingredient vector, an independent choice from more variables. Cake A and cake B surely have ingredients in common. So these unit vectors in the cake vector space are not orthogonal to each other. The dot product is not 0.

$A \cdot B = A^i A_i \cdot B^j B_j = \sum_i A^i B^i \neq 0$

All the terms with $i \neq j$ are 0, because of the orthonormal basis. So they have been dropped after the first $=$. Then we have the usual formula for the dot product.

If $A_i \cdot B_j$ were not 0 for $i \neq j$, we would have $A^i B^j (A_i \cdot B_j)$, called a curvature tensor. As you can see it results from vectors, but it cannot simply be added any more. That is the reason for the different name.
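A NumPy sketch with invented recipes, expressed in the orthonormal ingredient basis, where the dot product reduces to the usual component formula:

import numpy as np

A = np.array([10, 1.0, 0.5])   # cake A: eggs, flour, butter
B = np.array([6, 0.8, 0.4])    # cake B in the same ingredient basis

# sum_i A^i B^i: not 0, because the cakes share ingredients
print(A @ B)                   # 60 + 0.8 + 0.2 = 61.0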

Coordinate Transformation

A vector in the cake vector space (how many of each kind of cake? let's call it a cake assortment) can be transformed to the ingredient vector space by multiplying with a matrix. Every matrix column is the recipe of one kind of cake. The columns are the basis vectors of the cake space. By multiplying the assortment vector with the transformation matrix, we do a linear combination of the cake vectors.

$A = A^j A_j = A^j C^i_j C_i = C^i C_i$

The transformation matrix $C^i_j$ says how much of ingredient $i$ cake $j$ needs. $j$ is the lower index and indicates a column. $i$ is the upper index and indicates a row. $C_i$ is egg, flour, ... $A^j$ is the amount of cake $A_j$ in the assortment $A$. With $A_j$ and $C_i$ implicit, the transformation is:

$C^i = C^i_j A^j$

The summation over the j index is the matrix multiplication. It results in a number in each row telling about the total amount of one ingredient, like eggs.

In a matrix row there is an amount of an ingredient, like egg, for every kind of cake. This is the dot product $\langle egg|cake_j \rangle$. In general, if we transform from a CS with basis vectors $A_j$ to a CS with basis vectors $C_i$, the transformation matrix is $\langle C_i|A_j \rangle$, $j$ being the columns.

$A = A^j A_j = A^j \langle C_i|A_j \rangle C_i$
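The same transformation as a NumPy sketch; the recipe matrix is made up for illustration:

import numpy as np

# columns = recipes of 3 kinds of cake, rows = 4 ingredients
C = np.array([[10,  6,   2  ],    # eggs
              [1.0, 0.8, 0.5],    # flour (kg)
              [0.5, 0.4, 0.1],    # butter (kg)
              [0.3, 0.2, 0.4]])   # sugar (kg)

assortment = np.array([2, 1, 3])  # A^j: two of cake A, one of B, three of C
print(C @ assortment)             # C^i = C^i_j A^j: [32. 4.3 1.7 2.]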

Inverse

If the number of $i$'s and the number of $j$'s are equal, it is possible to find $A^j$ from $C^i$: the number of each kind of cake in the assortment from the total amount of ingredients used.

In this example, though, the cake space and the ingredient space normally do not have the same number of variables (number of variables = dimension). If we have 10 ingredients and 3 kinds of cakes, then the transformation matrix is 10x3 (10 rows, 3 columns). Such an $m \times n$ matrix with $m \neq n$ cannot be inverted, i.e. one cannot infer from the ingredients how many of each kind of cake were baked. Said differently: not for every combination of ingredients is there a combination of cakes which needs exactly this amount of ingredients.

Note

A non-square matrix can be pseudo-inverted, though: Moore-Penrose Pseudoinverse. For this example multiplying an ingredient vector with the pseudo-inverse would produce a cake vector, which minimizes unused quantities of ingredients (Method of Least Squares) or makes best use of the ingredients (Maximum Entropy Method).
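A sketch of that pseudo-inverse with NumPy, reusing the made-up recipe matrix from the transformation example:

import numpy as np

C = np.array([[10,  6,   2  ],
              [1.0, 0.8, 0.5],
              [0.5, 0.4, 0.1],
              [0.3, 0.2, 0.4]])
ingredients = np.array([32, 4.3, 1.7, 2.0])   # measured totals

# least-squares cake assortment for the non-square recipe matrix
print(np.linalg.pinv(C) @ ingredients)        # ≈ [2. 1. 3.]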

If we change from one vector space to another with the same dimension, then we can get back to the starting one by multiplying with the inverse matrix ($A^{-1}$). $A^{-1}$ multiplied by $A$ gives the identity matrix $I$. Calculating the inverse means solving a system of linear equations.

$Ax = y$
$A^{-1}Ax = A^{-1}y$
$Ix = A^{-1}y$
$x = A^{-1}y$

In order for the inverse to exist, in addition to being square, the columns/rows of the matrix must be linearly independent. If not, that is effectively like being in a smaller matrix (the rank of the matrix). For the cake example this means that every kind of cake must have a different combination of ingredients, which is some extra information that distinguishes it from the others and that can be used to code something.

Note

A square matrix can be inverted, if columns (rows) cannot be expressed as linear combination of the others, i.e. the rank of the matrix is equal to its dimension.

One can calculate the inverse of a square Matrix by:

  • leave out the $ij$ cross and calculate the determinant = minor $M_{ij}$
  • change the sign if $i + j$ is odd
  • transpose, i.e. mirror at the main diagonal (compare below: $ij$ for $A$ and $ji$ for $M$)
  • divide everything by the determinant

Short:

$(A^{-1})_{ij} = \frac{1}{\det(A)} (-1)^{i+j} M_{ji}$

Inverse of a 2x2 Matrix:

$M_{ij}$ is the diagonally opposite number. Because of the transposing, the numbers at bottom left and top right (the secondary diagonal) stay where they are, but the sign changes. On the main diagonal the numbers get swapped, but since $i + j$ is even the sign does not change.

  • Main diagonal  →  swap, keep sign
  • Secondary diagonal  →  no swap, change sign

There are algorithms, though, that calculate the inverse more efficiently. And, of course, we use a computer program to do such calculations.
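For example, in Python with NumPy, the 2x2 cofactor recipe can be checked against the library routine (the matrix is an arbitrary example):

import numpy as np

A = np.array([[4., 7.],
              [2., 6.]])

# 2x2 cofactor formula: swap the main diagonal, negate the other, divide by det
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
inv = np.array([[ A[1, 1], -A[0, 1]],
                [-A[1, 0],  A[0, 0]]]) / det

print(np.allclose(inv @ A, np.eye(2)))      # True
print(np.allclose(inv, np.linalg.inv(A)))   # True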

Sunday, January 11, 2015

(Complex) Numbers

This is partly from my time as a teacher, when I made mamchecker, an automatic exercise-correcting, course-building site based on Google App Engine.

I go through the different number sets in a not-so-standard way and conclude with the complex numbers, which have always fascinated me: somehow enigmatic and not obvious, yet useful and a good example of mathematical thinking.


Natural numbers (ℕ) by themselves are a great abstraction: forget about all qualities but the "how many", the cardinality.

Then, do we add apples or do we take them away? So the sign was invented, and the integers (ℤ). We write −2, but since this linear direction is a concept by itself, we should write 2(−), that is 2 times (−), like we write 2m or 2g. And when we want to say three things (how many, add or remove, what quantity), then we could write 2(−)g. We could also give (−) a name like s for subtract (remove), and then we should also have a for add. But to have special signs (−, +), not letters, is a lot better. So these signs make the add and subtract operations part of the natural numbers, making them integers and operations. Instead of 2(−) we write −2, but we should still think of it as two times subtracting. +3 −2 is a sequence of such operations; we could emphasize this by writing +3, −2.

The same reasoning we can do with * (multiplying) and its opposite / (dividing). *2 and /2 are elements of the rationals (ℚ). *3/2*5 is a sequence of such operations (*3, /2, *5). Most computer programs don't understand *3/2*5 unless the initial * or / is omitted. For + and − they do.

Addition and subtraction are two elements of a set, i.e. of something from which we can pick exactly one at a time. I normally call such a thing a variable and the pick a value. This variable is often named after one value: Addition = {+, −}. Same for Multiplication = {*, /}.

When combining addition and multiplication in a sequence, we have also introduced a convention: multiplication before addition. This operator precedence rule allows us to write 2*3 + 2*4 in one expression instead of a = 2*3, b = 2*4, a + b .

To put multiplication first was a good choice. We already had multiplication above when we wrote 2(−) to emphasize what −2 means. There is something very basic about multiplication: whenever there are two variables that have nothing in common, that we can pick from independently, that are orthogonal, we simply pick values independently and place them one after the other. When we count the possibilities of combined picks we get |A|*|B|, where || is the cardinality, the information content. This is why we write ab for the area of an a × b rectangle and why physicists combine all kinds of variables via multiplication. These variables are extensive, meaning any value of them is a set. E.g. the length of a rectangle side is the set of all its points, or of all its unit stretches. I also call such variables quantities here.

Next come the irrational numbers (I). For example no sequence of multiplication and division with natural numbers will produce the diagonal from the edge of a square (incommensurable, $\sqrt{2}$), but one can get as close as wanted. This is how they are defined: an irrational is an infinite sequence of rationals. In this sense infinity is an irrational.

Irrationals and rationals make up the real numbers (ℝ).

Several times the numbers were augmented with new information. The same we do with the real numbers to get the complex numbers. In a real number we have the number, the multiplication operation (*, /) and the linear direction (+, −). We normally first determine the quantity by comparing to a unit (multiply, e.g. meters) and then optionally add it (to 0 by default).

There is another concept which we can introduce: orthogonality. This "variable" contains 1 (normally omitted) for the same direction and i for orthogonal, i.e. turned by a right angle π/2, counterclockwise by convention. Turning a second time gives i*i = −1. This solves $x^2 + 1 = 0$, from which i is normally introduced. 2ia has the size 2a and is orthogonal to the direction of a.

i is called the imaginary unit and ℝ × {i} are called the imaginary numbers. Along this reasoning one should have called them orthogonal numbers. Under addition they are independent of/orthogonal to the real numbers. Together with the real numbers they make up the complex numbers (ℂ). Orthogonal means that all combinations are possible, which corresponds to a 2-dimensional (2D) plane, the complex plane or Gauss number plane.

z = (a, b) = a + ib ∈ ℂ
  • a = Re(z) is the real part
  • b = Im(z) is the imaginary part

These numbers are like 2D-vectors: 2 orthogonal directions that can be added independently.

There are three representations:

  • $z = a + ib$, i.e. via the components, or
  • $z = r(\cos\varphi + i\sin\varphi)$ via modulus r and argument φ (angle, phase) in radians.

Note that, by the trigonometric addition formulas, multiplication adds the angles, i.e. multiplication leads to addition. This gives a hint that there could be a representation that has the angle in the exponent. Developing sin and cos into a Taylor series and comparing with the $e^x$ series leads to the Euler formula:

$e^{i\varphi} = \cos\varphi + i\sin\varphi$
  • $z = re^{i\varphi}$ is the third way to represent complex numbers.

    • $\varphi = \arg(z)$ is the argument (phase) of z.
    • $\arg(yz) = \arg(y) + \arg(z)$
    • $\arg(y/z) = \arg(y) - \arg(z)$

About sin and cos we know that the period is 2π, therefore the same is true for $e^{i\varphi}$. The nth root divides angles up to $2n\pi$ down to below $2\pi$, and so we have n different roots.

$z^{1/n} = r^{1/n} e^{i(\varphi/n + 2k\pi/n)}, \quad k = 0, \ldots, n-1$
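A quick Python check of this formula, computing the three cube roots of 8 (variable names are mine):

import cmath

z = 8 + 0j
r, phi = abs(z), cmath.phase(z)
roots = [r ** (1 / 3) * cmath.exp(1j * (phi / 3 + 2 * k * cmath.pi / 3))
         for k in range(3)]
for w in roots:
    print(w, w ** 3)   # each w**3 comes back to (approximately) 8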

More generally:

In ℂ every polynomial of degree n has exactly n roots (fundamental theorem of algebra), if one counts the multiplicity of roots. ℂ therefore is called algebraically closed.

This means that not only $x^2$, but every polynomial maps the whole of ℂ to the whole of ℂ. This closedness is important. All operations are reversible. One can calculate freely.

$\bar z = re^{-i\varphi} = a - ib$ is the complex conjugate of z. $\overline{z^n} = \bar z^n$. We know that there are two orthogonals. We've chosen i to be one of them, so −i is the other.

$\bar y z$ combines in itself the dot product ($\mathrm{Re}(\bar y z) = |y||z|\cos\Delta\varphi$) and the cross product ($\mathrm{Im}(\bar y z) = |y||z|\sin\Delta\varphi$). Dot and cross are two aspects of orthogonality (= combinability). The dot being 0 means there is no dependence, which says all combinations are possible, which means the combined variable is of maximum cardinality (area, cross), i.e. a rectangle. The conjugate is important because it produces the delta angle between y and z. Angle, again, is a third way of expressing dependence. With $\bar y z$, dot, cross and angle are beautifully combined.

$|z| = \sqrt{\bar z z} = \sqrt{a^2 + b^2} = r$ is the absolute value (modulus) of z. Here the imaginary part, the cross, has gone, because there is no area in between. And why the square root? Multiplication brought us to the combined variable, but in this limit case of 0 angle any independent other variable has gone, and thus we go back to the length, the cardinality, the information of z alone.

Why is the Pythagorean theorem in there? To understand this we need to understand the fundamentals of the Pythagorean theorem. There are two orthogonal variables involved. Let's name their units j and k instead of 1 and i (m and s for length and time, ...). $z = z_1 j + z_2 k$ is one value of the combined variable. The very fundamental dot of y and z reduces to the dot between j and k via $y \cdot z = (y_1 j + y_2 k)\cdot(z_1 j + z_2 k) = y_1 z_1\, j\cdot j + y_2 z_2\, k\cdot k + y_1 z_2\, j\cdot k + y_2 z_1\, k\cdot j$. Now j and k are orthogonal, so $j\cdot k = k\cdot j = 0$; j and j are the same, thus $j\cdot j = 1$, and likewise $k\cdot k = 1$. What we get is $y \cdot z = y_1 z_1 + y_2 z_2$, which is the Pythagorean theorem if $y = z$.

Number times unit is a special team: the quantity and the quality. It is also called a vector space. If the number is complex, it is a vector space over the complex numbers. A vector can also subsume independent quantities, making it of dimension n > 1. A complex number z already has orthogonality in it. In a 2D vector space with units j and k it should be ij = k, because there is no other dimension; actually something like this is done with quaternions. But for a general 2D vector space over ℂ each component becomes 2D itself. So an nD vector space over ℂ becomes kind of 2nD dimensional. What ℂ does is to allow specifying: can be added or cannot be added. This is comparable to a boolean.

To get the orthogonality, the dot, of two elements of such a 2D space over ℂ we do $(\bar u j + \bar v k)\cdot(y j + z k) = \bar u y + \bar v z$, i.e. we take the conjugates. We need this because we know there is orthogonality inside a complex number, and we've seen above how to deal with it. In general, with $r, s \in \mathbb{C}^n$, the orthogonality is the real part of $\bar r \cdot s = \sum_i \bar r_i s_i$.
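A NumPy sketch of this conjugated dot product, with toy vectors:

import numpy as np

r = np.array([1 + 2j, 3 - 1j])
s = np.array([2 - 1j, 1 + 1j])

print(np.vdot(r, s))            # vdot conjugates its first argument
print((np.conj(r) * s).sum())   # the same, spelled out: sum_i conj(r_i) s_i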


Now, do we need the complex numbers? Yes. The main reason is that orthogonality is a very important concept and so complex numbers find application in many fields. They make notation and calculation so much easier.

One essential point is the comparison of two quantities regarding addability: if they are parallel they are addable; if they are orthogonal, they are not. To this end, parameters that influence addability can be mapped to an angle (phase), which, with Euler's notation for complex numbers, becomes a parallel (addable) and an orthogonal (not addable) part.

Examples:

  • The time t of a vibration becomes $\varphi = \frac{2\pi}{T} t$, or
  • the t, x positions of a wave become $\varphi = \frac{2\pi}{\lambda} x + \frac{2\pi}{T} t$.

$\mathrm{Re}(Ae^{i\varphi})$ then represents the addable amplitude.
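As a closing sketch, two vibrations of equal period added as complex amplitudes in Python (amplitudes and phases are toy values):

import cmath

A1, phi1 = 2.0, 0.0
A2, phi2 = 1.0, cmath.pi / 2    # a quarter period ahead

z = A1 * cmath.exp(1j * phi1) + A2 * cmath.exp(1j * phi2)
print(abs(z), cmath.phase(z))   # resulting amplitude and phase of the sum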