
Saturday, January 8, 2022

Cryptocurrencies as Money

A free economy can be compared to statistical physics. There are many actors. Independent comparisons and information processing produce local transactions, but level out into a collective valuation, collective prices, until a global market establishes itself in time (more or less stable) and in space (over countries, continents and the planet), and until the number of transactions is maximal, i.e. all needs in all remote corners of the planet are fulfilled. Maximum entropy is maximum information (processing) is maximum energy consumption. Limits and setbacks of growth are reached with energy limitation or exhaustion, but new sources, automation and finally AI help out.

Cryptocurrencies (crypto for short, covering fungible and non-fungible tokens) are a very accessible interface and infrastructure for the bookkeeping of real assets by humans and/or trading bots. Cryptos can be the money of the future. What is still missing is widespread adoption, which would make valuation statistical and thus stable.

Introduction

If you have a key to a lock (of a house), then you have access (to the house). The same key-lock principle is behind the private/public cryptographic key pair. Instead of a physical key and lock we deal with two sequences of bytes.

  • The private part is the "key"
  • The public part is the "lock" (but normally called "address")

With ECDSA, the public key can be generated from the private key, so "key" usually refers to the private part.

Having the key gives you ownership. Locks can be combined to produce shared ownership, where more keys are needed to unlock.

In general a token is a hash of some data. The hash of the public key(s) states ownership, either of an addable number (fungible) or of another token (non-fungible). The ownership can be redeemed using the private key(s) by creating a signature that can be checked against the public key and thus verified by everybody.
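
As a minimal sketch of this key/sign/verify flow, assuming the third-party Python ecdsa package (secp256k1 as used by Bitcoin; not part of any node software):

from ecdsa import SigningKey, SECP256k1

sk = SigningKey.generate(curve=SECP256k1)   # private key: the "key"
vk = sk.get_verifying_key()                 # public key: the "lock" (address material)

message = b"transfer 1 coin to Bob"
signature = sk.sign(message)                # only the holder of the private key can produce this

# anybody with the public key can verify the claim of ownership
assert vk.verify(signature, message)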

In general a transaction has multiple inputs and multiple outputs.

Each node in the network has its own interpreter to check the signatures and do other tasks like executing smart contracts. This applies to the EVM (Ethereum Virtual Machine), but also to bitcoin and its forks, like bitcoin-cash.

The private keys are stored in wallets. The wallet also takes care of generating new keys (HD wallets).

The unspent transaction output (UTXO, coin) is first created as the first transaction of a block (coinbase), as a reward for mining the block. This numeric value is kept limited by network consensus and can thus be used to temporarily replace other limited assets, i.e. it can function as money, as long as the network is online. To keep the network running, nodes are motivated to join and stay by the block reward and fees.

A small change in the data, and the hash is completely different. The network of miners accepts only blocks with a hash less than a certain number (difficulty). To find such a hash, nonce values are simply tried one after another.

The difficulty avoids that one can provide all the blocks and thus have control over a chain.
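
A toy Python sketch of that nonce search, assuming a stand-in header and an artificially easy target (real Bitcoin double-SHA256-hashes an 80-byte header and compares against a target decoded from nBits):

import hashlib

def mine(header, target):
    """Try nonces until the double SHA-256 of header+nonce is below the target."""
    nonce = 0
    while True:
        h = hashlib.sha256(hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()).digest()
        if int.from_bytes(h, "little") < target:
            return nonce, h
        nonce += 1

nonce, block_hash = mine(b"toy-header", 1 << 243)   # easy toy target, ~8000 tries on average
print(nonce, block_hash[::-1].hex())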

Blocks are chained together by the previous block hash (hashPrevBlock). The blockchain forms a public decentralized ledger, secure, because it cannot be changed unless one is able to redo the difficulty of all blocks following the changed one and overtake the network.

The network nodes check that the difficulty is met and wouldn't accept a block otherwise. Nodes build on the longest chain.

The difficulty is adjusted regularly (every 14 days, i.e. 14·24·6 = 2016 blocks, for bitcoin) such that the network produces on average about one block per 10 minutes.
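
A simplified sketch of that retargeting rule (modeled on bitcoin-core's pow.cpp, which additionally works on the compact nBits encoding and a maximum target):

def retarget(old_target, actual_timespan_s):
    """Scale the target by how long the last 2016 blocks actually took, clamped to a factor of 4."""
    expected = 14 * 24 * 60 * 60                     # two weeks in seconds
    actual = min(max(actual_timespan_s, expected // 4), expected * 4)
    return old_target * actual // expected           # slower blocks -> larger target (easier)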

Money is a collective product. The consensus rules and validation are a collective product. The joint usage of the network is a collective product. Together they make a collective value, they make money (= fiat money).

Time = Value = Transaction

Value can mean:

  • an element of a variable: most elementary
  • a number: mostly the result of counting values in variables (information)
  • human valuation/pricing: this considers human needs
  • ethical value: considering human needs, but eluding pricing

The values of a variable are exclusive. The value implies the variable.

The values of a variable must occur for the variable to exist. The selection of the values cycles. Every value takes time, or better, is a time step of the variable. One cycle is one variable, is one time.

Every variable is also an independent time.

Information is the number of values of a variable. As such information is a quantity characterizing a variable, not a value. But since variables are values for higher level, the information (extension) is an extensive value. It motivates addition and ultimately all other operations.

Most values consist of internal variables. Internal communication produces inertia, because a level has a more or less fixed speed of communication, with a maximum speed for the lowest level.

When one variable separates into more independent variables they also form independent times. The variables get out of sync without communication.

Independent times make their value combinations random. Such a system of independent variables forms an exponential number of value combinations. If S is the number of identical and independent variables of size n, n^S is the number of value combinations.

The information speed comes into play when comparing to another variable. Information/Information = time/time = information/time = energy (see earlier blog). While information is conserved, energy is not. The information stays constant even if it spreads to more variables, in the same level or vertically in the hierarchy. pV = ST + U becomes pV = ST, if the internal information (energy) U is ignored, because locked up anyway, i.e. not spreading vertically.

A transaction is the movement of a packet to new coordinates (value = location = selection = owner). In physics, the packet is internal physical information, internal variables/times. In the human economy, it is valuated/priced considering human needs. The transactions by themselves form values, a variable, in the observed level. The packet is transacted based on a price agreement. Since the transaction itself also costs information/price (food, gas, fee, VAT), the internal value of the packet (rest mass) is lower.

  • Physics: For a variable to exist, the values must occur.
  • Economics: For a valuation/price to exist and persist, transactions must occur.

Note

  • A transaction is a time step of economy and it has a price.
  • Transactions are needed to maintain value.
  • The transactions of a product represent a cycle, a variable.
  • A product has inner cycles (those of components).
  • Products of higher level move slower and have a higher price (mass).
  • The economic pricing hierarchy builds on top of basic living cost.

Money, Pricing

In human economics, transactions of physical resources are associated with a (numerical) value through the valuation/pricing process, which takes into account the demand for a resource and its limited availability, specific to a person or a group of people.

Valuation of a product is a comparison with other products. If a single person did that, they would create their own valuation scale. The major products an individual compares against stem from their basic needs: food, housing, clothing, ... To compare, the person simulates having the product. A product needs to be personally used to have personal value. As a person has limited time values (e.g. seconds per life), a person's total valuation is limited.

Individual valuations, averaged over a large population, or better over a large number of transactions, produce money.

We don't use gold coins any more, and we are on the verge of not using paper bills any more, either. We are left with only numbers. But the numbers have value through the trust in each other that the number will be redeemed at the same valuation. Like, if you helped me for a day, I give you a bill or text you a message, which reminds you that I Owe yoU (IOU) a day of help, too.

We collect such IOU's, so we don't need to stash food ourselves, because others do it for us. We can redeem our IOU's, when we are hungry.

Money is collective trust in the promises made by others, by the society. The valuation of money rises and falls with honorable and trustworthy behavior.

Valuation varies between people, space and time. Traders calculate with the valuation of other people, and especially use the valuation differences between people (arbitrage). In order to exploit the valuation difference, the trader relies on secrecy:

  • that the valuation of one party stays unknown to the other party and
  • that the calculations leading to the price offered by the trader stay secret

Secrecy and trust do not go well together:

  • Valuation differences, i.e. lucrative business ideas, do not stay secret long, but attract competitors.
  • Companies are short-lived, if their products don't live up to the promises.
  • Outright lies, fake it till you make it, regularly lead to gigantic financial crashes.

Secrecy exists, but it actually does not matter that much. Even without it there is division of labor (including mental work), due to the expertise necessary and the limited time any one person has to do everything alone. Sharing information without limit, nowadays so easy, boosts the economy.

Traders are like Maxwell's demons, as are biological cells, plants, herbivores, carnivores, ..., farmers, traders, engineers, businessmen, investors, ... They all process information at a successively higher level, and can have a positive energy balance from it. Energy is information/time; the higher the level, the slower. But the information packets matter. A scientist has a long curriculum on their shoulders, like a complex protein has a long chemical pathway.

An important criterion in valuation is the marginal profit/loss (MP = MR − MC), i.e. the change in profit from one more or one fewer customer, product or whatever other unit, because it tells in which direction to go to maximize profit. For example, if one more unit brings in MR = 10 but costs MC = 7, then MP = 3 > 0 and expanding increases profit.

All this comparison in an economy creates stable prices (more global prices in space as well as time).

The collective comparison produces a common currency. Although just a number, that currency is limited, because the input channels, e.g. income from work, are compared on the same scale.

Pricing is not solely based on calculations or statistics, though. Also power hierarchies or human relations play a role. Sometimes prices can even be dictated.

Comparing is work and many people don't spend too much effort on it, also because the effort very quickly surpasses the value of the product. Sharing information, the rating of other people, reduces the effort considerably.

The scarcity (limited supply and demand) is an essential feature of money, just like of every other product.

Scarcity could be named stability of valuation in a statistical sense. It does not refer to one person or one product. It does not mean that an individual should suffer from scarcity. It just means that sudden collective changes of valuation, through a change in trust or in supply and demand, bring some disruption, with winners and losers, and need time to stabilize again.

For a (stable) valuation there need to be (many) transactions. Transactions need the consensus of many people to use the currency. The currency needs to be well distributed over a large base of users to maximize transactions.

Money, despite varying prices, still represents real resources. In accounting, the real resources are assets, while the money is equity + liabilities. Assets = money. But that is only a local assumption, because the pricing changes. Regular currency adaptations are needed.

The price can change because of more demand of a real resource (assets), but it can also change because the money supply changes.

A sudden change in money supply will change the demand on assets, which will change their prices. The same happens when the asset supply changes. Also both supplies can change. After one-sided changes it takes some time for prices to stabilize again.

If a money supply change reflects the resource/asset supply change, then the price stays stable.

Often there is one currency but many assets. But more generally there are different types of assets, as well as different types of currencies. One can make a currency per product. The currencies have their exchange rates. To compare, one needs to convert to one currency (valuation/pricing). One common currency stays relatively stable, because averaged over many transactions.

A country's legal tender is kept stable by adapting the supply,

  • either by issuing new money or
  • by buying back money of its own currency

A central organization has control by issuing or withholding money. The control is exerted via parameters like interest rate. More money will be issued,

  • if the central bank interest is low
  • if the state's public spending is high

It is not just the political authority that controls the money supply. Basically, those who own, control. So centrally owned money means central control of the money supply, and thus indirectly central control of the average pricing of products, i.e. of inflation.

General inflation is not just due to the money supply, but also due to changes in the pricing of important products, which are ingredients of a large portion of all other products, like energy and work force. Such pricing changes come about

  • by central pricing agreements like that for work force
  • by change of taxes
  • by a change in supply, e.g. by deciding to get out of fossil energy supply
  • by a change in demand

Every product is its own currency. A currency is a product like every other. But a central currency is a special product, because it is more centrally controlled than any other product.

Central control would need a lot of information to exert good control. Normally central control is associated with inadequate reaction to changes.

A transaction needs a compromise between the parties. At first, the compromise was quite local to a transaction and was reached through bargaining. But with more bookkeeping and calculation, larger chains of transactions are taken into account. They lead to narrower price ranges of buyers and sellers. Transactions happen if the price ranges overlap. Bookkeeping and other kinds of communication over space and time, like collective price agreements or dictation, make prices more global in space and time, i.e. more stable.

Many local independent decisions normally produce a better stable result via the law of large numbers than by central control. A globally used independent, not centrally controlled, exchange currency would become stable after some time and stay stable unless disruptive events occur.

Note

  • A currency is like every product.
  • Transactions (supply and demand) are needed for valuation/pricing, of money as well as of real assets.
  • Differences in valuation above the fee produce transactions.
  • Many transactions produce a stable currency in the absence of disruption.

Traditional Money Compared to Crypto

Crypto has all the qualities of traditional money:

  • the paper bill number corresponds to a crypto key hash (number), but that bill/number is just the carrier of value and can be exchanged for another paper bill or crypto key (fungible)
  • like the paper bill, a crypto-key has a value associated to it
  • instead of putting the bill into your physical wallet, you put the crypto key into a digital wallet
  • the crypto key is the record of your belonging, like the paper bill you own
  • Your physical wallet or your bank account is your bookkeeping, just like the digital wallet is your bookkeeping. The wallet is like an account.

The role of money is to allow bookkeeping.

But for global/long-term bookkeeping, money needs to be stable, else one better considers it as an asset, a product.

Since crypto is not widely adopted yet, it is unstable, because not averaged over a large number of diverse transactions.

Wide adoption needs and produces stability.

Currently crypto is better considered an asset, like a physical product or like shares in a company.

Governments regard a crypto as an asset, like shares.

Shares do become quite independent from the company that issued them. Their price is rather dominated by supply and demand. Only occasionally does good or bad news from the company change the behavior of traders. If the link to the company is removed, then shares are basically equivalent to a crypto, meaning that both then have no links to real assets other than through the valuation via supply and demand.

On the other hand, many cryptos are driven by ads and influencers, with a company behind them that organizes this and also controls the consensus centrally. This is very much like traditional shares.

Cryptos can replace traditional shares: Instead of issuing shares, a company can issue a crypto to finance itself.

  • The manufacturer can have its own fungible token to express the market valuation (EIP-20) of its products; a minimal sketch of this interface follows after this list.
  • Or every product item can get its own non-fungible token (NFT, EIP-721, deed). It does not matter how the token is generated. It points via tokenURI to metadata that carries more asset information. Ownership is not encoded in the token hash, but with separate addresses, as for fungible tokens.
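
As announced above, a purely conceptual in-memory sketch of the EIP-20 bookkeeping; a real token is a Solidity contract on the EVM, and the explicit sender argument here stands in for msg.sender:

class FungibleToken:
    def __init__(self, supply, issuer):
        self._balances = {issuer: supply}        # the issuing company holds the initial supply
        self._total = supply

    def totalSupply(self):
        return self._total

    def balanceOf(self, owner):
        return self._balances.get(owner, 0)

    def transfer(self, sender, to, value):
        if self._balances.get(sender, 0) < value:
            return False                         # insufficient funds: the transfer fails
        self._balances[sender] -= value
        self._balances[to] = self._balances.get(to, 0) + value
        return True

token = FungibleToken(1_000_000, issuer="company")
token.transfer("company", "investor", 5_000)
print(token.balanceOf("investor"), token.totalSupply())   # 5000 1000000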

Market

Market cap(italization) is the coin supply times the current price of one coin with respect to a fiat currency.

Cryptos can be bought and sold in exchanges or privately.

The crypto's exchange rate, i.e. its price, depends on the limited supply and demand.

For the demand it must satisfy needs.

  • Provide a money infrastructure easily usable via smartphones (or other computers)
  • Keep the coin supply limited
  • Serve as an exchange currency between other currencies over time or space
  • Represent bookkeeping, possibly local for a product or a company
  • Trade and exploit valuation differences

For supply, block reward and fee keep the network running:

  • Crypto is created as reward for mining blocks: The coinbase is the first transaction of a block and it creates new output without input, i.e. new coin.
  • The output can be sold for other currencies, which gives the coin a price.
  • Transactions within the network have a fee to account for the physical resources involved (electricity, computers), to reward the block miner, and to avoid DoS attacks.
  • Fee burning reduces the supply more when demand is higher, thus working against inflation, and possibly producing deflation.
  • Buy-back and burn, by sending coins to an unusable address, is also used to reduce the supply.

All cryptos fulfill basically the same goal. That some are valued more than others is to some extent irrational speculation, to some extent limited support from wallets and crypto exchanges, to some extent lack of trust.

Currently, speculation is the major motive. This leads to unstable coins, if there are only big players, because big players decide slowly and keep a trend going, trying to drag others along and win from their movements. There are not enough independent actors to keep the coin stable.

A crypto cannot produce coin forever,

  • because computers work with limited width numbers
  • because any real resource is also limited
  • because a unique consensus does not cover all needs
  • because for scalability more networks are more efficient

Bitcoin, for example, reduces the block subsidy gradually to 0. The assumption is that fees and valuation can keep the nodes online.
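
A sketch of that subsidy schedule, mirroring GetBlockSubsidy in bitcoin-core (the initial 50 BTC reward is halved every 210000 blocks):

COIN = 100_000_000                               # satoshis per BTC

def block_subsidy(height, halving_interval=210_000):
    halvings = height // halving_interval
    if halvings >= 64:                           # after 64 halvings the shift would overflow; subsidy is 0
        return 0
    return (50 * COIN) >> halvings               # integer halving, eventually reaching 0

print(block_subsidy(0) / COIN, block_subsidy(700_000) / COIN)   # 50.0 6.25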

Scalability

The independent movements of a large population to fulfill their daily needs would make a crypto stable. That is the case for large fiat currencies.

No current cryptocurrency network can process that many transactions, therefore they raise the fee to keep away the masses.

Ethereum can process around 7-15 transactions per second, Bitcoin around 3-7. Second layer networks like Lightning for Bitcoin and Raiden for Ethereum, or sharding (partitioning of the database) are efforts to increase scalability, maintaining security and decentralization.

Second-layer networks reduce fees, because some communication is done off-chain.

Bitcoin has about 13000 listening nodes. A high node count produces more load for transactions, because every node needs to process them.

The fee is an important criterion to choose a crypto.

Exponential growth is a consequence of independent times/actors (Boltzmann statistics). Current exponential fees make the fee market "exponential-exponential". The fee rate should be constant. A fee competition between cryptos can help. But there is also the network competition for more hash power that asks for more reward.

Many different cryptos can be a remedy to the scalability problem. Each crypto can represent a local usage (can even be pegged to a local asset). The coins stabilize each other by exchange sites. Some exchange sites have a site-specific exchange coin as intermediary.

Trading bots can exploit valuation differences of various cryptos, level them out and thus produce a stable coin that can work as money.

Trust

A currency is an IOU. The amount of currency a person possesses is a promise by society to redeem it later with assets of the same value.

A currency is stable if people trust in it, and they trust in it if it is stable.

You cannot trust anybody but the statistics of large numbers.

Individual decisions should not be made due to currency value, because it ruins statistics.

A currency must be stable.

  • A deflationary currency is bad, because it postpones transactions and loses the link to the real economy.
  • An inflationary currency is bad, because it prevents long-term planning.

Large fiat currencies are rather stable through the sheer number of transactions. A stablecoin is normally pegged to important fiat currencies like the dollar (Tether), euro or yen.

Cryptos need to be trustworthy:

  • the network needs to be reliable and stay online all the time
  • the link to real assets (NFT) must be correct
  • the way programming decisions are made must be transparent, whether centralized or via publicly scrutinized enhancement proposals

Trading Bot

Stability is relative, though. Just as an intermediary to an exchange, short-term stability is already enough. A bot can react quickly to changes, exploit them and produce stability for people to use.

For a valuation to be stable, its supply must change according to its demand. A bot can swap falling cryptos for rising ones, leveling them out. This swapping is the result of many bots buying low and selling high, and for them small amounts already matter.
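
A minimal sketch of such a leveling bot; get_price and swap are hypothetical stand-ins for a real exchange API, and the strategy is plain rebalancing toward fixed value shares (sell what has risen, buy what has fallen):

def rebalance(holdings, weights, get_price, swap, tolerance=0.02):
    """holdings: coin -> amount; weights: coin -> target share of total portfolio value."""
    prices = {coin: get_price(coin) for coin in holdings}
    total = sum(holdings[coin] * prices[coin] for coin in holdings)
    for coin, target in weights.items():
        share = holdings[coin] * prices[coin] / total
        if abs(share - target) > tolerance:
            value_delta = (target - share) * total       # positive: buy, negative: sell
            swap(coin, value_delta / prices[coin])       # amount of the coin to buy or sell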

Note

speculation on trends

The principle of speculation is to act before others and gain from others.

If one is first to buy in an upward trend, and first to sell in a downward trend, one earns most. If one is first in the game, one earns most.

  • By convincing others: their behavior is then a result of yours and thus of course comes later.
  • Otherwise one observes and anticipates the actions of others before they actually happen. Predictable behavior always loses in speculation.

Buy when price is minimum, sell when price is maximum.

With slow competition:

  • buy, when the price starts to increase and
  • sell, when it starts to decrease

But with fast competition, a minimum seen in local time is already past the minimum by the time the exchange serializes the independent requests. Then one should

  • buy while the price is still falling and
  • sell while the price is still rising

Fast bot competition produces vibrations so small and fast that the currency seems stable to the human eye.

Let's envision a future where every person has their own avatar bot and there are additional bots on several levels. The ultimate demand is from humans, though. The avatar must see the human demand. For that, currencies must be pegged to real assets.

  • Let's assume a currency pegged to a local electricity power station (LOCTRO).
  • The demand increases locally in space and time, due to cold weather and electric heating.
  • The power station decides to increase the price of LOCTRO to gain on the demand.
  • A local consumer bot on electricity (BOTTRO) sees changes in LOCTRO. It exchanges LOCTRO for FARTRO (farther away power station).
  • The bot is fast and humans will actually see no change in price in BOTTRO. BOTTRO is a stable mix.
  • When everyone uses more electricity, because suddenly everybody charges their electric car, a personal consumption avatar can swap BOTTROs for other cryptos, telling the person to reduce electricity consumption, to use the bike instead of the EV.
  • Investors see BOTTRO increase, such that a larger local investment makes sense. They build a new power station and power storage.
  • After the investment has been paid off, competition makes BOTTRO fall and become stable again.

Bots can help stabilize local changes. Speculative human changes are local changes. Bots can help to merge the many cryptos into a stable global money.

Note

  • The role of money is to allow bookkeeping.
  • A crypto is like money, but the public ledger/network brings along the full infrastructure for bookkeeping.
  • More cryptos with (automatic) trading between them are a remedy to the scalability problem.

DEFI and DAO

DEFI: decentralized finance

DAO: decentralized autonomous organization

Cryptos are public ledgers. This alone does not make them decentralized finance, if the consensus rules are centrally dictated. Rather, it also needs organizational decentralization that distributes control over the programming of the consensus rules.

The ledger only records transactions. For transactions to increase and become statistical, the coin must be distributed. Only in combination with fair organizational rules that care for a good distribution do transactions, and thus the valuation of the coin, become decentralized.

Decentralized finance usually just refers to the public ledger, and to the avoidance of a third party in transactions via smart contracts. It does not refer to a fair distribution. For fair distribution the participants in transactions must care for fairness. Fairness is an ethical value of humans, but often cannot unfold due to a lack of information, centrally imposed to keep advantage and power.

The distribution of information is the first step to fairness. The following crypto properties help towards fairness:

  • The ledger is public.
  • Smart contracts are programmed and can be reviewed before adoption.
  • Neither can be modified afterwards.
  • Smart contracts can be done without the need to trust a third party.

Extra fairness effort on top of the public ledger is still needed, though. The DAO needs its own purpose, its own constitution, local consensus rules. The data for a specific DAO needs to be made conveniently manageable for its members according to the DAO's constitution.

Bitcoin is a public ledger, but it is still mostly used by rich people who have money to speculate on the ups and downs of its exchange rate. The bitcoin capital is in the hands of a few and therefore not stable.

Everything develops by proposal and acceptance/adoption. So someone needs to (centrally) develop a proposal. If others accept the proposal, a consensus has been reached.

A new crypto/blockchain/DAO needs someone to start it. If it gets adopted a consensus has been reached.

But people should also verify that the further governance is decentralized, or else their investment is placed in the hands of a few, which is not decentralized finance any more.

Source Code

bitcoin-core was the first implementation and is now the reference implementation for many forks. The forks, like bitcoin-cash-node, share much code with bitcoin-core and regularly take over changes from bitcoin-core.

Here are some central identifiers. An initial v means vector, i.e. many:

CBlock(Header): vtx (nVersion hashPrevBlock hashMerkleRoot nTime nBits nNonce)
CTransaction: vin vout nVersion nLockTime hash
CTxIn: prevout scriptSig nSequence
CTxOut: nValue scriptPubKey
COutPoint: txid n
CChain: vChain of CBlockIndex
CScriptCheck: scriptPubKey amount ptxTo nIn nFlags cacheStore txdata pTxLimitSigChecks pBlockLimitSigChecks
CTxMemPool: mapTx
CConnMan: vNodes
CNode: hSocket, vRecvMsg

Note

hash

Hashes are used for

  • transactions (txid)
  • public key (Pay-to-PubKey Hash = P2PKH)
  • signatures (content according SigHashType + private key)
  • blocks (hashPrevBlock)
  • proof-of-work (POW): find a nonce that makes the block hash smaller than nBits

While POW's smaller-than task is hard, finding data that hashes exactly to a given hash is practically impossible. Hashing is a one-way function.

Node

A bitcoin node is a bitcoind daemon running on a computer. Each node is its own time. Parallel times means parallel independent information.

To maintain consistency across many transactions, the transactions are grouped into blocks.

A mining node creates blocks (CBlock) that are filled with transactions (vtx) from the mempool of transactions (addTxs). The block is like a page in a ledger.

To make a common ledger, a common time, the mining nodes need to find a way to choose who contributes the next block of transactions to the chain.

The first mining node that fulfills the proof-of-work adds a block to the longest chain. The frequency of blocks is controlled by the difficulty.

The CBlockHeader::hashPrevBlock of each block fixes the content of the previous block, because changing that content would produce a different hash that would no longer fit the hashPrevBlock of the next block. The hash brings the blocks into a sequence, a chain (vChain).

This ledger is replicated in all full nodes.

CBlock is derived from CBlockHeader and contains the transactions (vtx).

The block hash that must fulfill the nBits difficulty (and becomes the hashPrevBlock of the next block) is computed over the data in the header (hashPrevBlock, hashMerkleRoot, nTime, nNonce, ...). The transactions are included in the hash indirectly via the hashMerkleRoot field.
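
A sketch of that Merkle computation (pair-wise double SHA-256, as in Bitcoin's ComputeMerkleRoot; an odd last element is paired with itself):

import hashlib

def dsha256(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(txids):
    layer = list(txids)
    while len(layer) > 1:
        if len(layer) % 2:                 # duplicate the last hash if the layer is odd
            layer.append(layer[-1])
        layer = [dsha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]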

The block chain is like its own time. The many different times of all the nodes create one common time.

The result of hashing is random. To find a hash that meets the difficulty, the hashes per second matter. Whether they are achieved in parallel or sequentially does not matter. This way many slow machines can be as fast as one fast machine. The fastest machine must not have more than 50% of the hash rate of the whole network, or else that fast machine could tamper with a block, rebuild the following chain, and produce the longest chain, which would be accepted by the network.

Network

The network has a documented protocol.

Nodes in the network are characterized by permission flags like PF_MEMPOOL,...

The nodes exchange NetMsgType messages:

CConnMan::ThreadMessageHandler
    PeerLogicValidation::ProcessMessages
        ::ProcessMessage
            ::RelayTransaction
            ::ProcessGetData
            ::Process...
                CInv  // inventory

A PeerLogicValidation implements the NetEventsInterface interface with SendMessages and ProcessMessages.

Only full mining nodes create new blocks. They need, and other nodes can fetch, all accumulated unconfirmed transactions (NetMsgType::MEMPOOL/[GET]BLOCKTXN). Other nodes RelayTransaction one by one (NetMsgType::TX), so after some time all nodes have all relevant transactions.

CInv types correspond to NetMsgType commands:

MSG_TX: NetMsgType::TX
MSG_BLOCK: NetMsgType::BLOCK
MSG_FILTERED_BLOCK: NetMsgType::MERKLEBLOCK
MSG_CMPCT_BLOCK: NetMsgType::CMPCTBLOCK
MSG_DOUBLESPENDPROOF: NetMsgType::DSPROOF

Each node constantly communicates with other nodes:

  • connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::TX, ...)), ...
  • ProcessMessage according to the protocol, especially:
    • fetch new blocks and determine ChainActive (longest chain) (ActivateBestChain/FindMostWorkChain)
    • fetch new transactions as they need to be in the block before the block hash is created

ZeroMQ or zmq is an additional optional protocol to broadcast transactions and blocks.

Transactions

Each of the transactions vtx in a CBlock has

  • many inputs (vin)
  • many outputs (vout)

A transaction can

  • split a vin[i] into several vout[j], to take only part of the vout[n].nValue addressed by vin[i] and keep the rest via one's own change address, or it can
  • combine several vin[i] (previous vout[k].nValue) into one new vout[j].nValue,
  • or mix otherwise.

Each vin refers to the n'th vout of another transaction (txid) via prevout:COutPoint{txid;n}.

The unspent coin is important for validation.

cacheCoins:CCoinsMap is a map from vin[m].prevout to Coin{TxOut{nValue,scriptPubKey}} (CCoinsViewCache::FetchCoin()). This map is also stored in a leveldb .lvl database (CDBWrapper). The CBlockTreeDB is also stored in a leveldb database.

The Coin can be fetched from a CTxMemPool with mempool.get(txid).vout[n]. mempool holds enough transactions to check yet unstable blocks (COINBASE_MATURITY) against double spending.

Older transactions are secured in blocks by hashPrevBlock. Many blocks are serialized into one .blk file.

The sum of all input values vout[vin[].prevout].nValue, i.e. GetValueIn(), minus the sum of all output values vout[].nValue, i.e. GetValueOut(), is the fee.

Fee

The fee of a transaction is Σinput − Σoutput. The fees of all transactions mined into a block contribute to the coinbase, together with the subsidy. The fees are not linked to their original transactions via address keys. The coinbase has no input, but its output is subsidy + fees.

When mining a block the transactions are ordered highest fee first. With more transactions available than fit into a block, those with the higher fees are chosen, while the others wait for the next block.
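
A simplified sketch of that selection, greedy by fee rate (the real bitcoin-core block assembly additionally considers ancestor packages):

def select_transactions(mempool, max_block_size=1_000_000):
    """mempool entries are dicts with 'fee' (satoshis) and 'size' (bytes)."""
    by_feerate = sorted(mempool, key=lambda tx: tx["fee"] / tx["size"], reverse=True)
    block, used = [], 0
    for tx in by_feerate:
        if used + tx["size"] <= max_block_size:
            block.append(tx)
            used += tx["size"]
    return block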

There is a blockMinFeeRate (DEFAULT_BLOCK_MIN_TX_FEE_PER_KB) for accepting a transaction into a block and a GetMinFee() for accepting a transaction into the transaction pool (g_mempool). The latter is influenced by the maxMemPoolSize configuration. The largest fee rate among the transactions that fall out becomes the minimum for those allowed in. GetMinFee() decays exponentially with a half life of 12 hours (or 6 or 3, depending on how fast the traffic goes down).

The users decide on the fees, but it is a guess, because if the fee is too low the transaction will not get into a block. A stuck transaction can be manually prioritisetransaction'ed, thus circumventing currently higher fees. But for that you need RPC access to a node.

Blocks are produced at a roughly constant rate (e.g. 1 per 10 min). With a constant block size, even a larger network cannot serve more transactions. A larger network only produces more load per transaction.

In nature exponential behavior comes from independent times. The resource usage of a transaction can be considered constant (proportional to the number of network nodes). But those doing transactions are independent and thus produce an exponential memory demand. With constant memory, the fee shows exponential behavior, shutting out an exponentially growing number of smaller-fee transactions.

getmempoolinfo informs about the current GetMinFee().

GetMinFee() is a rate per KB. The actual fee is GetMinFee().GetFee(<transaction size in bytes>).

On Ethereum the fee required to make a transaction go through is called gas. EIP-1559 burns a base fee. Miners only get the difference above the base fee. The base fee changes with the traffic. Burning the base fee means more is burned the more traffic there is. The supply becomes smaller when the demand becomes higher. This increases the price of the coin (deflationary coin/token).

Script

Bitcoin has no fields for the addresses one spends money to or from. The addresses are buried in a script, indirectly addressing public keys as hashes. To redeem a vout[i]->vin[j] from one transaction to another, the following script composition must evaluate to true (done by CScriptCheck):

[ <vin[j].scriptSig> ]  [ <vout[i].scriptPubKey> ]

The first part comes from the later transaction's vin[j].

There are more variants, the most frequent one is P2PKH.

P2PK:

[ <signature> ]    [ <public key> OP_CHECKSIG ]

P2PKH:

[ <signature> <public key> ] [ OP_DUP OP_HASH160 <public key hash> OP_EQUALVERIFY OP_CHECKSIG ]

P2SH allows the public keys (or locks) to be provided in a script only when actually spending:

[ <only push data to stack> <script> ] [ OP_HASH160 <script hash> OP_EQUAL ]

e.g.:

[ <signature> {<pubkey> OP_CHECKSIG} ] [ OP_HASH160 <hash of {<pubkey> OP_CHECKSIG}> OP_EQUAL ]

The hash prevents linking a UTXO to the public key and guards against future, more powerful computers inferring the private key from the public key. Hashes are also smaller and thus easier to communicate on paper or a screen printout, either via a binary-to-text encoding like base58 or via a QR code.
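
A sketch of that address encoding for P2PKH: hash160 = RIPEMD-160 of SHA-256, a version byte 0x00 for mainnet, then Base58Check (note that ripemd160 support in Python's hashlib depends on the local OpenSSL build):

import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload):
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]   # 4-byte checksum
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out    # leading zero bytes become '1'

def p2pkh_address(pubkey):
    h160 = hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()
    return base58check(b"\x00" + h160)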

ECDSA cryptography (secp256k1 for Bitcoin) allows the public key to be recovered from the private key. So only the private key needs to be saved.

The public key can also be recovered from a signature and the message/hash that was signed (public key recovery). <signature> <public key> OP_CHECKSIG, however, verifies the given signature against the given public key; only the spender needed the private key to produce that signature in scriptSig. How the hash for the signature is created is determined by SigHashType. The last byte of the signature encodes the sigHashType for SignatureHash(), VerifySignature(). SignatureHash in script/interpreter.cpp shows what is signed. sigHashType can decide that more of the transaction than just vout[i]->vin[j] is signed; normally sigHashType = SIGHASH_ALL, i.e. the whole transaction is signed in each vout[i]->vin[j] link.

Everybody can recreate the same hash using the same data in the same order, but only the owner of the private key can make a signature of the hash that fits the public key in scriptPubKey.

When redeeming, the signature can be published, so that everybody can verify that the token was redeemed righteously (scriptSig).

SignSignature can be used to fill vin[i].scriptSig, i.e. to redeem a transaction.

The sigHashType used in scriptSig does not depend on scriptPubKey, i.e. OP_CHECKSIG will succeed if the public key fits the signature, independent of the content that was signed.

Token

In general the hash of some data is called a token. For example, in pay-to-public-key-hash (P2PKH), the public key hash is the essential part of scriptPubKey. It is thus an ownership token.

EIP-20 (ERC-20) is a specification for fungible tokens on the Ethereum network. Coins are fungible tokens: they don't identify an asset. Some 200000 compatible tokens exist. They are all traded on the Ethereum network and can thus be exchanged against each other. UNI from Uniswap is such an ERC-20 token.

EIP-721 specifies non-fungible tokens (NFT, deed). The value of NFTs lies in their links to physical assets or other non-copyable items like contracts (mortgages and the like).

It is interesting that NFT's are used for images and other things that have no link to real assets, but that consist of data only, and can be copied easily.

OpenSea is a marketplace for NFT's.

Wallet

A coin is an unspent transaction output (UTXO, COutPoint). To use a coin one needs to have

  • the private key fitting to the public key hash in scriptPubKey (for P2PKH)
  • the transaction hash (txid)
  • the index n into vout of txid

Public bitcoin block explorer sites can be queried with a key hash, i.e. with an address, e.g.:

https://www.blockchain.com/btc/address/1EwpnNBdFJykwxp6X8v9AfZnup9bgmrLE1

Wallets can find transactions with importprivkey.

ScanForWalletTransactions allows to find the COutPoint{txid,n} for the private keys it contains. A wallet then stores the transaction hashes for its keys.

So only the keys are important. Only the keys need backup.

For anonymity a new key is used for every transaction output.

With HD Wallets (HD = hierarchical deterministic), keys are generated from a seed and thus only the seed needs backup. With it the wallet can construct the keys and then query the blockchain.

Using the same HD wallet software, the seed (key phrase) can be used to regain access to all coins. The HD wallet name, or the key derivation path, should be backed up, too.

Non-custodial software wallets:

Bitcoin: Bitcoin, Electrum, Pywallet, ...

Lightning: eclair, breez, muun, ...

ERC-20: bitbox, coinomi, metamask, zengo, brd, edge, trust bitpay (open source, visa functionality, segwit, schnorr)

Mining

Choose one time line (block chain) for many separate times (nodes).

  • Make adding a block hard enough by proof-of-work (POW) to last enough human-relevant time to accumulate transactions (10 min).
  • Make it easy to check the POW result.
  • A random POW algorithm (trial-error) makes two parallel similar nodes about twice as fast, because twice as many trials are done.
  • If none of the nodes is faster than the rest together it is impossible to overtake the longest chain.
  • A node adds a block to the longest chain (= chain with most work).
  • Longest chain with POW is the main consensus rule to choose the common time (ChainActive).
  • ActivateBestChain/FindMostWorkChain decides to switch ActiveChain.
  • Transactions (and their fees) count only in a matured ChainActive.

POW loop:

  • try a nonce until the block hash becomes smaller than an arith_uint256 bnTarget, constructed from nBits (difficulty).

    The arith_uint256 type is used to represent a block hash. SetCompact constructs a large arith_uint256 bnTarget number from a compact uint32_t nBits (see the sketch after this list).

  • GetNextWorkRequired calculates nBits, CheckProofOfWork checks.

  • A node mines in response to generatetoaddress.

    CreateNewBlock() creates a CBlockTemplate, on which one finds nNonce, then ProcessNewBlock()/ActivateBestChain()/ConnectTip()/ConnectBlock()
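
As referenced above, a sketch of the compact nBits encoding (ignoring the sign/overflow handling of the real SetCompact): nBits packs the 256-bit target into one exponent byte and a 3-byte mantissa.

def nbits_to_target(nbits):
    exponent = nbits >> 24                    # number of bytes of the full target
    mantissa = nbits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))   # target = mantissa * 256^(exponent - 3)

print(hex(nbits_to_target(0x1d00ffff)))       # the genesis-block difficulty target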

getblocktemplate is

  • an RPC API function
  • a protocol

getblocktemplate allows to do mining separately:

  • the miner asks the server via getblocktemplate about some fixed data (nVersion, hashPrevBlock, nTime, nBits) that needs to go into the block.
  • The miner can change hashMerkleRoot, nTime, nNonce to produce a hash that meets nBits difficulty.
  • The miner calls submitblock on the server.

For a pure mining implementation in an ASIC, libblkmaker can be used to call getblocktemplate on a server. Then the miner can be simple, concentrating only on changing values to meet the difficulty (mining).

A bitcoin full node (server) can serve several miners. This is called a mining pool. The full node is the server. It redistributes the reward to the miners.

Consensus

Apart from fulfilling the difficulty on the longest chain, there are other relevant rules that decide whether transactions and blocks are accepted by the network (MAX_MONEY, MedianTimePast(), ...). The coin is the result of the network consensus rules. The consensus rules decide which transactions and blocks are accepted. The consensus rules are like a parallel program producing one time: the blockchain.

The nodes could have completely different implementations, as long as the behavior is the same. Two different implementations would need long testing against each other to produce the same behavior. The nodes are controlled by different parties, but they still choose the same implementation to produce the same behavior. The implementation of peers is not visible, though. If advantages are detected, individual nodes slightly change implementation and behavior here and there. The network adapts slowly by introducing new rules and starts checking them from a specific height or MedianTimePast() time. The upgrades are named after BIPs or get special names, like taproot.

Changes in the behavior need to be taken over by all nodes simultaneously, or they are backward incompatible.

hard fork

If several nodes do not agree on the new rules, the transactions and/or blocks are mutually no longer accepted, which is a hard fork of the network and of the chain into separate branches.

soft fork

In a soft fork changes need to have backward compatible behavior, to allow communication until almost all nodes are upgraded.

Fork above refers to chain forks. The software that creates a chain can also be forked. The software fork can possibly create a completely new chain with its own genesis block.

RPC Command

After starting, bitcoind exposes its interface as RPC. The RPC names and parameters are also command line arguments of bitcoin-cli.

To list commands:

bitcoin-cli help

The simplest way to send money:

bitcoin-cli sendtoaddress [address] [amount]


Wednesday, January 6, 2021

Function Concept: From Lattice To Computing

Summary

This is a continuation of the blogs

  • Information (2015)

    The basic concept of mathematics/physics is not the set, but the variable consisting of exclusive values (variable/value). Mathematics/physics is about information encoding/processing and the variable is the smallest entity containing/conveying information. A set consists of variables, normally bits ("there" or "not there").

  • Formal Concept Analysis FCA (2015)

    Concepts in FCA are a dual topology, once using extent (objects, locations), once using intent (attributes, values). There is a Galois connection between them. Intent and extent together form nodes arranged in a concept lattice.

  • Evolution (2019)

    In the evolution blog a system consists of subsystems (of variables). Energy is the rate of information processing (value selections): E = ΔI ⁄ Δt. The variable is the smallest entity having energy. Systems are layered. Every layer has its own energy unit. A subsystem has inner energy. All dynamical systems (natural evolution, economics, society, brain, computing, ...) can be described this way.

In this blog, the formation of structure to save information leads to functions and function applications (computing) according to lambda calculus. When describing a concept lattice with functions (higher concepts), a function/variable is an uplink, a value a downlink (data, attribute).

Statistically a function can be described as a multivariate probability distribution. The probability distribution describes how often the function occurs, i.e. how much information is saved by the separation of structure into a separate function. The probability distribution is the view (distance) from the function/variable to the usage locations. The dual view is that from a location to the functions, which is Bayes' theorem p(x)p(y|x) = p(y)p(x|y).

This blog also uses physical language, because every dynamic system is basically a computer. Physics has a language applicable to all dynamic systems. Dynamic systems produce functional structures just like software developers produce functions.

Note

high = concrete or abstract

Concept lattices traditionally have the abstract nodes further up. Uplinks are links to more abstract nodes and downlinks the opposite.

Normally though more concrete concepts are referred to as higher (e.g. OSI layers). The context hopefully makes it clear what is meant.

Computing

Concept Lattice

From a power set, variables arise via (co-)incidence produced by inputs (objects, location). In { {{1},{I walk}},{{2},{I run}} },

  • I is the invariant
  • that produces the variable consisting of the values {walk run}

I creates a local choice: the variable. I is by itself a value, but the change to the next value is slower.

Slower variables

  • channel the selections to local variables
  • are a context to local variables

run and walk exclude each other. In formal concept analysis (FCA) this exclusion is not yet there. A context in FCA is the incidence table, i.e. a binary there/not there, of

  • objects (extent), normally as rows, and
  • attributes (intent), normally as columns

One takes the union of objects and intersects their attributes. The result is a lattice of concepts, where the binary variables of the incidence combine into larger variables, i.e. where more values exclude each other.

The incidence table maps attributes to objects (A’ = O) and objects to attributes O’ = A. A’’ is called a closure.

A concept consists of

  • objects (extent) sharing the
  • same attributes (intent).

The concepts in the lattice have a partial order produced by containment.
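
As a toy illustration, the concepts of the small context from above ({1: {I, walk}, 2: {I, run}}) can be enumerated by brute force over extents; real FCA implementations use e.g. the NextClosure algorithm:

from itertools import combinations

objects = {1: {"I", "walk"}, 2: {"I", "run"}}            # the incidence table from above
all_attributes = set().union(*objects.values())

def intent_of(extent):
    """Attributes shared by all objects of the extent (the derivation ')."""
    attrs = set(all_attributes)
    for o in extent:
        attrs &= objects[o]
    return attrs

concepts = set()
for r in range(len(objects) + 1):
    for extent in combinations(objects, r):
        intent = intent_of(extent)
        closure = frozenset(o for o in objects if intent <= objects[o])   # extent'' (closure)
        concepts.add((closure, frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(set(extent), set(intent))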

More abstract concepts are

  • larger by extent (usage)
  • but smaller by intent (attributes)

than more concrete concepts.

The two orders are said to be dual. The extent marks importance and is used here.

In FCA more abstract concepts are drawn further up.

  • A node above figures as a value (join)
  • A node below figures as a location (meet)

This is contrary to other fields, like the OSI layers in computer networking, where simpler concepts are drawn further down and said to be low level. The order people are used to is that of space complexity, i.e. the number of variables, not the number of usages (extent).

So here higher means more concrete, further up, and lower means more abstract, or further down. {{1 2},{I}} < {{1},{I walk}}, because {I} ⊂ {I walk}. {{1},{I walk}} and {{2},{I run}} cannot be compared.

One can cut away the most abstract part (filter) or the most concrete part (ideal) and still have a lattice.

Every downlink is motivated by an uplink and vice versa.

  • One downlink meeting with more uplinks is a variable with attributes as values (uplinks are also called attributes). Different values produce different locations (more concrete objects).
  • Dually, one uplink joining with more downlinks is a variable with locations as values. This makes the locations the attributes and the attribute the location (understanding by location the focus of attention).

The concept lattice uncovers variables.

Figure 1: The variable maps values to locations.

Before starting to describe a concept, one must be able to distinguish values, like seeing color instead of just degrees of black and white, or seeing "color green" separate from "position here".

Excluding values are variables already in the real system. A processor can detect a variable via an exclusiveness in a common context, i.e. a common parent node in the concept lattice.

Exhaustiveness of a variable refers only to the available data.

In FCA there is no information/freedom, just as there is no freedom in an image. The freedom arises when a thread follows the links between nodes. Every local context of the thread opens a variable (a way to continue the path).

uplinks are AND-links:
Further down, more concrete concepts combine further-up concepts, i.e. link upward.
downlinks are OR-links:
Further up, more abstract concepts link down to alternative, more concrete locations where they occur.

The OR is exclusive by location and time. OR-links form a variable in the sense, that location represents the focus of a thread:

  • selection of a value/location by one thread represents one time
  • selection of a different value/location represents a different time

Function Lattice

One can express the concept lattice as lattice of sub-lattices, if one has

  • intersection of sub-lattices (abstraction)
  • union of sub-lattices (application)

Structure becomes a value, to be recognised as a location (value).

The common structure can be separated by introducing variables with values representing the change between locations (abstraction).

The locations where the same structure is used are re-created by application. In the application, variables are united with the values. This turns out to overlap with lambda calculus.

N = (λx.M)V
  • M is abstraction of N
  • N is application of M

The variable is added to the left (right-associative). This way existing applications of N stay as they are: NW = (λx.M)V W

  • application is left-associative
  • abstraction is right-associative
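
A loose Python analogy for N = (λx.M)V: the abstraction factors the shared structure M out of N, and application re-creates N (and further locations) by substituting a value for x.

M = lambda x: ("I", x)      # abstraction: the common structure with a hole x
N = M("walk")               # application: ('I', 'walk')
N2 = M("run")               # the same structure reused at another location: ('I', 'run')
print(N, N2)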

A function is a structure that meets values of one or more variables to produce locations: each value combination one location.

A variable alone is a special case of a function: a variable is a function that maps its values to locations. A variable is a coordinate function. A function is a coordinate system.

Dual view:

  • a function maps a value to a new location
  • a value maps a function to a new location

The function encodes the information of the (full) cycle of values, normally of several variables. A function that keeps only one argument variable is a variable, i.e. the other variables represent the function.

In programming the actual coding how to reach a location is done in function. The function can be called covariant, i.e. representing the complexity. The values are then the contravariant parts. In physics the unit is the covariant part, while the number value (magnitude) is the contravariant part, to reach a location.

A function application unites (=AND's) variables with values to form locations by

  • position, mapping ordered parts to ordered parts (matrix method)
  • name, mapping concept to concept containing the same name
  • pointing name (address)

In a computer with constant clock rate, the time for a selection depends on the structure that channels the clock's selections.

A variable is motivated by an invariant (called symmetry in physics), which hints to a slower variable, of which the invariant is one value. The invariant marks a fixation to a more or less temporary location, which focuses the clock (the energy) to the local values. A variable reduces clock cycles (energy consumption).

Functions are a way to organize selections.

  • Abstraction is compression. Less information needs less selection, i.e. less energy
  • Application is (re)creation (synthesis).

In the concept lattice, the number of variables increase downward. Every variable adds information.

If the information of a variable does not overlap with that of other variables, it is orthogonal.

n orthogonal variables, with v values each (log v information), create v^n combinations (n·log v information). Such variables of the same kind are a method of abstraction. Not all value combinations are actually used. They are channel variables that can accommodate all kinds of information. Channel variables allow a general intermediate description with

  • encoding to it and
  • decoding from it

Selections in more concrete layers take longer, if details need to be considered. If details are of no relevance (encapsulated), then the selection rate can also be the base rate (c).

A description with more concrete concepts can be seen as domain specific language (DSL). A DSL can be embedded in a general purpose language via a library.

Channel variables can also be introduced in a concrete layer to create a multitude of value combinations of e.g. digits or letters, mappable to concrete concepts.

Such names are used in traditional programming languages (via numbers for indices and names for identifiers), in continuation of the language our social brain has developed during evolution.

The names can be translated to other types of links, including matrix operations in simulated neural networks.

The time to the next location is the link cost. It is a measure of distance of the location (represented by a value) to the variable or function. The time can be coded as probability.

The link cost depends on

  • kind (position, name, pointer)
  • parallel channels (parallel physically (wires, nerves), sequentially by bus)
  • sequential steps (path consisting of uplinks and downlinks)

A register machine basically uses a pointing name. The memory address is the name. The link cost (access time) can still vary: Some variables are in registers, some in cache, some in RAM.

Neural tissue is highly parallel. Simulation of neural networks is parallel to a certain extent because matrix computations are parallel via SIMD instructions.

FCA and NN

AI normally refers to information processing not completely controlled by humans. In traditional programs the freedom of self-adaptation lies in the value of predetermined variables and the use of pre-written functions. AI adds the capability to create autonomously its own concepts (subsystems). The level of AI is determined by how abstract and general the concepts become.

Both FCA and NN

  • have algorithms that create a lattice, which contains containments, which produces sub-orders and finally higher lattices that use functions.
  • have as input things that go together, the data.
  • allow to automate the ordering and reuse of information, to find a shorter description to meet the training goals (the environment).

FCA creates the lattice from below, on demand, starting with zero links, while NN creates the lattice from above, i.e. full connection, and reduces the connection via weight pruning after training with a lot of data.

In FCA you need as much input (extent) as the intent (features, variables) you want to store. In NN you need a lot more data to prune all unneeded links.

In NN, nodes are layered. NN starts with an array architecture. Given an array of channel variables as input, i.e. input with a lot of unused data, an NN can be used to filter out the features of interest.

In FCA the lattice is organized by containment. This can be described by NN layers with clustered weights over more layers.

In an FCA lattice the nodes combine values with AND ( · ) links from above and with OR ( + ) links further downward, but not so much in layers as in NN, where layer i follows from layer k via xⁱ = wᵢₖf(xᵏ) + bᵢ. With the activation function f, neuron i becomes a binary variable. The weights wᵢₖ decide on the sources xᵏ, and the bias bᵢ decides on the coding of xᵢ to make FCA-like AND uplinks for the xᵢ node.
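
A small sketch (weights and bias set by hand, not trained) of how weights select the sources and the bias codes an FCA-like AND uplink for a binary neuron:

    import numpy as np

    step = lambda z: (z > 0).astype(float)      # activation -> binary variable

    w_ik = np.array([1.0, 1.0, 0.0])            # weights select sources 0 and 1, ignore 2
    b_i = -1.5                                  # bias codes the AND threshold

    for x_k in ([0, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 0]):
        x_i = step(w_ik @ np.array(x_k, dtype=float) + b_i)
        print(x_k, "->", x_i)                   # 1.0 only when sources 0 AND 1 fire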

FCA does not provide fast algorithms with hardware support, but FCA can be subsumed by NN. FCA can guide the choice of NN architecture.

  • both have input that mixes values of more variables
  • both create a map of the actual topology from more inputs
  • both encode the topology with links and not by closeness of weights and nodes (neurons).
  • both need more inputs to produce the map; NN via parallel and gradual steps of change, FCA via (sequential) non-gradual steps.
  • both require the features/variables beforehand; NN to choose training input and NN topology, FCA to choose the kind of input.
  • both need more layers to combine more features/variables to functional blocks

The difference is how the links are usually created:

  • NN reduces links from dense. FCA builds links from zero.
  • NN adapts gradually using the gradient of a loss. FCA links are boolean: there or not, 1 or 0. Usually NN works with floats as weights for gradual change, but binary weights (as in FCA) are also possible in NN (e.g. the Hopfield network).
  • NN needs a loss function, which could be universal, though. FCA does without loss function.

Function as Multivariate Probability Distribution

General channel variables have general channel functions. This generalization reduces the dimensionality and allows a statistical treatment of variables.

Information makes sense only for a variable/function. Information is the number of the values/locations excluding each other in time.

As a value by itself has no information, the code length is that of the variable it belongs to.

Code length is the number of unit variables (normally bits) whose combinations of values produce the same number of values.

The function combines/encodes more locations/values. Depending on the encoding of the function the frequency of values will change. Still, the total code length for every value is that of the variable, i.e. I = −Σ pᵢ log(pᵢ).

Every occurrence gets the same energy by making the code for rare values longer: −log(pᵢ) ⁄ Δtᵢ = ΔI ⁄ Δt = E, where Δtᵢ = N·Δt ⁄ Nᵢ. The longer code for rare values can be seen as the distance of the locations of application from the function.
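
A quick numeric check with a made-up frequency table: rare values get the longer codes −log₂pᵢ, and their average is the entropy of the variable:

    import math

    # hypothetical value frequencies of one variable
    counts = {"a": 8, "b": 4, "c": 2, "d": 2}
    N = sum(counts.values())

    entropy = 0.0
    for value, N_i in counts.items():
        p_i = N_i / N
        code_length = -math.log2(p_i)           # rare values get longer codes
        entropy += p_i * code_length
        print(value, f"p={p_i:.3f}", f"code length={code_length:.2f} bits")

    print(f"I = {entropy:.2f} bits")            # I = -sum p_i log2(p_i)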

Probability represents a view from a function to the locations of application. At the locations of application generally more values of different variables are united with the function to produce the application.

The function's value combinations represent locations.

The function calls lead to a multivariate probability distribution by summing the locations/values along some other variables into a count representing the function's time. Fixing the values for some variables of a function (currying), the probabilities for the free variables are a cut through the total probability distribution.
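
A sketch with a hypothetical call log: counting argument combinations gives the joint distribution, and fixing one variable (currying) cuts out the distribution of the free variable:

    from collections import Counter

    # hypothetical call log of a two-argument function f(x, y)
    calls = [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("b", 2), ("b", 2), ("b", 2)]

    joint = Counter(calls)
    total = sum(joint.values())
    p = {xy: n / total for xy, n in joint.items()}       # multivariate distribution

    # currying: fix x = "b"; the free variable y gets a cut (conditional) distribution
    p_b = sum(v for (x, _), v in p.items() if x == "b")
    p_y_given_b = {y: p.get(("b", y), 0) / p_b for y in {y for _, y in calls}}
    print(p_y_given_b)                                   # {1: 0.25, 2: 0.75}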

In a concept lattice without memory limits there would be no need for a probability, because the address length would correspond to the code length resulting from the probability value. But with limited memory the probability is the function's view to the locations. The variable combination is the coordinate system of the function. A value combination (i.e. the application) allows to infer the location.

A multivariate probability distribution is a statistical view of a coded function. Probability theory derives the distribution from the coded function. Statistics derives the distribution from data. They are connected via Bayes' theorem p(x)p(y|x) = p(y)p(x|y).

The multivariate probability distribution represents one time and one particle because the value combinations are exclusive. If called by parallel threads, the same function produces a separate probability distribution per thread.

The particle will be most likely where the probability is highest, but it will also occasionally be where the probability is lowest. The probability is the result of summing over hidden variables, basically all variables around the location of function application.

If all hidden variables were included, each value combination would be equally likely, because all frequencies would be the same. The frequency would be that of the processor that runs with a constant clock.

Without the hidden variables the frequency is that of the calls. This frequency is associated to the function, i.e. to the whole probability distribution, and not to single value combinations (locations). It does not matter where the particle is located: the energy (information/time) is always the same or made the same by entropy encoding.

Equal probabilities correspond to a good choice of function or a balanced coding (like a balanced tree), i.e. a good choice of coordinate system. Equal probabilities mean information maximization (principle of maximum entropy). Maximum entropy corresponds to well-distributed energy.

Language

A processor needs a way to address its concepts. There are several ways to address concepts. Addresses are concepts themselves.

The animal brain has neurons and synapses as low level language. This network is connected with the world through senses and it is enough to intelligently interact with it. Still, the human animal has further developed a more concrete language on top of it to better work together.

The human language hierarchy is

  • byte-phoneme-glyph
  • names
  • addresses
  • concepts
  • ...

Names are the smallest part of an address. A name selects a value from an internal variable. A number can be a name.

The concept lattice needs a language to exist. A description of structure with a language is the concept lattice.

The same concepts can be expressed using different languages

  • bus addresses in a register machine
  • synaptic paths in the brain
  • weights in a neural network (NN)

One needs conversion to and from the internal language to allow transferring a system from one processor to another.

A small low level, abstract vocabulary can be used to build higher level, more concrete, concepts.

Concept libraries are identifiable and are negotiated to settle on a common vocabulary for communication.

Basic language

It took natural evolution several hundred million years to reach our level of intelligence. The brains had to develop along the way. It is also a question of hardware.

Our proven abstract language from mathematics and the principal understanding of what learning is (basically information compression) will show us a shortcut.

Humans have developed an abstract language already. Humans can divide-and-conquer vertically and train modules to use their abstract language to describe more concrete things.

With programming languages the programmer still needs to think of how to write the functions. The experience and abstractions developed over generations provide developers with abstract concepts allowing them to describe all kinds of systems.

Software modules are trained modules. The testing was their training. A pre-trained FCA or NN also represents such a module, a high-level concept.

More FCAs can be combined with AND and OR like any other values. Higher level concepts form a higher level language. To really understand and merge the concepts and possibly form other concepts that lead to a shorter description, the high level concepts need to be described with a common low level language. Then they can be compared and merged.

It takes quite an effort to realize that two mathematical theories are equivalent, e.g. the Curry-Howard correspondence. One needs to find a common way to describe them, a common language. This is why mathematics develops a more and more abstract base language. A common base language makes it less likely that people spend their lives developing a theory only to realize it already existed.

To do a similar job, an AI also needs to have the high level concepts in a common low level language. It is not only AND and OR, but also which value out of which variable, and how they are encoded into bits.

For example to allow an FCA to reorganize functions of a program, it needs to have a common description of them, e.g. via their machine code.

Turing-complete

A dynamic system needs information and time. A computer needs memory and clock. The more clocks, the more subsystems.

Where is the clock in the Turing machine? It is the function, which can consist of sub-functions. A Turing machine has energy (information/time). A function has energy (information/time).

To define the function as a map from all domain values to all codomain values in one time step is never reality. It is an abstraction of a subsystem with the time unit equal to the cycle time. When comparing more subsystems a common time needs to be used, which brings energy into play.

A Turing-complete language needs to map to creation and reduction of subsystems i.e. mutation and selection. This way information flows. For actual creation and selection time is needed: a thread.

The minimal SKI or rather SK is Turing-complete. SK corresponds to boolean AND (creation) and OR (selection). iota (ι) is another minimal Turing-complete language.
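
A small sketch of S and K as Python lambdas (strict evaluation, so the second Boolean is written in its K(I) form):

    # the two combinators of the minimal Turing-complete SK calculus
    S = lambda f: lambda g: lambda x: f(x)(g(x))
    K = lambda x: lambda y: x

    I = S(K)(K)                 # identity, derived from S and K
    assert I(42) == 42

    # Church-style selection: K picks the first alternative, K(I) the second
    TRUE, FALSE = K, K(I)
    assert TRUE("yes")("no") == "yes"
    assert FALSE("yes")("no") == "no"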

What a Turing-complete language can actually do depends on the amount of memory. How fast it can do it depends on the system's clock.

Information and Energy

Variable

Mathematics is about information processing. Its foundation must hold information.

A variable is a set of values that are

  • exclusive (one value at a time)
  • exhaustive (all values get their turn)

A bit is the smallest possible variable.

The variable is the foundation of mathematics. A set in the conventional sense can be a variable, if finite and an exclusive choice is added ( ∈ ).

A set where intersection and union are possible is not a variable; it is rather a collection of parallel bit variables. The power set of all combinations is a variable (with 2^N values), if a combination of values is seen as its value.
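
A quick check of the counting with three made-up bit variables:

    from itertools import chain, combinations

    # three parallel bit variables; taking each "on"-combination as one value
    # gives a single variable with 2**3 exclusive values
    bits = ["a", "b", "c"]
    values = list(chain.from_iterable(combinations(bits, r) for r in range(len(bits) + 1)))
    assert len(values) == 2 ** len(bits)        # 8 values, I = log2(8) = 3 bits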

The information of a variable is the number of bits to produce the same number of values.

I = log₂N

A variable has information. A value has no information.

For a variable to persist in time the values must be selected in a cycle. When values are reselected the cycle is repeated. The cycle of selections of values is the variable. The cycle information is the variable information.

All objects moving in a physical space were observed to cycle at some scale.

Infinity

Infinite/non-cycling variables do not exist other than as a counting cycle/loop with deferred stop in an information processor.

An information processor is a dynamical system. All dynamic systems consist of cycles. The human mind is an example.

Mind or processor shall mean a general information processor, including computers.

A counter normally uses a hierarchical containment of loops producing different values that are combinations of values of lower variables, whose rates of change differ in a systematic way, e.g. by the position of digits, letters, phonemes, ...

A counter mimics a general dynamic system with subsystems. A counter with a deferred stop is also a deferred amount of information processed.

One can nest counters. ℝ is a nesting of two counters: size and precision. A √2 ∈ ℝ has an infinite counter on the precision axis, the same way as every irrational number. A value of ℝ is an algorithm, a counter, a higher concept. A value of ℕ, ℤ or ℚ is not algorithmic.

IEEE754 fixes the two stop conditions by fixing the information in fraction and exponent (precision and size) for hardware. In arbitrary precision software one is more flexible: one can defer fixing the stop conditions to the point where actually used.
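
A sketch of the two stop conditions (the bit masks follow the IEEE 754 double layout; the precision of 50 digits is an arbitrary choice):

    import struct
    from decimal import Decimal, getcontext

    # IEEE 754 double: the information of fraction (52 bits) and exponent (11 bits) is fixed
    bits = struct.unpack("<Q", struct.pack("<d", 2.0 ** 0.5))[0]
    fraction = bits & ((1 << 52) - 1)            # precision counter, fixed width
    exponent = (bits >> 52) & ((1 << 11) - 1)    # size counter, fixed width
    print(exponent, fraction)

    # arbitrary-precision software defers the stop condition to the point of use
    getcontext().prec = 50
    print(Decimal(2).sqrt())                     # 50 digits of the non-terminating counter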

Probability

Probability counts time (times of occurrences).

N ~ 1 ⁄ Δt ~ 1 ⁄ p

The normalization of probability to 1 for one variable is a comparison of time units.

Information is associated with the variable, not the value. With just one variable of C equally frequent values, its information is 1 in its own unit, just as it would be for the bit, but the unit is different, with the factor log₂C as the unit conversion.

If values have their own time and if it is squashed to the time unit of the variable, then the (average) information or entropy of the variable is

I = −Σ pᵢ log pᵢ

Note that this has included

  • time via pᵢ and
  • space via log pᵢ (information)

This is information per time, which is energy. But the time unit is that of the variable. A variable with same number of values and same distribution, but high frequency cannot be distinguished from one with low frequency. Locally this is also not needed.

Two variables with independent times have probability p₁₂ = p₁p₂. This corresponds to a transition to the smaller of the two time units. The information becomes additive, which makes the energy additive.

The Kullback-Leibler-divergence compares two probabilities on the same variable, one derived from data, one from theory (as seen from the function). The Kullback-Leibler-divergence is the difference in information (code length) between theory and observation.
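
A minimal sketch of that comparison, with a made-up theory and observation:

    import math

    def kl_divergence(p_data, p_theory):
        # extra code length (bits per symbol) when data following p_data
        # is encoded with a code that is optimal for p_theory
        return sum(p * math.log2(p / q) for p, q in zip(p_data, p_theory) if p > 0)

    # hypothetical example: fair-coin theory vs. observed 70/30 data
    print(kl_divergence([0.7, 0.3], [0.5, 0.5]))   # ~0.119 bits per symbol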

Note

information = entropy

Information describes both,

  • what can be known (the alternatives, entropy, the variable) and
  • what is known (the value).

A system that does not change or has no alternatives has no information. A value alone has no information. The alternatives are the information.

Energy = Information / Time

Information alone entails time, because the values (selections) need time. Without values no variable and thus no information.

E = I ⁄ t

A time step is a selection (value). Time does not exist between selections in the absence of another selection to provide a clock. When there is another selection to compare to, one gets a unit to compare to.

Energy is the comparison of information with another information. Time is the unit of information. Energy is information expressed in the unit of time.

With fixed information step h time and energy are inversely proportional:

ΔE = h ⁄ Δt

This is like with any physical quantity: unit and number value are inversely proportional.

Energy is the differential view on information, and information is the integral view on energy.

E = dI ⁄ dt

One always compares information with information. Time is a variable and thus information, too.

Layers

Interactions have their own time on a higher layer. In the power P = dE ⁄ dτ = k·d²I ⁄ dt², τ is the time of the higher layer and E and t are the inner energy and time of the subsystem. P by itself is also an energy, but on the next higher layer. When using the same time unit, every layer adds a power of the time rate:

  • one layer (variable): E ~ ν
  • two layers: K ~ ν²
  • three layers: B ~ ν³ (Planck law)

With correspondingly many parallel independent processes, the expressions move into the exponent.

In thermodynamics, temperature T is a unit of energy

ΔE = TΔS = (∂E ⁄ ∂S)ΔS

i.e. one splits the information into two layers, but keeps one time (the motion of particles gives the base clock for thermodynamic processes).

Temperature more generally is the energy (information flow) between subsystems. The subsystems change because of the gain or loss of information.

If the system as a whole loses energy, certain structures settle in and stay for a longer time. This structural cooling reduces the dimension (number of variables), which reduces the information and frees it to the surroundings (e.g. an exergonic reaction).

One Time - More Variables

The variable implies time, but

  • more variables in the observer system
  • can be one variable with one time in the observed system

A coordinate system in mind might split a variable into more, which in reality are simultaneous. In mind the variables can be processed separately, but if a description of reality is aimed at, it needs to consider that the real variable consists of value combinations.

A general transformation between systems can be described by the Jacobian J₂₁, where the combination of variables of system 1 and 2 forms a 2D matrix. One can expand the variables to values and work with a 3D matrix, but the other way around introduces functions that code structure. The matrix elements are impulses, which, if zero, describe an invariant.

Functions can be non-linear, but non-linearity can also be described by more linearly coupled systems:

xₙ = Jₙ₍ₙ₋₁₎ … J₃₂ J₂₁ x₁

A Jⱼᵢ corresponds to a layer of a Neural Network (NN).
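
A sketch with made-up 2×2 Jacobians, composed like linear NN layers:

    import numpy as np

    # each factor is one linear layer; values are hypothetical
    J21 = np.array([[1.0, 0.5],
                    [0.0, 1.0]])
    J32 = np.array([[2.0, 0.0],
                    [0.0, 0.5]])

    x1 = np.array([1.0, 2.0])        # variable values in system 1
    x3 = J32 @ (J21 @ x1)            # transformed through system 2 into system 3
    # a zero matrix element means the target variable does not change with
    # that source variable, i.e. an invariant
    print(x3)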

Time comes into play when describing all the system variables' rates relative to a third one's rate. The third variable is arbitrary and the mapping to the independent clocking of selections of system variables is necessarily imprecise.

The Δt of the observer clock is external, but assumed constant, while the information is inherent to the system. Energy conservation for a closed system says that the information of the system is conserved. A closed system is only locally closed, though. Non-closed cycles lose or gain information per time, i.e. energy.

An invariant binds variables and makes their values to value combinations.

A mind normally has an internal clock. What is invariant to its internal clock matters when describing the external system's information with the internal. What is invariant to the internal clock is one value combination. The according variables become dependent.

Formation of dependent variables is a reduction of dimensions.

The relation of the values in the value combinations can be described with functions.

A function is a reused subsystem (invariant sub-concept-lattice) to create dependence between variables. A variable itself is a special function. The variable is the smallest fixation: just one invariant that all values share. A value is what is different at a location of function application.

A subsystem of dependent variables with one time is called particle in physics and thread in computing.

The order of selection produces a distance. Since a variable needs to cycle, the first value needs to follow the last one. One variable (dimension 1) cannot create a cycle. Two variables form a minimal particle or thread.

Interactions between (processing of) particles form a higher layer and need additional variables there. Three dimensions are minimal to have separate times (parallel processing). The actual number of dimensions depends very much on the system.

Independent particles have independent

  • information I
  • time t
  • energy E

at every layer.

In a layered system, containment channels energy of a thread/particle

  • to spread into lower particles (log) or
  • to accumulate from lower particles (exp)

Channel Variable

Flexible variables for general usage are called channel variables in this blog.

Examples of channel variables are the pixels of a screen or the receptors on the retina.

The information capacity per time of the channels needs to be higher than the actual information sent per time. The quotient ΔI ⁄ Δt matters: ΔIc ⁄ Δtc > ΔIs ⁄ Δts (c for the channel, s for the signal sent). Information per time is energy E. One can say the "energy of the channel" instead of channel capacity, which is the maximum of mutual information, i.e. the maximum entropy by which the input and output probabilities are still dependent on each other.

Non-binary variables with C values need log₂C binary channels. By doubling the frequency of a channel compared to e.g. the processor, the number of bus lines can be halved without energy loss. Otherwise, reducing lines reduces energy.

  • Our senses are channel variables.
  • The phoneme inventory of a natural language is a channel.
  • Data types are channels.
  • Numbers in mathematics are a flexible arbitrary width channel.

Higher dynamic systems have channels to exchange subsystems that encapsulate energy.

Some life on earth has evolved brains with algorithms that can decode from sensory input channels back to the actual variables of origin.

The mind is a dynamic system, that controls more energy than it consumes, by simulating higher energy interactions with lower energy interactions. Such Maxwell demons are ubiquitous, but do not work any more if both systems use the same energy encapsulations.

Evolution is search, i.e. trial and error or mutation and selection. Structural evolution needs to invent and prune structures, i.e. concepts, to reach a description short in space and/or time, to reduce the information per time, i.e. the energy.

By exchanging more concrete concepts one needs less time than by using low-level concepts, because the concepts are already there in each interlocutor; they just need to be selected.

This is also the case in the physical world. A kilo of petrol contains more energy than a kilo of current technology batteries, but the exchange in both cases takes the same amount of energy.

The channel capacity depends on what is sent, on the protocol, and thus on sender and receiver. Channel capacity can be increased by

  • compression-decompression
  • memory and recall of memory
  • high level concepts

Given a fixed energy low level channel like the phonemes of humans, there are still the protocol levels above, like the many-layered human concepts. On a computer, transporting HTML needs less channel energy than a pixel description of the page.

A brain or computer has a more or less constant energy (= processing = communication of information per time). Using abstract concepts takes more time. Someone who tries to understand, i.e. to compare abstract language, is slower. In that time more physical energy is consumed, because what flows on the lowest physical level is physical information.

Higher brain energy consumption in humans became possible because, by broadening the sources, by becoming a generalist, more energy also became available. A species is a channel by itself. Ecosystems are channel systems that structurally evolve over time.

The complexity of a language and the complexity of world it describes are covariant. Humans evolved intelligence because they were living in a complex world already.

Physical Function

Physically a function is an invariant structure (invariant meaning a lower rate of change). This structure stays the same, while some values change. The structure channels information into local cycles. Local cycles need less information with an equal clock.

The structure leads to the selection of the next location with new impulse and new value. A function can be seen as the impulse itself. Processing (impulse) and value are alternating.

The phase space volume of the particle/thread is the information summing both independent changes: that of the impulses (functions) and that of the value(s). Basically phase space volume just counts the locations traversed by the thread. Every location is a function application, a time step. So the phase space volume is equal to the time, but with a separate time unit one gets:

E = I ⁄ T

The energy of a thread/particle is the phase space volume I per cycle time.

This corresponds to the average information per average time step ΔI ⁄ Δt of a value selection.

In time stretches smaller than the cycle, the actual selection time of a value and its local/spatial extent are inversely proportional (contravariant) to keep the energy of the particle constant. The function can be seen as a curvilinear coordinate system.

Time is defined by the sequential processing of a function. Exclusiveness of values refers to a location of application, representing a time step.

In physics, specifying a functional relation corresponds to the transition from the Lagrangian to the Hamiltonian, where the former assumes independent particles (L = E − V) and the latter one particle, i.e. one time (E + V = H = constant).

Function Time Complexity

A function maps input to output. It works as a channel. One can associate an energy to it

E = h ⁄ Δt = hν
  • ν is (processing) rate (speed)
  • h is (processed) information (memory)

Both can be predetermined (register width, fix clock rate), or a result of runtime adaptations (parallelization / number of synapses / NN links, dynamic clocking), but normally they stay constant over some more or less local time.

A processor has basic variables and functions with their according E. The execution time of a higher function is the sum of that of lower functions.

As a result in the containment hierarchy of functions every function has got its processing energy, which depends on the (underlying) structure and the lowest level processing energy.

The processing energy corresponds to the big-O-notation of time complexity. In O(f(n))

  • f(n) is the number of time steps depending on the size of the input; this reflects the layering, i.e. the structure, of the processing.
  • ν = 1 ⁄ Δt is replaced by the big O, because it depends on the varying system clock and access strategies of the system.

E = O

A processor can process limited, normally constant, information per time. Processing energy is shared between parallel threads. Threads can be halted or their atomic operations can be of variable duration. Parallel threads have separate times.

Evolution of a processor (HW+SW) and its functions goes towards

  • a higher E on the supply side (computers/processors/functions become more powerful)

    For functions:

    • cache arguments (currying):
      • h up due to caching
      • ν up because there is no need to supply the argument
    • memoization:
      • h up due to the map
      • ν up due to shortcutting input to output via direct mapping
  • a lower E on the demand side (information is compressed to save computing power)

    For functions: generate arguments on demand.

    • h down since no caching
    • ν down due to needed calculations to supply arguments

In a function with constant E, h and ν are inversely proportional (contravariant).
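
A small Python sketch of the two supply-side moves from the list above, using functools (the distance function is made up):

    from functools import lru_cache, partial

    def distance(metric, a, b):
        return metric(a, b)

    # currying: cache the metric argument, so callers need not supply it again
    # (h up due to caching, ν up because the argument is already there)
    euclid = partial(distance, lambda a, b: abs(a - b))
    print(euclid(3, 7))              # 4

    # memoization: h up due to the stored map, ν up by shortcutting input to output
    @lru_cache(maxsize=None)
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(80))                   # answered from the map after the first computation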