Idea Bank 2020: Steganography

Jed
May 17, 2021
10 min read

Updated: May 23, 2021

Steganography is an important pillar of coding that stands for STEGosaurus ANOmalistic GRAm PHYsics. It talks about applying the physics of dinosaurs to prevent coding from going extinct.

Just kidding! Steganography originates from the Greek word steganographia, which combines the words steganós, meaning "covered or concealed", and graphia meaning "writing".

Steganography is the technique of hiding any type of secret data within an ordinary file or message in order to avoid detection. The secret data is then extracted at its destination.

While steganography may sound like a very abstract concept, many of us have in fact practiced it before!

If you remember doing some sort of experiment using invisible lemon ink to hide a secret message, you have some experience with steganography!

Section 1: Text Steganography

Begin in G-Code. Bit rates are in neutrons.
Write down the first letter of each word in step one
Enjoy!

Secret Message: B I G B R A I N
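For the curious, the "first letter of each word" trick can be sketched in a few lines of C. This is only a minimal illustration: the function name extract_acrostic is made up for this post, and it assumes that words are separated by spaces.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: copies the first letter of every space-separated
// word in `text` into `out` (upper-cased) and returns how many letters
// were written. `out_size` is the capacity of `out`, including the '\0'.
size_t extract_acrostic(const char *text, char *out, size_t out_size)
{
    size_t n = 0;
    int at_word_start = 1;

    if (out_size == 0)
    {
        return 0;
    }

    for (const char *c = text; *c != '\0'; c++)
    {
        unsigned char ch = (unsigned char) *c;
        if (isspace(ch))
        {
            // the next non-space character begins a new word
            at_word_start = 1;
        }
        else
        {
            if (at_word_start && isalpha(ch) && n + 1 < out_size)
            {
                out[n++] = (char) toupper(ch);
            }
            at_word_start = 0;
        }
    }
    out[n] = '\0';
    return n;
}
```

Calling extract_acrostic on the sentence from step one with a buffer of 32 characters fills the buffer with BIGBRAIN.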

Section 2: Image Steganography

On each sheet of tracing paper, draw a grid of 5 squares by 5 squares.
On each sheet, shade in the squares as shown in the diagrams below.
If a square is shaded in both grids, erase it from both grids.
Put the 2 grids on top of each other, such that they overlap. What do you see?

RESULTING GRID (SECRET MESSAGE) : hi

Section 3: Concepts and Keywords

Text Steganography
Image Steganography
Exclusive Or (XOR) Operation

What you have witnessed is secret messages being hidden in larger sets of data. The art of concealing these messages is known as steganography.

Using steganography, messages can be hidden in various forms of media, such as text, images, and even audio!

The first experiment on text steganography shows how simple messages can be hidden in plain text. It can be challenging to do, but a well-crafted message won’t reveal that there is a hidden message inside.

The second experiment on image steganography is an example of how 2 sets of seemingly nonsensical data can be pieced together to show the final message.

In this instance, we performed an exclusive or (XOR) operation on the 2 grids: a square in the final result is shaded (true) only when the corresponding squares in the two grids have different colors. Where both squares have the same color, the result is left white (false).
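Here is a rough sketch of the same grid experiment in C, assuming each square is stored as 1 (shaded) or 0 (white). The name xor_grids and the GRID constant are invented for this example, and the actual patterns from the diagrams are not reproduced here.

```c
#define GRID 5

// Combines two 5x5 grids square by square.
// A result square is 1 (shaded) only when exactly one of the two
// input squares is shaded, which is what the ^ (XOR) operator computes.
void xor_grids(int a[GRID][GRID], int b[GRID][GRID], int out[GRID][GRID])
{
    for (int r = 0; r < GRID; r++)
    {
        for (int c = 0; c < GRID; c++)
        {
            out[r][c] = a[r][c] ^ b[r][c];
        }
    }
}
```

A square shaded on both sheets cancels out (1 ^ 1 is 0), just like the erasing step in the experiment, while a square shaded on only one sheet survives.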

In the real world, messages are hidden in digital color images and audio recordings by modifying the bits ('0's & '1's) in the data. Modifying only the least significant bits changes the original color or sound, but not enough for human eyes and ears to notice.

The result will be an image file that appears identical to the original image but with some "noise" anomalies.
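To make the idea of least significant bits concrete, here is a minimal sketch in C. It assumes the cover data (for example, raw pixel values) is already loaded into an array of bytes; the names lsb_embed and lsb_extract are hypothetical, and real tools also have to deal with file formats and compression, which this ignores.

```c
#include <stddef.h>
#include <stdint.h>

// Hides `msg_len` bytes of `msg` in the lowest bit of each cover byte.
// Each message byte needs 8 cover bytes. Returns 0 on success, or -1
// if the cover is too small to hold the message.
int lsb_embed(uint8_t *cover, size_t cover_len, const uint8_t *msg, size_t msg_len)
{
    if (cover_len / 8 < msg_len)
    {
        return -1;
    }
    for (size_t i = 0; i < msg_len * 8; i++)
    {
        // take message bits from most significant to least significant
        uint8_t bit = (msg[i / 8] >> (7 - i % 8)) & 1;
        // clear the cover byte's lowest bit, then store the message bit
        cover[i] = (uint8_t) ((cover[i] & 0xFE) | bit);
    }
    return 0;
}

// Rebuilds `msg_len` bytes from the lowest bits of the cover bytes.
void lsb_extract(const uint8_t *cover, uint8_t *msg, size_t msg_len)
{
    for (size_t i = 0; i < msg_len; i++)
    {
        uint8_t byte = 0;
        for (int b = 0; b < 8; b++)
        {
            byte = (uint8_t) ((byte << 1) | (cover[i * 8 + b] & 1));
        }
        msg[i] = byte;
    }
}
```

Each cover byte changes by at most 1 out of 255, which is why the altered color or sound is so hard to spot.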

In the world of hidden messages, another common technique applied to secret messages is encryption. The difference between encryption and steganography is that encryption scrambles a message such that no one can understand it without the password (or key), while steganography conceals the data so no one knows that there is a hidden message to uncover.

Section 4: Application to Daily Life

Steganography can be combined with encryption as an extra step for hiding or protecting data. The content to be concealed through steganography (hidden text) is often encrypted before being incorporated into the innocuous-seeming cover text file, image, or even audio file.

The primary advantage of using steganography over encryption is that it helps obscure the fact that there is sensitive data hidden in the file or other content carrying the hidden text. In contrast, an encrypted file or message is clearly marked and identifiable because it simply doesn't make sense!

Steganography Techniques & Software

The practice of adding a watermark is one common use of steganography. Watermarking is often used by online publishers to identify the source of copyrighted media files that have been found being shared without permission.

Steganography software is used to perform a variety of functions, including encrypting the data in order to prepare it to be hidden inside another file, keeping track of which bits of the cover text file contain hidden data, and letting the intended recipient extract the hidden data.

Some online and open-source steganography tools include:

OpenStego
Xiao Steganography: used to hide secret files in BMP images or WAV files
Image Steganography: a JavaScript tool that hides images inside other image files
Crypture: a command line tool used to perform steganography.

The Science of Encryption

Most people understand what it means to encrypt data. After all, we see the phrase "end-to-end encryption" every time we send a message to a new contact on WhatsApp. But how does encryption truly work in practice?

To illustrate both the manual and computerized processes of encryption, I've decided to reference the Week 2 Problem Set from Harvard's introductory computer science course, CS50.

Before we begin, there are four terms that we need to understand and remember:

Encryption means concealing text in a reversible way.
Unencrypted text is called plaintext.
Encrypted text is called ciphertext.
The secret used to encrypt and decrypt the text is called a key.

Method 1: Caesar

The Caesar cipher is named after Julius Caesar, who apparently used it with a shift of three (A becoming D when encrypting, and D becoming A when decrypting) to protect messages of military significance. A simple example is shown below:

(Unencrypted) Plaintext:  THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
(Encrypted)   Ciphertext: WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ

According to Wikipedia, it is unknown how effective the Caesar cipher was at the time, but it is likely to have been reasonably secure. After all, most of Caesar's enemies were illiterate and would have simply assumed that the messages were written in an unknown foreign language.

It's also worth remembering, though, that Caesar lived more than 2000 years ago, and the technology of the Caesar cipher is clearly outdated. Since there are only 25 useful shifts to try, anyone could easily figure out the key by brute force, especially if we use the same key every time, and so there's little point even encrypting the messages.

In the Caesar problem set, our task is to make use of Caesar's fundamental concept of shifting the plaintext by a certain number of positions to develop a simple encryption algorithm.

More formally, if p is some plaintext, p_i is the i-th character in p, and k is a secret key (i.e., a non-negative integer), then each letter, c_i, in the ciphertext, c, is computed as: c_i = (p_i + k) % 26

In doing so, we reduce the predictability of our encrypted message and make it much harder for sinister fellows to snoop on our private conversations. The walkthrough video below explains the Caesar cipher clearly with detailed examples.

For those interested in encryption, the code in C for the Caesar cipher is appended below with ample comments to highlight the entire process.

I've added the code even though it's more advanced than what this post intends to cover so that we can easily reference it later when we start work on our coding & electronics modules.

If you glance through the code, you'll immediately notice that encryption is actually far more complicated than we might think. Unlike humans, all computers can only read the numbers '0' and '1'. This is known as binary code, and we'll touch on this idea of binary in future blog posts.

Essentially, what happens in the Caesar algorithm is that we convert each plaintext letter into a number (its ASCII code, which the computer stores as '0's and '1's), rotate that number by the secret key specified by the user, and then convert the result back into a letter of ciphertext.

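// Tasks for Caesar: get key; get plaintext; encrypt; print ciphertext

#include <cs50.h>   // string, get_string
#include <ctype.h>  // character tests such as isalpha
#include <stdio.h>  // printf
#include <stdlib.h> // atoi
#include <string.h> // strlen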

int main(int argc, string argv[])
{
    // Step 1: Counting Command-Line Arguments
    if (argc != 2)
    {
        printf("Usage: ./caesar key\n");
        return 1;
    }

    // Reject any key that contains something other than decimal digits
    string key = argv[1];
    for (int i = 0, len = strlen(key); i < len; i++)
    {
        if (!isdigit((unsigned char) key[i]))
        {
            printf("Usage: ./caesar key\n");
            return 1;
        }
    }

    // Step 2: Convert Key from type str to int using atoi
    int k = atoi(key);

    //Step 3: User inputs plaintext
    string p = get_string("plaintext: ");

    //Step 4: Encrypt plaintext & then Print ciphertext
    printf("ciphertext: ");
    for (int i = 0, n = strlen(p); i < n; i++)
    {
        // if char is alphabetic, shift character by key & preserve case
        // A | Convert ASCII to alphabetical index; where a = 0
        // B | Shift alphabetical index using formula: (p[i] + k) % 26
        // C | Convert result back to ASCII

        if (isalpha(p[i]))
        {
            if (isupper(p[i]))
            {
                char x = p[i] - 65;
                char y = (x + k) % 26;
                char z = y + 65;
                printf("%c", z);
            }

            else if (islower(p[i]))
            {
                char x = p[i] - 97;
                char y = (x + k) % 26;
                char z = y + 97;
                printf("%c", z);
            }
        }

        // if char is not alphabetic, print character as it is
        else
        {
            printf("%c", p[i]);
        }
    }
    // end the ciphertext line, then exit program by returning 0 from main
    printf("\n");
    return 0;
}

Method 2: Substitution

Even though our Caesar algorithm allows for greater variability in specifying the secret key, anyone with enough patience and intuition can still use a brute-force method to decipher the ciphertext back into its original plaintext.
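To see just how little patience a brute-force attack needs, here is a minimal sketch in C that tries every possible shift. The helper names caesar_shift and brute_force are made up for this post, and it assumes the attacker simply reads through the 26 candidates to spot the one that looks like English.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

// Shifts every letter of `in` by `k` positions (k may be negative),
// preserving case and leaving other characters alone.
// `out` must have room for strlen(in) + 1 characters.
void caesar_shift(const char *in, int k, char *out)
{
    // normalise k into the range 0..25 so negative shifts also work
    k = ((k % 26) + 26) % 26;

    size_t i;
    for (i = 0; in[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) in[i];
        if (isupper(ch))
        {
            out[i] = (char) ('A' + (ch - 'A' + k) % 26);
        }
        else if (islower(ch))
        {
            out[i] = (char) ('a' + (ch - 'a' + k) % 26);
        }
        else
        {
            out[i] = in[i];
        }
    }
    out[i] = '\0';
}

// Prints the ciphertext decrypted with every key from 0 to 25.
// `buf` is scratch space of at least strlen(ciphertext) + 1 characters.
void brute_force(const char *ciphertext, char *buf)
{
    for (int k = 0; k < 26; k++)
    {
        caesar_shift(ciphertext, -k, buf);
        printf("key %2d: %s\n", k, buf);
    }
}
```

Decrypting is just shifting in the opposite direction, so a computer can list all 26 candidates in an instant.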

A better way of encrypting our plaintext would thus be to use the random substitution method.

Imagine directly translating this post from English to Chinese using a traditional translation dictionary rather than Google Translate. Each word in English would have a corresponding word in Chinese, and anyone who reads this blog post in Chinese can reference the translation dictionary to change it back into English.

We can apply a similar concept to substitution, just that we are now translating one alphabet into another. For example:

Original Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ
     New Alphabet: JTREKYAVOGDXPSNCUIZLFBMWHQ

So when we want to input our original message of "HELLO", the substitution algorithm will encrypt it by replacing each original letter with its corresponding letter in the new alphabet and output the ciphertext.

plaintext:  HELLO
ciphertext: VKXXN

Essentially, in the substitution problem set, we “encrypt” a message by replacing every letter with another letter. To do so, we use a key: in this case, a mapping of each of the letters of the alphabet to the letter it should correspond to when we encrypt it. To “decrypt” the message, the receiver of the message would need to know the key, so that they can reverse the process: translating the ciphertext back into plaintext.
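The mapping described above can be sketched in C as follows. This is not the official CS50 solution, only a minimal illustration; the function names key_is_valid and substitute are invented here, and the key is assumed to be a 26-letter string like the "New Alphabet" shown earlier.

```c
#include <ctype.h>
#include <stddef.h>

// Returns 1 if `key` is exactly 26 letters with no letter repeated,
// and 0 otherwise.
int key_is_valid(const char *key)
{
    int seen[26] = {0};
    size_t i;
    for (i = 0; key[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) key[i];
        if (i >= 26 || !isalpha(ch))
        {
            return 0;
        }
        int idx = toupper(ch) - 'A';
        if (seen[idx])
        {
            return 0;
        }
        seen[idx] = 1;
    }
    return i == 26;
}

// Replaces each letter of `plaintext` with the letter at the same
// alphabet position in `key`, preserving case. Other characters are
// copied unchanged. `out` must hold strlen(plaintext) + 1 characters.
void substitute(const char *plaintext, const char *key, char *out)
{
    size_t i;
    for (i = 0; plaintext[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) plaintext[i];
        if (isupper(ch))
        {
            out[i] = (char) toupper((unsigned char) key[ch - 'A']);
        }
        else if (islower(ch))
        {
            out[i] = (char) tolower((unsigned char) key[ch - 'a']);
        }
        else
        {
            out[i] = plaintext[i];
        }
    }
    out[i] = '\0';
}
```

With the key JTREKYAVOGDXPSNCUIZLFBMWHQ, calling substitute on HELLO produces VKXXN, matching the example above.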

How does WhatsApp encrypt your messages so thoroughly that even WhatsApp can’t read them?

There are many situations when you’d want your communications to be encrypted so that eavesdroppers couldn’t figure out your sensitive info. For this, the web uses a technology called HTTPS, which automatically encrypts all information sent between your computer and a website’s servers.

When you take your eyes off this blog post and look at the address bar of your browser, you'll definitely notice a small lock icon, which tells you that you are using the secure HTTPS information transfer protocol. We promise that we are not trying to steal your personal data.

Here’s the catch: the server can still decrypt your info and read it. Sometimes this is necessary: Amazon and Lazada can’t charge your credit card unless they can decrypt and read your credit card number.

But sometimes companies use decrypted personal data in ways that upset customers. For instance, Google used to read your Gmail to target ads at you.

So WhatsApp was widely praised when in 2014, it launched end-to-end encryption (E2E), which only lets you and your recipient decrypt your messages. Neither WhatsApp, nor its parent company, Facebook, can figure out what you’re saying!

To explain end-to-end encryption, let’s use an analogy. Suppose that SingPost is evil and tries to open any package that’s sent through the mail. Singaporeans are understandably upset, but they have to use the postal service if they want to send physical gifts over long distances.

So citizens devise a clever way to ensure that no one besides the intended recipient of a package, not even SingPost, can open the package.

Each person creates a key and hundreds of locks that can only be opened using that key. Everyone keeps their key in a secure place in their house, but they distribute their locks to supermarkets like NTUC Fairprice around the country.

Say you want to send a box to your friend Einstein. You grab one of Einstein's locks from your neighborhood supermarket and attach it to your box.

When you mail the box to him, SingPost intercepts it. Of course, without the key, they can’t open it! But when Einstein gets the box, he can open it because he has the sole key that can open his lock.

This system is secure, since only the intended recipient can open the box. It’s also clever because anyone can send stuff to anyone else without having to coordinate ahead of time. You just need to grab someone’s lock from the store whenever you want to send them something.

This is also how end-to-end encryption works, and the formal name for this method is “public key cryptography.” With this method, every user is given a public key (which in our example, is a lock) and a private key (which in our example, is literally a private key).

Every message is encrypted using the recipient’s public key, and can only be decrypted using their private key and some math as seen in the diagram below:
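To give a flavor of the "some math" involved, here is a toy sketch in C using RSA, one well-known public key scheme. Please treat it as an illustration only: the numbers are the tiny textbook values n = 3233, e = 17 and d = 2753 (built from the primes 61 and 53), real keys are hundreds of digits long, and this is not a description of how WhatsApp itself encrypts messages. The names mod_pow, TOY_N, TOY_E and TOY_D are made up for this post.

```c
#include <stdint.h>

#define TOY_N 3233 // public modulus (61 * 53), shared with everyone
#define TOY_E 17   // public exponent: the "lock" anyone can use
#define TOY_D 2753 // private exponent: the "key" only the recipient has

// Computes (base ^ exp) mod m by repeated squaring.
// Safe here because m is tiny, so the products fit easily in 64 bits.
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1;
    base %= m;
    while (exp > 0)
    {
        if (exp & 1)
        {
            result = (result * base) % m;
        }
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}
```

Encrypting the number 65 with the public values, mod_pow(65, TOY_E, TOY_N), gives 2790, and only the private exponent turns it back: mod_pow(2790, TOY_D, TOY_N) gives 65 again.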

Section 5: Ethics

Key Tensions: Privacy vs Security | Individual vs Collective

As with any new technology from facial recognition to blockchain, the rapidly growing use of end-to-end encryption has spurred endless debates about the ethics of the technology.

On one end of the spectrum, oppressed individuals and communities have found encryption incredibly useful. Journalists are particularly interested in end-to-end encryption, since they need more secure ways to communicate with sources given the rise in political censorship. Political dissidents in repressive regimes like Syria have also started using Telegram and WhatsApp to get around government snooping.

But the individual privacy that comes with encryption technology puts our collective security at a greater risk. ISIS was notorious for using Telegram to evade intelligence agencies, and we have already seen how devastating this can be through the use of encrypted messaging apps to plan the 2015 Paris attacks.

As consumers, we certainly value our privacy greatly even if some of our Instagram profiles may be public. But as citizens, we also recognize the need for preserving the greater good of security and national peace.

Another recent example of this fragile trade-off is the controversy over Apple's adoption of end-to-end encryption for iPhone backups.

A few years ago, Apple apparently planned to offer users end-to-end encryption when storing their phone data on iCloud in order to thwart hackers.

But this arrangement also means that Apple itself would no longer have a key to unlock the encrypted data, meaning it would not be able to aid intelligence agencies and law enforcement officials in decrypting material even under court order.

After strong objection from FBI representatives, Apple eventually decided to drop its development of end-to-end encryption in order to avoid the legal ramifications of the technology.

If you're still undecided about whether Apple made the right choice, you can read more about the ethics of encryption here and here!

Like encryption, steganography also faces its fair share of controversy over its ethical use. While there are many legitimate uses for steganography, malware developers have also been found to use steganography to obscure the transmission of malicious code ...

Key References for this Blog Post: CS50 and Swipe to Unlock

Program Locations: wherever you are in Singapore
HQ: 1 Raffles Institution Lane, Raffles Institution, Singapore
