WEBVTT 1 00:00:03.166 --> 00:00:05.733 Hello, I’m Kuina-chan. 2 00:00:05.733 --> 00:00:12.733 This time, let’s make music composition software from scratch, and compose a song with it! 3 00:00:12.733 --> 00:00:18.300 Actually, there is already a completed song, so let’s listen to it first. 4 00:00:28.166 --> 00:00:32.000 Right, it sounds like music that plays when you defeat an enemy in a game. 5 00:00:32.000 --> 00:00:35.299 I played around with the chord progression a bit. 6 00:00:35.299 --> 00:00:38.433 So, let’s make this right now. 7 00:00:42.333 --> 00:00:44.466 For now, I’ll log in, 8 00:00:49.799 --> 00:00:55.733 and in the Documents folder, I’ve created a working folder named “MusicMaker”. 9 00:00:55.733 --> 00:00:59.766 This time I’m using Debian, a Linux distribution, 10 00:00:59.766 --> 00:01:02.000 but any OS is fine. 11 00:01:02.000 --> 00:01:06.733 Feel free to use whatever you like, whether it’s FreeBSD or Windows. 12 00:01:06.733 --> 00:01:09.433 Well then, for starters I’ll open the terminal. 13 00:01:09.433 --> 00:01:13.599 I’ll create a file named main.c, and open it with Vim. 14 00:01:13.599 --> 00:01:16.733 You don’t have to use Vim as your text editor either. 15 00:01:16.733 --> 00:01:19.599 I usually use VSCode, 16 00:01:19.599 --> 00:01:22.766 but it’s a bit too convenient and loses the “making it from scratch” vibe, 17 00:01:22.766 --> 00:01:24.733 so I’ll write it in Vim this time. 18 00:01:24.733 --> 00:01:29.933 For the programming language, using Python or similar is convenient due to rich libraries, 19 00:01:29.933 --> 00:01:33.000 but that hides what’s actually going on inside, 20 00:01:33.000 --> 00:01:36.733 so I’ll intentionally write it in plain C this time. 21 00:01:36.733 --> 00:01:37.766 I’ll show you everything. 22 00:01:38.733 --> 00:01:43.299 First of all, I’ll #include everything I can think of, and write the main function. 
23 00:01:43.299 --> 00:01:45.599 Now, as for the general flow, 24 00:01:45.599 --> 00:01:50.200 the goal is to generate a WAV file with this program. 25 00:01:50.200 --> 00:01:52.733 A WAV file is a type of audio file, 26 00:01:52.733 --> 00:01:56.733 and it’s a format where the waveform is contained as-is. 27 00:01:56.733 --> 00:02:00.733 In other words, it is a format that is easy for humans to read and write. 28 00:02:00.733 --> 00:02:05.733 So for now, I’ll create a buffer to store the binary of this WAV file. 29 00:02:05.733 --> 00:02:10.233 I don’t know the size yet, so let’s set it to 1024. 30 00:02:10.933 --> 00:02:12.766 (I will speed up the process as appropriate) 31 00:02:13.400 --> 00:02:17.500 Assuming the content will be written by a WriteBin function, 32 00:02:17.500 --> 00:02:22.366 I’ll output the resulting binary to a file named output.wav. 33 00:02:23.133 --> 00:02:26.733 (Programming is fun...) 34 00:02:28.633 --> 00:02:33.066 And then, once the content is written using WriteBin, it’s complete. 35 00:02:36.099 --> 00:02:41.733 Let’s make utility functions to write IDs and numbers to the binary. 36 00:02:42.666 --> 00:02:46.166 (For now I named the functions WriteId and WriteInt.) 37 00:02:49.966 --> 00:02:54.900 From here, I will write the binary according to the WAV file format. 38 00:02:59.500 --> 00:03:03.533 (I’m writing it while looking at the WAV file specification.) 39 00:03:05.966 --> 00:03:10.066 (Also, I am converting from decimal to hexadecimal using a calculator.) 40 00:03:12.333 --> 00:03:16.599 (This time, I set it to 44100Hz, 16bit, monaural audio.) 41 00:03:18.866 --> 00:03:22.300 Finally, it’s the process to write the waveform to binary. 
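The header-writing step described above could look roughly like the following sketch. The names WriteId and WriteInt come from the narration, but their signatures here are assumptions, and WriteWavHeader is a hypothetical helper; the code on screen in the video is not part of this transcript. The layout is the common 44-byte RIFF/WAVE header for 16-bit mono PCM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a 4-character chunk ID such as "RIFF"; returns the new write offset. */
size_t WriteId(uint8_t *buf, size_t pos, const char *id)
{
    assert(strlen(id) == 4);
    memcpy(buf + pos, id, 4);
    return pos + 4;
}

/* Append the low `bytes` bytes of `value` in little-endian order. */
size_t WriteInt(uint8_t *buf, size_t pos, uint32_t value, int bytes)
{
    assert(bytes >= 1 && bytes <= 4);
    for (int i = 0; i < bytes; i++)
        buf[pos++] = (uint8_t)(value >> (8 * i));
    return pos;
}

/* Write the 44-byte header for 16-bit mono PCM; buf must hold at least 44 bytes. */
size_t WriteWavHeader(uint8_t *buf, uint32_t sample_rate, uint32_t num_samples)
{
    const uint32_t channels = 1, bits = 16;
    const uint32_t block_align = channels * bits / 8;      /* bytes per sample frame */
    const uint32_t data_bytes = num_samples * block_align;
    size_t pos = 0;
    pos = WriteId(buf, pos, "RIFF");
    pos = WriteInt(buf, pos, 36 + data_bytes, 4);           /* bytes after this field */
    pos = WriteId(buf, pos, "WAVE");
    pos = WriteId(buf, pos, "fmt ");
    pos = WriteInt(buf, pos, 16, 4);                        /* size of the fmt chunk */
    pos = WriteInt(buf, pos, 1, 2);                         /* 1 = uncompressed PCM */
    pos = WriteInt(buf, pos, channels, 2);
    pos = WriteInt(buf, pos, sample_rate, 4);
    pos = WriteInt(buf, pos, sample_rate * block_align, 4); /* bytes per second */
    pos = WriteInt(buf, pos, block_align, 2);
    pos = WriteInt(buf, pos, bits, 2);
    pos = WriteId(buf, pos, "data");
    pos = WriteInt(buf, pos, data_bytes, 4);
    return pos;
}
```

Writing every integer through one little-endian helper avoids converting values to hexadecimal by hand, and the returned offset tells the caller where the sample data starts.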
42 00:03:22.300 --> 00:03:27.166 I want to handle the waveform as decimals from -1.0 to 1.0, 43 00:03:27.166 --> 00:03:28.800 but this WAV file 44 00:03:28.800 --> 00:03:36.333 is formatted to be written with integers from -32768 to 32767, 45 00:03:36.333 --> 00:03:38.966 so I’ll convert it before writing. 46 00:03:39.900 --> 00:03:44.266 (I’m writing a clamping process for when the value exceeds the range...) 47 00:03:47.866 --> 00:03:50.033 For now I’ll build it once, 48 00:03:50.033 --> 00:03:53.566 and let’s see if the WAV file is properly generated. 49 00:03:54.733 --> 00:03:56.533 The build passed. 50 00:03:58.733 --> 00:04:00.266 It executed as well. 51 00:04:00.500 --> 00:04:03.733 output.wav has been generated. 52 00:04:03.733 --> 00:04:05.266 Let’s play it. 53 00:04:06.533 --> 00:04:09.500 Hmm, it’s too short to tell. 54 00:04:09.500 --> 00:04:12.733 No error occurred, so the format seems correct. 55 00:04:13.199 --> 00:04:14.699 Let’s make it longer. 56 00:04:14.699 --> 00:04:22.666 The sampling frequency is 44100Hz, so I’ll make the length 44100 to get 1 second of audio. 57 00:04:23.233 --> 00:04:27.066 (Building and running it again...) 58 00:04:28.166 --> 00:04:33.000 No sound since the waveform is always 0, but it seems to play for 1 second. 59 00:04:37.033 --> 00:04:40.166 Let’s try writing some waveform into it. 60 00:04:40.166 --> 00:04:45.733 A sine wave of 440Hz is perceived by the human ear as the note A. 61 00:04:45.733 --> 00:04:51.033 Let’s write a sine wave that vibrates 440 times per second. 62 00:04:57.733 --> 00:04:59.733 If I play it, 63 00:05:01.100 --> 00:05:03.500 ah, the note A sounded! 64 00:05:03.500 --> 00:05:05.166 Success. 65 00:05:05.166 --> 00:05:08.033 I’ll save it under a different name to commemorate this. 66 00:05:10.199 --> 00:05:13.500 Next is... Oh, I forgot to free it. 67 00:05:13.500 --> 00:05:14.966 Let’s free it. 68 00:05:16.166 --> 00:05:19.733 Next, let’s make a musical scale and melody. 
69 00:05:19.733 --> 00:05:22.500 First, I’ll write the notes of a melody. 70 00:05:25.133 --> 00:05:30.166 I’ll write combinations of the notes CDEFGAB and lengths. 71 00:05:30.166 --> 00:05:34.166 I’ll write 4 for a quarter note, 8 for an eighth note. 72 00:05:36.666 --> 00:05:39.266 Yes, it’s the melody of “Twinkle Twinkle Little Star”. 73 00:05:39.733 --> 00:05:45.066 The flow will be that passing this melody to a function called CreateWave will generate its waveform. 74 00:05:51.933 --> 00:05:58.066 Around here, I’ll write the process to convert the scale of the passed melody into a waveform. 75 00:05:58.066 --> 00:06:03.433 For notes, every time the frequency doubles, the pitch goes up by one octave. 76 00:06:03.433 --> 00:06:07.733 Every time it halves, the pitch goes down by one octave. 77 00:06:07.733 --> 00:06:11.800 In other words, by multiplying 440Hz by 2^n, 78 00:06:11.800 --> 00:06:15.733 we can play the A note in different octaves. 79 00:06:15.733 --> 00:06:18.133 Furthermore, within one octave, 80 00:06:18.133 --> 00:06:22.733 there are 12 notes combining the white and black keys of a piano. 81 00:06:22.733 --> 00:06:27.699 In other words, if we consider m/12 which divides an octave into 12 parts, 82 00:06:27.699 --> 00:06:32.466 by multiplying 440Hz by 2^(m/12), 83 00:06:32.466 --> 00:06:35.733 all notes on the piano keyboard can be represented. 84 00:06:35.733 --> 00:06:39.833 Since white keys are not spaced equally in semitone increments, 85 00:06:39.833 --> 00:06:45.533 m takes irregular values like 0, 2, 3, 5, 7, ... 86 00:06:49.566 --> 00:06:52.233 Oops, a build error occurred. 87 00:06:52.233 --> 00:06:54.899 The way I wrote the array was reversed... 88 00:06:57.433 --> 00:07:01.666 (Fixing...) 89 00:07:06.633 --> 00:07:08.533 Okay, let’s play it. 90 00:07:13.933 --> 00:07:15.966 Twinkle Twinkle Little Star played perfectly. 91 00:07:15.966 --> 00:07:17.966 I’ll save it under a different name. 
92 00:07:20.866 --> 00:07:23.733 Next up, let’s make some instruments. 93 00:07:23.733 --> 00:07:27.733 I want to play sounds other than sine waves too, after all. 94 00:07:27.733 --> 00:07:31.733 I’ll create new files synthe.h and synthe.c. 95 00:07:33.699 --> 00:07:37.899 (.h and .c are file types used in C.) 96 00:07:37.899 --> 00:07:42.366 (I’ll write function declarations in .h, and implementations in .c.) 97 00:07:44.899 --> 00:07:47.899 Besides the sine wave, let’s also add a triangle wave. 98 00:07:47.899 --> 00:07:51.966 A triangle wave is a brighter, more prominent sound than a sine wave. 99 00:07:53.233 --> 00:07:55.466 (A sine wave is a smooth wave, but) 100 00:07:55.466 --> 00:07:59.233 (a triangle wave is literally a wave shaped like connected triangles.) 101 00:07:59.733 --> 00:08:01.933 Okay, the triangle wave is completed. (Probably) 102 00:08:10.333 --> 00:08:12.699 Let’s add various other instruments too. 103 00:08:13.300 --> 00:08:16.733 I don’t know if it’ll work, but let’s take on the challenge of making a piano. 104 00:08:16.733 --> 00:08:19.733 A piano is an instrument with many overtones. 105 00:08:19.733 --> 00:08:23.500 I’ll represent overtones by combining cosine waves. 106 00:08:23.966 --> 00:08:27.366 (Overtones are sounds layered at 2x, 3x... frequencies.) 107 00:08:29.100 --> 00:08:34.733 Here, let’s implement basic synthesizer functions for ADSR. 108 00:08:34.733 --> 00:08:39.899 ADSR stands for Attack, Decay, Sustain, and Release, 109 00:08:39.899 --> 00:08:43.033 and lets you control a sound’s duration and how it plays. 110 00:08:43.033 --> 00:08:45.333 For example, when you play a piano note, 111 00:08:45.333 --> 00:08:49.733 it starts with a loud sound and then naturally fades. 112 00:08:49.733 --> 00:08:51.733 On the other hand, when playing a violin, 113 00:08:51.733 --> 00:08:54.799 it takes a little time to reach max volume, 114 00:08:54.799 --> 00:08:57.500 but you can keep sounding the note. 
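The attack, decay, sustain, and release stages described above can be sketched as a gain curve that is multiplied into each sample of a note. This is a linear-segment version under assumed conventions: the Adsr struct, HeldLevel, and AdsrGain are hypothetical names, times are in seconds, and sustain is a level between 0 and 1. The envelope actually used in the video may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter set: times in seconds, sustain as a level in [0, 1]. */
typedef struct {
    double attack;
    double decay;
    double sustain;
    double release;
} Adsr;

/* Level while the key is held: rise to 1.0, fall to the sustain level, then hold. */
static double HeldLevel(const Adsr *e, double t)
{
    if (t < e->attack)
        return t / e->attack;
    if (t < e->attack + e->decay)
        return 1.0 - (1.0 - e->sustain) * (t - e->attack) / e->decay;
    return e->sustain;
}

/* Gain at time t for a note whose key is released at note_len seconds. */
double AdsrGain(const Adsr *e, double t, double note_len)
{
    assert(e != NULL && note_len >= 0.0);
    if (t < 0.0)
        return 0.0;
    if (t < note_len)
        return HeldLevel(e, t);
    double since_off = t - note_len;
    if (since_off >= e->release)
        return 0.0;
    /* Fade linearly from whatever level the note had when the key was released. */
    return HeldLevel(e, note_len) * (1.0 - since_off / e->release);
}
```

A piano-like setting would use a very short attack and a low sustain level, while a violin-like setting would use a longer attack and a sustain level close to 1.0.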
115 00:08:57.500 --> 00:09:00.000 In this way, when playing an instrument, 116 00:09:00.000 --> 00:09:03.933 ADSR attempts to replicate changes in volume. 117 00:09:05.799 --> 00:09:07.966 (Actually the decay order from low to high notes was reversed.) 118 00:09:07.966 --> 00:09:10.200 (Without this, it won’t sound like a piano...) 119 00:09:12.100 --> 00:09:14.733 By slightly shifting overtone frequencies, 120 00:09:14.733 --> 00:09:18.766 it supposedly reproduces piano string rigidity, so I’ll try it. 121 00:09:29.066 --> 00:09:33.366 I doubt it sounds like a piano, but a piano-like sound is ready anyway. 122 00:09:35.366 --> 00:09:37.266 Next is Lead. 123 00:09:37.266 --> 00:09:39.833 I’ll represent a square wave using a sine function. 124 00:09:40.266 --> 00:09:44.233 (A square wave is a wave that looks like a row of rectangles.) 125 00:09:44.899 --> 00:09:48.566 Normally, you’d need an infinite layering of overtones, 126 00:09:48.566 --> 00:09:51.100 but limiting overtones to about four 127 00:09:51.100 --> 00:09:53.733 gives roundness to the square wave’s tone. 128 00:09:53.733 --> 00:09:56.866 Then, I’ll layer two notes one octave apart, 129 00:09:56.866 --> 00:10:00.200 and slightly shift frequencies to add thickness. 130 00:10:01.566 --> 00:10:05.766 (This is what’s known as a Detune effect.) 131 00:10:06.366 --> 00:10:08.733 Let’s also apply vibrato while we’re at it. 132 00:10:08.733 --> 00:10:12.933 By letting the pitch wobble gradually for sustained notes, 133 00:10:12.933 --> 00:10:15.899 we get a sound simulating a human singing vibrato. 134 00:10:26.366 --> 00:10:28.399 Alright, Lead is also completed. 135 00:10:30.600 --> 00:10:32.233 Lastly, Pad. 136 00:10:32.233 --> 00:10:36.500 For Pad, I keep the attack gentle to give a soft, floating sound. 137 00:10:36.500 --> 00:10:39.433 Playing just one note sounds a whole chord. 138 00:10:53.633 --> 00:10:55.299 Okay, Pad is finished too. 
139 00:10:57.966 --> 00:11:00.833 Let’s make it possible to change the volume per instrument. 140 00:11:01.233 --> 00:11:04.733 (Going around adding 'volume' arguments to each instrument.) 141 00:11:05.133 --> 00:11:09.733 Now that we have all instruments, we’ll move on to composing. 142 00:11:09.733 --> 00:11:14.733 Right now, only 7 notes CDEFGAB are available, 143 00:11:14.733 --> 00:11:18.100 so I’ll implement octave switching and black keys. 144 00:11:18.100 --> 00:11:19.533 I’ll also support rests. 145 00:11:23.066 --> 00:11:26.066 (I’ll use “-” for one semitone down, and “+” for one up.) 146 00:11:29.166 --> 00:11:33.066 OK then, I’ll enter the song’s notes while composing it in my head. 147 00:11:33.833 --> 00:11:37.866 (Composing mentally...) 148 00:11:38.200 --> 00:11:40.066 The melody is something like this. 149 00:11:42.166 --> 00:11:45.166 (Creating the 2nd part...) 150 00:11:45.766 --> 00:11:47.933 Next, I’ll enter the bass line. 151 00:11:51.466 --> 00:11:54.000 I’ll reinforce chord tones with Pad. 152 00:11:54.399 --> 00:11:57.000 (Chords are important...) 153 00:11:57.333 --> 00:12:00.466 Let’s add a blippy sound in the high register. 154 00:12:00.466 --> 00:12:02.166 It makes the tune sparkle. 155 00:12:03.633 --> 00:12:07.966 (Mostly adhering to chords,) 156 00:12:07.966 --> 00:12:12.333 (I sprinkle arpeggios and scales.) 157 00:12:13.966 --> 00:12:17.299 Finally, let’s add piano in the empty mid-low range. 158 00:12:18.000 --> 00:12:19.899 (It’s best to assign sounds) 159 00:12:19.899 --> 00:12:21.933 (into empty frequency spaces.) 160 00:12:22.833 --> 00:12:25.933 In its current state, multiple waveforms combined 161 00:12:25.933 --> 00:12:30.799 exceed the range of -1.0 to 1.0. 162 00:12:30.799 --> 00:12:33.600 Right now, those exceeded parts are being clamped, 163 00:12:33.600 --> 00:12:37.000 meaning the waveform is heavily distorted into noise. 
164 00:12:37.000 --> 00:12:41.100 Hence, I am going to normalize it overall 165 00:12:41.100 --> 00:12:42.766 to fit within the -1.0 to 1.0 range. 166 00:12:43.466 --> 00:12:45.233 By dividing by the peak absolute value, 167 00:12:45.233 --> 00:12:49.666 the waveform gets scaled to the -1.0 to 1.0 range. 168 00:12:54.533 --> 00:12:58.233 Just in case, I’ll output the peak value to the console. 169 00:13:12.500 --> 00:13:15.533 Yes, an adequate song is complete. 170 00:13:15.533 --> 00:13:20.266 So, I want to apply a delay to give depth to the sound. 171 00:13:20.266 --> 00:13:23.100 Delay is pretty much a kind of echo. 172 00:13:23.100 --> 00:13:25.733 By repeatedly playing sounded notes with a time lag, 173 00:13:25.733 --> 00:13:28.266 it brings an effect much like a reverb. 174 00:13:29.200 --> 00:13:31.633 The part extending until the echo finishes 175 00:13:31.633 --> 00:13:35.433 extends the song’s overall length, so we’ll update song length too. 176 00:13:39.766 --> 00:13:43.866 Oh shoot..., if echo ends up being written on the main track portion, 177 00:13:43.866 --> 00:13:49.233 it is seen as part of the main, creating infinite echoes off echoes. 178 00:13:49.233 --> 00:13:52.066 To avoid this, you process from the end 179 00:13:52.066 --> 00:13:55.566 towards the beginning, rather than from front to back. 180 00:13:56.366 --> 00:13:59.200 Echoes don’t just have to happen once, 181 00:13:59.200 --> 00:14:02.633 so I’ll add several, varying the timing and volume. 182 00:14:03.799 --> 00:14:06.799 (By the way, the current BGM playing is also self-composed.) 183 00:14:07.133 --> 00:14:09.166 I’ll adjust a bunch of stuff while I’m at it. 184 00:14:25.533 --> 00:14:28.566 Nice, the delay smoothed it out quite well. 185 00:14:29.433 --> 00:14:32.133 I’d like to add some overdrive to the piano, 186 00:14:32.133 --> 00:14:34.166 just to gently distort it. 
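The peak normalization and the back-to-front delay described above can be sketched as follows. Normalize, EchoTap, and AddEchoes are hypothetical names. The sketch assumes the buffer already has zeroed room after the dry signal for the longest echo, which corresponds to the extended song length mentioned in the narration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Scale buf so its largest absolute value becomes 1.0; returns the original peak. */
double Normalize(double *buf, size_t n)
{
    double peak = 0.0;
    for (size_t i = 0; i < n; i++)
        if (fabs(buf[i]) > peak)
            peak = fabs(buf[i]);
    if (peak > 0.0)
        for (size_t i = 0; i < n; i++)
            buf[i] /= peak;
    return peak;
}

/* One echo: how many samples later it sounds, and how loud it is. */
typedef struct {
    size_t delay;
    double gain;
} EchoTap;

/* Mix echoes of the first n_dry samples into buf in place. buf must hold at
   least n_dry + (largest delay) samples. Walking from the end to the start
   means buf[i] is still the dry signal when it is read, because every echo
   is written to an index above i. */
void AddEchoes(double *buf, size_t n_dry, const EchoTap *taps, size_t n_taps)
{
    assert(buf != NULL && taps != NULL);
    for (size_t i = n_dry; i-- > 0; ) {
        double dry = buf[i];
        for (size_t k = 0; k < n_taps; k++) {
            assert(taps[k].delay > 0);
            buf[i + taps[k].delay] += dry * taps[k].gain;
        }
    }
}
```

Running the same loop from front to back would read samples that already contain earlier echoes, which is the "echoes off echoes" problem from the narration.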
187 00:14:34.166 --> 00:14:36.733 Applying processing asymmetrically 188 00:14:36.733 --> 00:14:38.866 to positive and negative waveform limits 189 00:14:38.866 --> 00:14:41.966 supposedly generates a distinct, rich effect! 190 00:14:41.966 --> 00:14:43.066 Let’s try it. 191 00:14:44.100 --> 00:14:45.399 For continuity, 192 00:14:45.399 --> 00:14:48.466 the tanh function is often used. 193 00:14:49.933 --> 00:14:52.633 Because I also wanted to tweak various balances, 194 00:14:52.633 --> 00:14:53.700 I’m fine-tuning it. 195 00:14:54.899 --> 00:14:59.466 Right now I’m normalizing waveforms from -1.0 to 1.0, 196 00:14:59.466 --> 00:15:02.100 but that shrinks the overall waveform 197 00:15:02.100 --> 00:15:04.200 and makes it sound quieter. 198 00:15:04.200 --> 00:15:07.733 So I’ll try fitting everything within -1.0 to 1.0 199 00:15:07.733 --> 00:15:11.200 while keeping the original shape intact as much as possible. 200 00:15:11.766 --> 00:15:15.033 This is where compressors and limiters step in. 201 00:15:15.733 --> 00:15:18.933 A compressor performs a function to only compress parts of the waveform 202 00:15:18.933 --> 00:15:21.933 when it exceeds a certain limit. 203 00:15:21.933 --> 00:15:25.733 Instead of scaling the entirety via normalization, 204 00:15:25.733 --> 00:15:29.733 you can picture this as just scaling the occasionally jutted-out parts! 205 00:15:29.733 --> 00:15:32.766 Doing this boosts perceptual loudness quite nicely. 206 00:15:33.966 --> 00:15:37.866 (Applying compressors adds an aggressive cohesive punch to the track,) 207 00:15:37.866 --> 00:15:42.000 (so if you haven’t used them, definitely give them a try!) 208 00:15:42.000 --> 00:15:46.500 (Everyone’s using them.) 209 00:15:48.100 --> 00:15:53.533 You can think of a limiter as having a much higher compressor ratio applied. 210 00:15:53.533 --> 00:15:56.733 The compression ratio can even be maxed at infinity. 
211 00:15:56.733 --> 00:15:59.299 This time, when the wave exceeds a certain level, 212 00:15:59.299 --> 00:16:01.833 I’ll simply have it clip right there! 213 00:16:01.833 --> 00:16:04.899 It causes noise if overdone, so watch out. 214 00:16:08.933 --> 00:16:12.600 Alright, so the final form is complete. Let’s listen. 215 00:16:23.566 --> 00:16:25.233 If we want to use an equalizer, 216 00:16:25.233 --> 00:16:28.166 we’d also need an implementation of the Fast Fourier Transform, 217 00:16:28.166 --> 00:16:30.733 but I’ll call it finished for now. 218 00:16:32.399 --> 00:16:35.033 For everyone out there, if you’re stranded on a deserted island, 219 00:16:35.033 --> 00:16:38.566 you won’t be able to use rich environments like VSCode or Python. 220 00:16:38.566 --> 00:16:41.533 Definitely give creating WAV files from scratch a try! 221 00:16:42.166 --> 00:16:43.533 Bye-bye!