Story
Introduction
If you’ve been a parent for longer than 24 hours, you know the expression “sleep like a baby” is just another lie. Babies don’t sleep soundly, like, at all. They grunt and groan, cough and sigh, scooch and squirm and wiggle. Everything is fun and games until your sweet little one wakes up crying in the middle of the night on a daily basis. Babies have all the liberty to wake up and sleep whenever they want, but their parents often don’t. This is where E-nanny comes to the rescue.
Working
E-nanny is based on a Keyword Spotting (KWS) algorithm. I know what you are thinking right now: isn't this an audio classification problem rather than KWS? I thought the same at first, but KWS turned out to give better accuracy; perhaps a baby's cry has a rhythm similar to that of our natural languages. Once the "event" is detected, the device initiates a series of "counter measures" to put the baby back to sleep before the parents wake up. Beware! If the baby wakes up due to hunger or, say, a wet nappy, there is nothing my device can do, and you'll have to put him back to sleep yourself. But babies more often wake up for far sillier reasons, say a nightmare, so the device will work in most cases.
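The event-triggering idea above can be sketched as a simple confidence check over the classifier output. This is a minimal illustration, not the final firmware; the struct, function name, and 0.75 threshold are all assumptions for the sketch:

```cpp
#include <cassert>

// Hypothetical per-class confidences as produced by a KWS classifier.
// (Illustrative only — the real model's label order may differ.)
struct Prediction {
    float cry;
    float cough;
    float noise;
};

// Trigger the "counter measures" only when the cry confidence is high
// AND cry is the dominant class, so ordinary household noise does not
// start the soothing routine.
bool should_soothe(const Prediction &p, float threshold = 0.75f) {
    return p.cry > threshold && p.cry > p.cough && p.cry > p.noise;
}
```

A strongly detected cry (e.g. confidence 0.9) would trigger the routine, while a window dominated by noise would not.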
Step 1: Collecting Data
For the audio corpus I relied entirely on open-source sound databases like FreeSound. All the audio I used is copyright-free and will be available on my GitHub page. I collected three types of audio:
1. Cry
2. Cough
3. Noise (which I recorded myself)
All in all, I have around 15 minutes of data (10 minutes of cry, 5 minutes of noise, and a minuscule amount of cough).
Step 2: Cleaning Data
I am using Audacity for cleaning. Since I mainly used free audio recorded by someone else, I made sure no unwanted sound was included and cropped it wherever I found such cases. Long pauses between cries (babies cry like that) would also confuse the NN, so I removed them wherever present, and amplified the audio wherever the amplitude was weak. I then exported all the audio as WAV files at a 16 kHz sampling rate (16 kHz is required by Edge Impulse for KWS). Finally, I renamed the files in the format "label.number.wav", where label can be Cry, Cough, or Noise. This makes labelling easier later inside Edge Impulse.
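The "label.number.wav" convention works because the class name can be read off as everything before the first dot. A minimal sketch of that extraction (the helper name is mine, not from the project):

```cpp
#include <string>

// Extract the class label from a file named "label.number.wav",
// e.g. "Cry.01.wav" -> "Cry". Edge Impulse can infer labels from
// such filename prefixes when the corpus is uploaded.
std::string label_from_filename(const std::string &name) {
    std::size_t dot = name.find('.');
    return (dot == std::string::npos) ? name : name.substr(0, dot);
}
```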
Before:
Before cleaning
After:
After cleaning
Step 3: Edge Impulse
Now it's time for the interesting part: training your model. Since there is a good amount of material on this in the Edge Impulse documentation itself, I won't dive deep into an explanation. If you are a beginner, watching this video will be helpful.
After creating a new project, upload your audio corpus from the last step.
Start by creating a new impulse as follows:
create impulse
Now go to the MFE option and generate the features.
Generate Feature
Now it's time to train our model
Training
OK, training is complete and our model looks promising, as the data is neatly clustered. The accuracy, however, is not great; the main issue is the Cough class, which has only a minuscule amount of data. I'll try to improve the accuracy in the next part of the project.
Finally, let's deploy our model. For that, download it as an Arduino library.
Deploy
Step 4: Preparing Hardware and Firmware
I am using an ESP32 and an I2S microphone, but any 32-bit microcontroller and microphone should work. For interfacing the INMP441 with the ESP32, I took atomic14's code as a reference.
In PlatformIO, create a new project based on any of the ESP32 boards. After the project is created, unpack the ZIP file you downloaded from Edge Impulse into the lib folder. (My model is attached below.) The final code is also attached below.
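For reference, the PlatformIO configuration for such a project might look like the following; the board name and monitor speed are assumptions, so adjust them for your hardware:

```ini
; illustrative platformio.ini — board name is an assumption, pick your ESP32 board
[env:esp32dev]
platform = espressif32
board = esp32dev
framework = arduino
monitor_speed = 115200
```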
E-nanny in Action
I played a random YouTube video of a baby crying on my laptop while inference was running on the ESP32. When the ESP32 detects the baby crying, it turns on the on-board (blue) LED.
Code
- Esp32 platformio code
- E-nanny trained model
Esp32 platformio code
C/C++
#include <Arduino.h>
#include <driver/i2s.h>
#include <E-nanny_inferencing.h>

/*************** I2S Macros ***************/
// you shouldn't need to change these settings
#define SAMPLE_BUFFER_SIZE 512
#define SAMPLE_RATE 16000
// most microphones will probably default to left channel but you may need to tie the L/R pin low
#define I2S_MIC_CHANNEL I2S_CHANNEL_FMT_ONLY_LEFT
// either wire your microphone to the same pins or change these to match your wiring
#define I2S_MIC_SERIAL_CLOCK GPIO_NUM_32
#define I2S_MIC_LEFT_RIGHT_CLOCK GPIO_NUM_25
#define I2S_MIC_SERIAL_DATA GPIO_NUM_33

/*************** Other Macros ***************/
#define LED_BUILTIN 2

/*************** Global variables ***************/
esp_err_t ret = ESP_OK;
int16_t raw_samples[SAMPLE_BUFFER_SIZE];
float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];
int indx = 0;

// I2S peripheral configuration — don't mess around with this
i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 2,
    .dma_buf_len = 1024,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
};

// I2S pin configuration — and don't mess around with this either
i2s_pin_config_t i2s_mic_pins = {
    .bck_io_num = I2S_MIC_SERIAL_CLOCK,
    .ws_io_num = I2S_MIC_LEFT_RIGHT_CLOCK,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_SERIAL_DATA
};

void setup()
{
    // we need serial output for the predictions
    Serial.begin(115200);

    // start up the I2S peripheral
    ret = i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
    if (ret != ESP_OK) Serial.printf("[Error] %d\n", ret);
    ret = i2s_set_pin(I2S_NUM_0, &i2s_mic_pins);
    if (ret != ESP_OK) Serial.printf("[Error] %d\n", ret);

    pinMode(LED_BUILTIN, OUTPUT);
}

void loop()
{
    // read one buffer of samples from the I2S device
    size_t bytes_read = 0;
    i2s_read(I2S_NUM_0, raw_samples, sizeof(int16_t) * SAMPLE_BUFFER_SIZE, &bytes_read, portMAX_DELAY);
    int samples_read = bytes_read / sizeof(int16_t);

    // accumulate samples into the feature buffer until a full window is
    // collected; clamp so we never write past the end of the buffer
    int to_copy = samples_read;
    if (indx + to_copy > EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE)
        to_copy = EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE - indx;
    for (int i = 0; i < to_copy; i++) {
        features[indx + i] = float(raw_samples[i]);
    }
    indx += to_copy;

    if (indx >= EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE) {
        // start inferencing
        ei_impulse_result_t result;

        // create a signal from the feature buffer
        signal_t signal;
        numpy::signal_from_buffer(features, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE, &signal);

        // run the classifier
        EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false);
        if (res != 0) return;

        // print the predictions
        ei_printf("Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n",
                  result.timing.dsp, result.timing.classification, result.timing.anomaly);
        for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
            ei_printf("%s:\t%.5f\n", result.classification[ix].label, result.classification[ix].value);
        }

        // turn the LED on when a cry is detected, off when only noise is heard
        if (result.classification[0].value > 0.75)       // Label: Cry
            digitalWrite(LED_BUILTIN, HIGH);
        else if (result.classification[2].value > 0.75)  // Label: Noise
            digitalWrite(LED_BUILTIN, LOW);

#if EI_CLASSIFIER_HAS_ANOMALY == 1
        ei_printf("anomaly:\t%.3f\n", result.anomaly);
#endif
        indx = 0;
    }
}
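One way to cut false triggers, not implemented in the firmware above, is to require several consecutive inference windows to agree before toggling the LED. A minimal sketch of such a debouncer (class name, window count, and threshold are all assumptions to tune on real audio):

```cpp
#include <cassert>

// Debounce the classifier: only report a cry after it has exceeded the
// confidence threshold for `required` consecutive inference windows.
class CryDebouncer {
public:
    explicit CryDebouncer(int required) : required_(required), streak_(0) {}

    // feed one window's cry confidence; returns true once the streak is long enough
    bool update(float cry_confidence, float threshold = 0.75f) {
        streak_ = (cry_confidence > threshold) ? streak_ + 1 : 0;
        return streak_ >= required_;
    }

private:
    int required_;  // how many consecutive windows must agree
    int streak_;    // current run of confident windows
};
```

With `required = 3`, a single noisy window above the threshold no longer flips the LED; only a sustained cry does.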
E-nanny trained model
C/C++
No preview (download only).
Credits
nafihahmd