We identified multiple ways in which a user might wish to enter a story: typing it directly, pasting it from the internet, taking a picture of a storybook page, uploading a text file, or narrating it aloud. We implemented all of these input options using popular speech recognition, OCR, and text extraction APIs. The screen itself is designed as a plain text editor so that all the emphasis stays on the task at hand. In addition, Madaari ships with its own database of stories that users can select from, saving them the trouble of browsing external sources to get started.
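The specific libraries behind each input path are not named here, so the sketch below is only illustrative: it assumes the standard browser FileReader API for text-file uploads and a client-side OCR library such as Tesseract.js for storybook photos, with every path funneling into the same story editor field.

```ts
// Illustrative sketch only: the exact libraries Madaari uses are not specified.
// Assumes Tesseract.js for OCR and the standard FileReader API for .txt uploads.
import Tesseract from 'tesseract.js';

// Read an uploaded plain-text file into the story editor.
function loadTextFile(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.onerror = () => reject(reader.error);
    reader.readAsText(file);
  });
}

// Extract story text from a photo of a storybook page.
async function loadStorybookPhoto(image: File): Promise<string> {
  const { data } = await Tesseract.recognize(image, 'eng');
  return data.text;
}

// Every input path ends up in the same editable story field.
async function handleUpload(file: File, editor: HTMLTextAreaElement) {
  editor.value = file.type.startsWith('image/')
    ? await loadStorybookPhoto(file)
    : await loadTextFile(file);
}
```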
To increase the usability of a new system like Madaari, we provided a wizard tutorial as a quick walkthrough guide. Later, we reuse the same wizard modal during the programming step to succinctly describe the functionalities Madaari provides.
Programming using a block-based environment
This step lets the user be creative and connect the elements of multimedia storytelling with their chosen story. To lower the barrier of building such complex behavior while still offering programming-like freedom, we took inspiration from block-based visual programming. Tools like MIT's Scratch, Blockly, and other block-based authoring environments are widely used to help novice programmers and children access programming capabilities through an interactive, visual construct.
Designing blocks and their functions
We represent output capabilities as action blocks in the programming space, while sensing capabilities become cue blocks. Control blocks such as if-then and repeat connect cue and action blocks and extend control over actions, respectively. Further, functionality like sequencing a set of actions or combining two cues is covered by self-explanatory blocks such as a 'wait for x seconds' action and an 'or' cue block. While the blocks themselves stay simple, certain blocks, like animation, music, and robot, are accompanied by panels that provide a richer preview for testing actions. All of this is tied to the story itself, which is always accessible as a reference and can be used to generate keywords as cues.
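The internal representation of a block program is not described in the text; the following TypeScript sketch is one plausible shape for it, with hypothetical type and field names, covering the cue, action, and control blocks mentioned above.

```ts
// Hypothetical data model for a Madaari block program (names are illustrative).

// Sensing capabilities become cue blocks.
type Cue =
  | { kind: 'keyword'; word: string }            // spoken keyword from the story
  | { kind: 'expression'; expression: 'smile' | 'surprise' }
  | { kind: 'or'; left: Cue; right: Cue };       // combine two cues

// Output capabilities become action blocks.
type Action =
  | { kind: 'animation'; name: string }
  | { kind: 'music'; track: string }
  | { kind: 'robot'; command: string }
  | { kind: 'wait'; seconds: number };           // sequence actions over time

// Control blocks tie cues and actions together.
type Rule =
  | { kind: 'ifThen'; cue: Cue; actions: Action[] }
  | { kind: 'repeat'; times: number; actions: Action[] };

// A program is simply the list of rules built in the programming space.
type Program = Rule[];

const example: Program = [
  {
    kind: 'ifThen',
    cue: { kind: 'keyword', word: 'lion' },
    actions: [
      { kind: 'music', track: 'roar.mp3' },
      { kind: 'wait', seconds: 2 },
      { kind: 'animation', name: 'lion-jump' },
    ],
  },
];
```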
The user experience of programming through blocks
We designed the user experience to guide the user correctly, preventing possible errors rather than correcting them after the fact. For example, the user cannot drag an action block into the if-condition space or use blocks with blank values. In addition, a compiler-like check filters out logical errors before the user proceeds to the reading environment.
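The exact checks behind this compiler-like pass are not detailed; the sketch below assumes the hypothetical Program model from the earlier snippet and flags a few plausible error classes such as blank keyword values and rules with no attached actions.

```ts
// Illustrative validation pass over the hypothetical Program model above.
// The actual checks Madaari performs are not specified; these are assumptions.
function validate(program: Program): string[] {
  const errors: string[] = [];

  const checkCue = (cue: Cue, i: number) => {
    if (cue.kind === 'keyword' && cue.word.trim() === '') {
      errors.push(`Rule ${i + 1}: keyword cue is blank`);
    }
    if (cue.kind === 'or') {
      checkCue(cue.left, i);
      checkCue(cue.right, i);
    }
  };

  program.forEach((rule, i) => {
    if (rule.kind === 'ifThen') checkCue(rule.cue, i);
    if (rule.actions.length === 0) {
      errors.push(`Rule ${i + 1}: no actions attached`);
    }
    rule.actions.forEach((a) => {
      if (a.kind === 'wait' && a.seconds <= 0) {
        errors.push(`Rule ${i + 1}: wait duration must be positive`);
      }
    });
  });

  return errors; // empty list means the program can proceed to reading mode
}
```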
We also provided subtle feedback in response to the user's actions. For example, adding a keyword cue in the programming space highlights that word in the reference story, and generating a word from the helper panel emphasizes the newly created block with a glow animation, reinforcing that the action succeeded.
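As a rough illustration of this feedback, the snippet below highlights every occurrence of a cue keyword in the reference story panel and briefly applies a glow class to a newly generated block; the selectors and class names are hypothetical, not taken from Madaari's code.

```ts
// Illustrative DOM-level feedback; selectors and class names are assumptions.

// Wrap every occurrence of the cue keyword in the reference story panel.
function highlightKeyword(storyPanel: HTMLElement, word: string) {
  const pattern = new RegExp(`\\b(${word})\\b`, 'gi');
  storyPanel.innerHTML = storyPanel.innerHTML.replace(
    pattern,
    '<mark class="cue-highlight">$1</mark>'
  );
}

// Briefly glow a newly generated block to confirm the action succeeded.
function glowNewBlock(block: HTMLElement) {
  block.classList.add('glow');                   // CSS animation, e.g. a pulsing box-shadow
  setTimeout(() => block.classList.remove('glow'), 1500);
}
```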
Reading the Story
Finally, the user can experience the digital storytelling that they programmed using Madaari. Since we incorporated multiple modalities, the reading mode is designed to be dynamic. First, the user receives a prompt to check the connected devices, based on the modalities used in the program. Next, the user sees the reading screen with the story set in a large, readable font. The keywords in the story are highlighted according to the program and light up when the word is spoken, indicating that the cue was detected. Other dynamic feedback windows, such as the animation preview and the expression detector showing the live feed from the front camera, are placed in a collapsible panel on the right. We designed the reading screen so that the user can comfortably read the story on a laptop, a tablet, or even from a physical storybook with Madaari running alongside. It also offers a dark mode for night-time stories read to a child by a parent, and the user can enable autoscroll and control its speed while reading.
We use the Web Speech API to detect speech and Affectiva's canvas-based API for expression detection. The real-time results from both are matched against the programmed cues loaded from Firebase, and the corresponding actions are executed whenever a match occurs.
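As a minimal sketch of this matching loop, the code below uses the browser's SpeechRecognition interface to stream transcripts and checks each one against the keyword cues of the hypothetical Program model introduced earlier; the executeActions helper and the step of loading the program from Firebase are assumptions, not details from the text.

```ts
// Minimal sketch of the reading-mode loop: continuous speech recognition whose
// transcripts are matched against keyword cues. executeActions and the prior
// loading of `program` from Firebase are hypothetical placeholders.
declare function executeActions(actions: Action[]): void;

function startReadingMode(program: Program) {
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  const recognition = new SpeechRecognitionImpl();
  recognition.continuous = true;
  recognition.interimResults = true;

  recognition.onresult = (event: any) => {
    // Take the most recent transcript chunk and normalize it.
    const transcript =
      event.results[event.results.length - 1][0].transcript.toLowerCase();

    for (const rule of program) {
      if (rule.kind !== 'ifThen') continue;
      if (
        rule.cue.kind === 'keyword' &&
        transcript.includes(rule.cue.word.toLowerCase())
      ) {
        executeActions(rule.actions); // e.g. play music, trigger animation, move robot
      }
    }
  };

  recognition.start();
}
```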