May 10, 2023

[Karanjot Singh](<https://www.linkedin.com/in/karanjotsinghv/>)

Sat Sri Akaal!

I am a data scientist based in Mohali, India, with 2+ years of experience in various sub-domains of natural language processing. Before joining, I graduated from Guru Nanak Dev Engineering College, Ludhiana, in 2020 with a major in Information Technology. You can reach out to me here.

I am super excited to be a part of Google Summer of Code '23 with The Distributed Little Red Hen Lab. I am looking forward to learning more about multimodal deep learning and how it can be used to detect the stance of a TV news story. I am fortunate to have Dr. Homa Hosseinmardi as my mentor for this project.


Multimodal Stance Detection on Television News

As viewership of American cable news stations continues to grow, it has become increasingly important to understand the stance these stations take in news stories on sensitive topics, as well as the demographics of their audiences. Recent studies have shown that minor changes in wording can lead individuals to believe that a certain community is socially distant and indifferent, which highlights the need to detect any underlying bias.

Details of the proposal can be found here.

Community Bonding Period

The main objectives:

  1. Set up VPN and gain access to the lab's High Performance Cluster (HPC).
  2. Reiterate the established project timeline and synchronize with mentors.
  3. Attend the GSoC '23 welcome meeting and review the shared documentation.
  4. Get started with Payoneer, set up the account and complete necessary actions.
  5. Gain access to the TV News dataset and check labels, if any.


Week 1

Objectives for the week:

  1. Collect news stories on one of the following topics: gun control, climate change, immigration or abortion, from different news outlets such as Fox News Channel (FNC), MSNBC, etc.
  2. Fetch transcriptions of the stories, with time-stamps, using OpenAI's Whisper.
  3. Segment the stories based on different kinds of audio signals: speech, noise, music and noEnergy.
  4. Plot the distribution of segment lengths for each kind of audio signal.
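The last two steps can be sketched in a few lines. This is only a minimal sketch, assuming the segmenter's output has already been loaded as a list of `(label, start, end)` tuples with times in seconds. The tuple format, the function names `segment_lengths` and `length_histogram`, the 5-second bin width and the sample values are all hypothetical and not taken from the project.

```python
from collections import Counter, defaultdict

# Hypothetical segmenter output: (label, start_seconds, end_seconds).
SEGMENTS = [
    ("speech", 0.0, 12.5),
    ("music", 12.5, 15.0),
    ("speech", 15.0, 18.0),
    ("noEnergy", 18.0, 19.0),
    ("speech", 19.0, 41.0),
    ("noise", 41.0, 43.5),
]


def segment_lengths(segments):
    """Group segment durations (in seconds) by audio label."""
    lengths = defaultdict(list)
    for label, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {label} {start} {end}")
        lengths[label].append(end - start)
    return dict(lengths)


def length_histogram(durations, bin_width=5.0):
    """Count durations per bin; each key is the lower edge of a bin."""
    counts = Counter(int(d // bin_width) * bin_width for d in durations)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for label, durations in segment_lengths(SEGMENTS).items():
        print(label, length_histogram(durations))
```

The binned counts can be passed to any plotting library to draw the distribution for each kind of audio signal.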

The aim is to validate whether the speech segments can be used as chunks for topic detection; the detected topics would then serve as the targets for stance detection.
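One way to inspect such chunks is to group the time-stamped transcript by speech segment. The following is a minimal sketch, assuming transcript entries are dicts with `start`, `end` and `text` keys (the shape Whisper returns under `result["segments"]`) and that speech segments are `(start, end)` pairs in seconds. The helper name `chunk_by_speech`, the midpoint assignment rule and the sample data are illustrative assumptions, not the project's actual method.

```python
def chunk_by_speech(transcript, speech_segments):
    """Join transcript text whose midpoint falls inside each speech segment.

    Returns one (start, end, text) chunk per speech segment that has text.
    """
    chunks = []
    for seg_start, seg_end in speech_segments:
        texts = []
        for entry in transcript:
            midpoint = (entry["start"] + entry["end"]) / 2
            if seg_start <= midpoint < seg_end:
                texts.append(entry["text"].strip())
        if texts:
            chunks.append((seg_start, seg_end, " ".join(texts)))
    return chunks


# Hypothetical sample data, for illustration only.
TRANSCRIPT = [
    {"start": 0.0, "end": 4.0, "text": " Lawmakers debated the bill today."},
    {"start": 4.0, "end": 9.0, "text": " Supporters say it will save lives."},
    {"start": 16.0, "end": 21.0, "text": " Critics call it an overreach."},
]
SPEECH = [(0.0, 12.5), (15.0, 25.0), (30.0, 35.0)]

if __name__ == "__main__":
    for start, end, text in chunk_by_speech(TRANSCRIPT, SPEECH):
        print(f"{start:6.1f}-{end:6.1f}s  {text}")
```

Assigning each transcript entry by its midpoint keeps every entry in at most one chunk, which is a simplification; entries that straddle a segment boundary may deserve closer inspection.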

speech.png