
Overview

An R package for analyzing natural language with transformer-based large language models. The talk package is part of the R Language Analysis Suite, which includes talk, text and topics.

  • talk transforms voice recordings into text, audio features, or embeddings.

  • text provides many language tasks, such as converting digital text into word embeddings.

    talk and text provide access to Large Language Models from Hugging Face.

  • topics visualizes language patterns into topics to generate psychological insights.



The R Language Analysis Suite is created through a collaboration between psychology and computer science to address research needs and ensure state-of-the-art techniques. The suite is continuously tested on Ubuntu, macOS and Windows using the latest stable R version.

Short installation guide

Most users simply need to run the installation code below. If you experience problems, please see the Extended Installation Guide.

For the talk package to work, you first have to install it in R and then set up the Python packages that talk requires.

  1. Install talk (at the moment the second step only works with the development version of talk from GitHub).

GitHub development version:

# install.packages("devtools")
devtools::install_github("theharmonylab/talk")

  2. Install and initialize the Python packages required by talk:

library(talk)

# Install the Python packages required by talk in a conda environment (with defaults).
talkrpp_install()

# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run talkrpp_initialize() after restarting R.
talkrpp_initialize(save_profile = TRUE)
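
Before running the steps above in a script, it can help to check whether talk is already installed. The sketch below uses only base R; the variable name talk_installed is illustrative, and the message simply repeats the install command from step 1.

```r
# Check whether the talk package is installed without attaching it.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
talk_installed <- requireNamespace("talk", quietly = TRUE)

if (!talk_installed) {
  message(
    "talk is not installed. Install the development version with:\n",
    "  devtools::install_github(\"theharmonylab/talk\")"
  )
} else {
  message("talk is installed; continue with talkrpp_install().")
}
```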

Point solution for transforming talk to embeddings

Recent significant advances in NLP research have resulted in improved representations of human language (i.e., language models). These language models have produced large performance gains in tasks related to understanding human language. talk makes these state-of-the-art (SOTA) models easily accessible through an interface to Hugging Face in Python.

See Hugging Face for a more comprehensive list of models.

The talkText() function performs speech-to-text, transcribing audio input to text. talkEmbed() transforms audio input into numeric representations (embeddings) that can be used for downstream tasks, such as guiding predictive models with the text package (see the text train functions).

library(talk)
# Transcribe the example audio and transform it to embeddings

# Get file path to example audio from the package example data
wav_path <- system.file("extdata",
                        "test_short.wav",
                        package = "talk")

# Get transcription
talk_transcription <- talkText(
  wav_path
)
talk_transcription

# Get embeddings (using default settings)
talk_embeddings <- talkEmbed(
  wav_path
)
talk_embeddings
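
Because embeddings are numeric, they can be compared or passed to statistical models. As a minimal sketch of one such downstream step, the base R code below computes the cosine similarity between two embedding vectors. The vectors are made-up numbers, not output from talkEmbed(), and cosine_similarity() is a hypothetical helper rather than a talk function; nothing is assumed about the structure of the object that talkEmbed() returns.

```r
# Hypothetical helper (not part of talk): cosine similarity of two numeric vectors.
cosine_similarity <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Made-up 4-dimensional "embeddings" for illustration only.
embedding_a <- c(0.2, -0.1, 0.4, 0.3)
embedding_b <- c(0.1, -0.2, 0.5, 0.1)

# Values near 1 indicate similar direction; values near 0 indicate little similarity.
cosine_similarity(embedding_a, embedding_b)
```

To apply this to real data, the numeric values would first need to be extracted from the talkEmbed() result into plain vectors of equal length.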
