C++ wcstok()

The wcstok() function in C++ returns the next token in a null-terminated wide string.

The wcstok() function is defined in the <cwchar> header file.

wcstok() prototype

wchar_t* wcstok( wchar_t* str, const wchar_t* delim, wchar_t ** ptr);

The wcstok() function takes three arguments: str, delim and ptr. It finds the next token in the wide string pointed to by str. The pointer delim points to the set of separator characters, and ptr points to a wchar_t* object in which the function saves its position between calls.

This function can be called multiple times to obtain tokens from the same wide string. There are two cases:

  1. If str is not NULL:
    The call is treated as the first call for that wide string. The function searches for the first wide character that is not contained in delim. If no such wide character is found, the wide string contains no tokens and a null pointer is returned.
    If such a wide character is found, it marks the beginning of the token. From there, the function searches for a wide character that is present in delim. If no separator is found, the token extends to the end of str, and later calls with the same ptr return a null pointer. If a separator is found, it is replaced by L'\0' and a pointer to the following character is stored in *ptr. Finally, the function returns a pointer to the beginning of the token.
  2. If str is NULL:
    The call is treated as a subsequent call to wcstok(). The function continues from where it left off in the previous invocation, using the position saved in *ptr.

wcstok() Parameters

  • str: Pointer to the null-terminated wide string to tokenize. The string is modified in place.
  • delim: Pointer to the null-terminated wide string that contains the separators.
  • ptr: Pointer to an object of type wchar_t* in which wcstok() stores its internal state.

wcstok() Return value

  • The wcstok() function returns a pointer to the beginning of the next token, if there is one.
  • It returns NULL if no more tokens are found.

Example: How does the wcstok() function work?

#include <cwchar>
#include <clocale>
#include <iostream>
using namespace std;

int main()
{
	setlocale(LC_ALL, "en_US.utf8");
	
	wchar_t str[] = L"parrot,owl,sparrow,pigeon,crow";
	wchar_t delim[] = L"\u002c"; // Unicode code point for a comma, same as L","
	wchar_t *ptr;
	
	wcout << L"The tokens are:" << endl;
	wchar_t *token = wcstok(str,delim,&ptr);
	
	while (token)
	{
		wcout << token << endl;
		token = wcstok(NULL,delim,&ptr);
	}
	
	return 0;
}

When you run the program, the output will be:

The tokens are:
parrot
owl
sparrow
pigeon
crow