To implement the C function ‘strtok’, one doesn’t need to allocate extra memory for the input string before modifying it, since the caller function is supposed to make copy of the input string and deallocate the original input string. The function only needs to define static variables to store the starting of the string.
The caller function provides a string as the input during the first call, and ‘null’ for the following function calls. For details about how this function works, please refer to http://www.cplusplus.com/reference/cstring/strtok/.
char* sp = NULL; /* the start position of the string */ char* strtok1(char* str, const char* delimiters) { int i = 0; int len = strlen(delimiters); /* check in the delimiters */ if(len == 0) printf("delimiters are empty\n"); /* if the original string has nothing left */ if(!str && !sp) return NULL; /* initialize the sp during the first call */ if(str && !sp) sp = str; /* find the start of the substring, skip delimiters */ char* p_start = sp; while(true) { for(i = 0; i < len; i ++) { if(*p_start == delimiters[i]) { p_start ++; break; } } if(i == len) { sp = p_start; break; } } /* return NULL if nothing left */ if(*sp == '\0') { sp = NULL; return sp; } /* find the end of the substring, and replace the delimiter with null */ while(*sp != '\0') { for(i = 0; i < len; i ++) { if(*sp == delimiters[i]) { *sp = '\0'; break; } } sp ++; if (i < len) break; } return p_start; }
This following is the test code from cplusplus.com
.
#include <stdio.h> #include <string.h> int main () { char str[] ="- This, a sample string."; char * pch; printf ("Splitting string \"%s\" into tokens:\n",str); pch = strtok1(str," ,.-"); while (pch != NULL) { printf ("%s\n",pch); pch = strtok1(NULL, " ,.-"); } return 0; }
————————-
As pointed out by Ian in the following comments [9.25.2014], there is a much simpler solution to this problem. Feel free to test it and let us know if it has any bugs.
char* strtok1(char *str, const char* delim) { static char* _buffer; if(str != NULL) _buffer = str; if(_buffer[0] == '\0') return NULL; char *ret = _buffer, *b; const char *d; for(b = _buffer; *b !='\0'; b++) { for(d = delim; *d != '\0'; d++) { if(*b == *d) { *b = '\0'; _buffer = b+1; // skip the beginning delimiters if(b == ret) { ret++; continue; } return ret; } } } return ret; }
I may be over simplifying this but it seems this approach is over complicated. Here is my strtok implementation. I have done no edge case analysis but has served me well for a few years.
4 char* strtok(char *str, const char* delim) {
5 static char* _buffer;
6 if(str != NULL) _buffer = str;
7 if(_buffer[0] == ”) return NULL;
8
9 char *ret = _buffer, *b;
10 const char *d;
11
12 for(b = _buffer; *b != ”; b++) {
13 for(d = delim; *d != ”; d++) {
14 if(*b == *d) {
15 *b = ”;
16 _buffer = b+1;
17 return ret;
18 }
19 }
20 }
21
22 return ret;
23 }
Ian, thanks for sharing your solution. It looks very neat. But there is some difference when comparing your output and mine using the above mentioned test case. Your output is:
Splitting string “- This, a sample string.” into tokens:
This
a
sample
string
while mine is
Splitting string “- This, a sample string.” into tokens:
This
a
sample
string
You will immediately notice that your solution returns multiple ” sequences. Like you said, you didn’t deal with edge cases such as two consecutive delimiters or strings beginning with delimiters. Under some situations, maybe it is not a problem.
I tested both functions on ‘http://www.compileonline.com/compile_cpp_online.php’ and replaced ” with ‘\ 0’.
Great! Thank you for taking the time to share this with me. Looks like i will have to do a bit of tweaking on my solution.
I added this between line 16 and 17. Thanks again for the input!!
if(b == ret) { ret++; continue; }
Ian, thanks for providing this smart fix. I also put your fixed solution into the original post, hope you don’t mind.
Not at all, Thanks for your feed back. You probably just saved me some from odd behavior on my current project.
Pingback: Write a strtok function | juliachenonsoftware
The ‘much simpler solution’ has a problem when the string ends with a new line
char str[] =”- This, a sample string.\n”;
gets stuck in an infinite loop.
Same infinite loop problem with the original solution when the string ends with one of the delimiters and/or has a series of delimiters before last token.
May be there needs
_buffer = b+1 before the second return as well.
i think it should return NULL at the second return and make _buffer point to b before return.
Hi,
To be able to use the code with out dot in the end of the sting. I just did small edit after debugging on this site pythontutor.com/c.html#mode=display with codeblock
char* strtok1(char *str, const char* delim) {
static char* _buffer;
if(str != NULL) _buffer = str;
if(_buffer[0] == ‘\0′) return NULL;
char *ret = _buffer, *b;
const char *d;
for(b = _buffer; *b !=’\0’; b++) {
for(d = delim; *d != ‘\0’; d++) {
if(*b == *d) {
*b = ‘\0’;
_buffer = b+1;
// skip the beginning delimiters
if(b == ret) {
ret++;
continue;
}
return ret;
}
}
}
_buffer = b;
return ret;
}
I added _buffer = b; before the last return. Because the code will reach this statement only when there is no delimiter matched in the condition which mean the string reach it ends so we can use _buffer as a flag in this case to infrom the function during the next call (after printing the last token) that the function read all the token and there is no another token to split so it is time to active this condition again
if(_buffer[0] == ‘\0’) return NULL;
and terminate the fucntion with the loop.