Quantcast
Channel: Extracting a specific substring in C - Code Review Stack Exchange
Viewing all articles
Browse latest Browse all 4

Answer by Lundin for Extracting a specific substring in C

$
0
0

As far as performance goes, there isn't much you can improve without doing manual optimization tricks. Such things are already implemented in the library functions though.

The main issue I see here is that you copy data into the destination before you know if the string actually contains a termination character. By doing so, you save a bit of time as you can copy and search at the same time. But you also end up copying data before the input has been verified.

What's best in your case, I don't know. It depends on how reliable your input is. If you have already verified it previously, then your copy+check in one might be the best choice. If it's some raw data from a serial bus (UART etc), it might be wisest to verify the data before you copy. I will show a version that does the verification first, it will be safer although possibly slightly slower than what you currently have.


General code review:

Style/best practices

  • Pointer parameters to data that isn't modified should be const qualified.
  • Using a plain int for error handling isn't ideal. You actually have several possible errors here: missing search string, missing terminator, potential buffer overflow. Even if your program doesn't need to know what went wrong, it might ease debugging and it costs you nothing extra to add.
  • The while loop could have been replaced with a for loop, for better readability:

    for(size_t i=0; i<desLen-1; i++){  if(temp[i]==termChar)  {    break;  }  dst[i]=temp[i];}

Performance

  • The length of the search key could be determined at compile-time.
  • Consider dropping the terminating character parameter if what2find[0] could be said to always contain it.
  • Some micro-optimizations with C99 restrict are possible. I'll show an example below.

Here is a different version which contains more detailed error handling and checks for termination before copy:

#include <string.h>#include <stdbool.h>typedef enum{  STRFIND_OK,  STRFIND_KEY_NOT_FOUND,  STRFIND_TERMINATOR_NOT_FOUND,  STRFIND_BUFFER_OVERFLOW,} strfind_result_t;strfind_result_t strfind_cpy (const char* str,                               size_t      key_size,                              const char  key[key_size],                               char        terminator,                              size_t      dst_size,                               char        dst [dst_size]){  char* start = strstr(str, key);  if(start == NULL)  {    return STRFIND_KEY_NOT_FOUND;  }  start += key_size-1;  char* end = strchr(start, terminator);  if(end == NULL)  {    return STRFIND_TERMINATOR_NOT_FOUND;  }  size_t length = (size_t)(end-start);  if(length+1 > dst_size)  {    return STRFIND_BUFFER_OVERFLOW;  }  memcpy(dst, start, length);  dst[length] = '\0';  return STRFIND_OK;}

C99 pointer-to-VLA are used to ensure data size integrity of the buffers. If the verification of data isn't needed, strchr could be replaced with a for loop like the one demonstrated above.

memcpy is the fastest possible copy. It will be faster than copy character-by-character, since the library implementation will work on 32 bit chunks that your ARM likes better than copying individual bytes. So my code might actually be faster (or it may be slower), you'll have to benchmark it.

Further micro-optimization is possible with C99 restrict:

strfind_result_t strfind_cpy (const char* restrict  str,                               size_t                key_size,                              const char* restrict  key,                               char                  terminator,                              size_t                dst_size,                               char* restrict        dst);

This tells the compiler that none of the pointers passed point at the same buffer. This may improve performance ever so slightly, depending on how your compiler handles pointer aliasing internally. Check the disassembled code to see if restrict gave any benefits.


Example of use:

#include <stdio.h>int main (void){  const char data[] = "blablabla$GPTXT->SOME CODES HERE<-\r\n$GPRMCblablabla";  char buf[50];  strfind_result_t result;  result = strfind_cpy(data,                       sizeof "$GPTXT","$GPTXT",'$',                       sizeof buf,                       buf);  if(result == STRFIND_OK)  {    puts(buf);  }}

Viewing all articles
Browse latest Browse all 4

Trending Articles