This is a couple of working examples plus advice; if that is not what you are looking for then, in summary, don't bother...
I recommend you follow the various paths, and repeat that on a periodic basis. Professionally you want to be able to cover the range from just the datasheets/schematics, up to canned libraries from the vendors, and up to FreeRTOS or some other solution. My personal preference is datasheets/schematics; I find this takes less time and is cleaner, faster, more reliable, etc. YMMV. The first rule is that the documentation is buggy, the second is that the libraries are buggy (and scary if you look inside).
Some vendors do a relatively good job and some do not; with time you may find out for yourself which vendors you like. Professionally you won't be able to dictate one over another until well down the road, but for fun, as a hobby, you certainly can. So far I am not talking about arm vs 8051 vs other; they all have this datasheet vs library split, buggy docs, etc. Anyway, there are times when you may have to dig into the library or various online open source examples to find the missing enable bit that is not documented anywhere, or to find out that some registers have to be programmed in a particular order, something someone else may have known from inside knowledge or just dumb luck.
So I'll just post a somewhat minimal example that should work with whatever flavor of arm-whatever-gcc/as/ld you have (arm-linux-gnueabi, arm-none-eabi, etc). And/or you can just build your gnu tools from sources as I do periodically. Best to start with something pre-built though, and worry about one issue at a time.
The board you have shown in your picture also has a community name: it's the "blue pill" or "stm32 blue pill". Unlike some other inexpensive boards you can find on eBay or elsewhere, this one has had community work done to fit it into the Arduino sandbox, which is yet another avenue to pursue for knowledge and experience. Taking that Arduino path you don't need to read much of anything: google stm32 blue pill, take some very generic few lines of code to their sandbox, and you have a blinking led.
You are going to want/need openocd with stlink support, so either find a binary or build from sources; it's pretty easy. My preference is to take the config files from the tcl directory, carry them with my project, and modify them as needed, rather than hope they are there and have not changed from one version of openocd to another or from one machine I am on to another. YMMV, this is very much a personal preference thing. For example I use an el-cheapo jlink clone, a few bucks on eBay (purple board), and have a jlink.cfg
interface jlink
transport select swd
The board in your picture is one of the stlink flavors, you can figure out through trial and error or lsusb or other means which one. It might be this for example:
interface hla
hla_layout stlink
hla_device_desc "ST-LINK/V2"
hla_vid_pid 0x0483 0x3748
In my case
openocd -f jlink.cfg -f target.cfg
where target.cfg is a single file that contains the various stm32f1xxx config files from openocd.
The blue pill has an led on PC13, that is, pin 13 of port C.
The examples here are not just bare metal but include the bootstrap code; you don't need other code or headers, only these files and a gnu arm compiler (or cross compiler, depending on whether you are running this on an arm development machine such as a Raspberry Pi, or on something else such as an x86 based machine). The code is designed for portability, among other things, not as a library approach. It is a way to get you started in an "I can do this" fashion; then you move on to your own personal preferences by examining more complicated solutions.
sram.s
@.cpu cortex-m0
@.cpu cortex-m3
.thumb
.thumb_func
.global _start
_start:
ldr r0,=0x20001000
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
The stm32f103... is cortex-m3 based, which you should know by now because you got the STM32F103C8T6 documentation from st: the datasheet, with pinouts, packaging and part number info (which tells you how much flash and ram you have, among other things), and what st calls the reference manual, which has all the registers, the address space, and such details (with some vendors the datasheet has all the registers and descriptions as well). Between these you find out that the chip contains a cortex-m3 from arm, so you go to arm's website and get the technical reference manual for the cortex-m3, in which you find it is based on the armv7-m architecture, and from arm's website you also get the armv7-m architectural reference manual. This is your STARTING set of documentation for this CHIP. Then you may try to find a schematic or some other way of figuring out that PC13 is where the led is.
So while the armv7-m supports the much wider array of thumb2 extensions to the thumb instruction set, I generally prefer to start with the traditional thumb instructions in a generic-ish framework, as you will see below, and then if needed (usually for performance) change the build tools or code to allow the armv7-m thumb2 extensions. YMMV.
blinker01.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );

#define GPIOCBASE 0x40011000
#define RCCBASE 0x40021000

int notmain ( void )
{
    unsigned int ra;
    unsigned int rx;

    //RCC_APB2ENR
    ra=GET32(RCCBASE+0x18);
    ra|=1<<4; //enable port c
    PUT32(RCCBASE+0x18,ra);
    //config, GPIOC_CRH covers pins 8 to 15
    ra=GET32(GPIOCBASE+0x04);
    ra&=~(3<<20); //PC13 clear MODE
    ra|=1<<20; //PC13 MODE=01, output
    ra&=~(3<<22); //PC13 clear CNF
    ra|=0<<22; //PC13 CNF=00, push-pull
    PUT32(GPIOCBASE+0x04,ra);

    for(rx=0;;rx++)
    {
        PUT32(GPIOCBASE+0x10,1<<(13+0)); //BSRR, set PC13
        for(ra=0;ra<400000;ra++) dummy(ra);
        PUT32(GPIOCBASE+0x10,1<<(13+16)); //BSRR, reset PC13
        for(ra=0;ra<400000;ra++) dummy(ra);
    }
    return(0);
}
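The register values that notmain() writes can be sanity-checked on your development machine before you ever touch the hardware. This is a rough sketch, assuming the layout the st reference manual describes for CRH (four config bits per pin, MODE in the low two and CNF in the high two) and for BSRR (set bits in the low half, reset bits in the high half). The helper names crh_pushpull_output, bsrr_set and bsrr_reset are made up for illustration, they are not part of the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the current CRH value, return the value that
   configures `pin` (8..15) as a push-pull output, MODE=01 and CNF=00. */
uint32_t crh_pushpull_output(uint32_t crh, unsigned int pin)
{
    unsigned int shift;

    assert(pin >= 8 && pin <= 15);      /* CRH only covers pins 8..15 */
    shift = (pin - 8) * 4;              /* four config bits per pin   */
    crh &= ~((uint32_t)0xF << shift);   /* clear MODE and CNF         */
    crh |= (uint32_t)0x1 << shift;      /* MODE=01, CNF=00            */
    return crh;
}

/* BSRR: writing a 1 to bit n sets pin n, a 1 to bit n+16 resets it. */
uint32_t bsrr_set(unsigned int pin)
{
    assert(pin < 16);
    return (uint32_t)1 << pin;
}

uint32_t bsrr_reset(unsigned int pin)
{
    assert(pin < 16);
    return (uint32_t)1 << (pin + 16);
}
```

For pin 13 this touches the same bits, 20 through 23, as the four verbose lines in notmain(), and bsrr_set(13)/bsrr_reset(13) give the same two values written to GPIOCBASE+0x10.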
Years of experience with chips, boards, and simulation of chips, boards and peripherals, plus what an old timer taught me when I was in my 20s in the aerospace industry, say you want to force the instruction: 32 bit loads and stores, 16 bit, etc. So I use asm to get exactly the instruction I want, as well as to have a very clean abstraction layer (despite the performance cost, which can be solved with macros as needed down the road with the same source code) with which to attach a simulator, punch through an operating system, etc. YMMV.
I have had the volatile pointer trick fail BTW and produce the wrong instruction; I very rarely use that trick. Never use structs across compile domains as a rule, and don't misuse unions. I say that because these days almost every solution you are going to find is going to violate one or both of those cardinal rules. You will own that risk when you use such libraries. But professionally you will at times have to own that risk, as those libraries may be dictated for one of a number of reasons: personal preference, the boss said so, maintaining legacy projects, no time/desire to re-write a tcp-ip stack or filesystem, etc.
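To show what "attach a simulator" can mean at its very simplest, here is a rough host-side stand-in for the three assembly functions. It is only a sketch under loud assumptions: the register model just remembers what was last written to each address, unwritten registers read as zero (NOT the real reset values), unsigned int is assumed to be 32 bits on the host, and the names sim_addr, sim_value, sim_find and SIM_NREGS are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NREGS 16

/* Toy register model: parallel arrays of address and last written value. */
static unsigned int sim_addr[SIM_NREGS];
static unsigned int sim_value[SIM_NREGS];
static size_t sim_count;

/* Find the slot for an address, creating it (value 0) on first use. */
static size_t sim_find(unsigned int addr)
{
    size_t i;

    for (i = 0; i < sim_count; i++) {
        if (sim_addr[i] == addr) return i;
    }
    assert(sim_count < SIM_NREGS);      /* toy model, small fixed size */
    sim_addr[sim_count] = addr;
    sim_value[sim_count] = 0;
    return sim_count++;
}

/* Same prototypes as the asm versions, so the C code does not change. */
void PUT32(unsigned int addr, unsigned int value)
{
    sim_value[sim_find(addr)] = value;
}

unsigned int GET32(unsigned int addr)
{
    return sim_value[sim_find(addr)];
}

void dummy(unsigned int x)
{
    (void)x;                            /* nothing to burn on the host */
}
```

Link the setup part of your C code against this instead of the .s file and you can print or check what it would have written. A real simulator would model the peripheral behavior; this only records writes.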
You can see this is doing a verbose read-modify-write to configure PC13 as a push-pull output, then using a convenient bit set/reset register to change the state of one pin in that port. Having dummy() in a separate optimization domain forces the compiler to generate that loop and burn that time so that we can see the led blink. A loop like
for(x=0;x<0x80000;x++) continue;
optimizes away as it does nothing. To avoid that, don't optimize (intentionally or accidentally), or make x volatile to tell the compiler to do loads and stores every time it touches x, or call a function in another compile domain that uses x, forcing the compiler to build the loop. LATER you can mess with timers.
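For comparison, the volatile flavor of the delay looks something like this. It is a sketch, not what the example above uses; delay_loop is a made-up name, and the assert is only there as a host-side sanity check (build with -DNDEBUG or delete it for bare metal).

```c
#include <assert.h>

/* Busy-wait for `count` iterations. Every access to a volatile object
   counts as a side effect, so the compiler has to keep the loop and do
   the loads and stores even at -O2. Returns the final counter value. */
unsigned int delay_loop(unsigned int count)
{
    volatile unsigned int i;

    for (i = 0; i < count; i++) continue;
    assert(i == count);                 /* host-side check only */
    return i;
}
```

How long each iteration takes still depends on the compiler, the optimization level and where the code runs from, so treat the count as a tuning knob, not a time.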
Linker scripts are very specific to the toolchain. With gnu you can do without one (-Ttext=0x20000000 in this case), but there is a weirdism that sometimes happens, which I will leave you to find. You can make the script slightly simpler than this, but most folks make it significantly more complicated, YMMV.
sram.ld
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
}
I have no need to initialize .bss nor .data, so I can simply remove all the baggage that comes with those requirements. That is a violation of the C standard that I am happy with for this example. Also note my entry point is not called main(): at least one toolchain (so I have to assume others too) looks for that function name and piles on extra baggage that we don't want. The entry point name should be generic. Even _start, which the toolchain uses to mark the entry point as if this were a binary to be run on an operating system, is not needed; the linker may throw a warning but will still build the binary.
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 sram.s -o sram.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -march=armv6-m -mthumb -c blinker01.c -o blinker01.o
arm-none-eabi-ld -o blinker01.elf -T sram.ld sram.o blinker01.o
arm-none-eabi-objdump -D blinker01.elf > blinker01.list
arm-none-eabi-objcopy blinker01.elf blinker01.bin -O binary
I no longer think -nostdlib -nostartfiles -ffreestanding are required; it depends, I guess, on which version and flavor of gnu tools you are using.
Always check the disassembly on a first build, and whenever you mess with the makefile/infrastructure:
20000000 <_start>:
20000000: 4805 ldr r0, [pc, #20] ; (20000018 <dummy+0x4>)
20000002: 4685 mov sp, r0
20000004: f000 f80a bl 2000001c <notmain>
20000008: e7ff b.n 2000000a <hang>
2000000a <hang>:
2000000a: e7fe b.n 2000000a <hang>
2000000c <PUT32>:
This is to ensure that the entry point we wanted, the code we wanted up front, is there and built correctly, which it is in this case.
Open a couple more command line terminals. Generally the directory where you launch openocd is the reference point for where it looks for files, so if you launch it in the directory where your binary is (the elf version is preferred in this case), you don't need to add paths. Or copy your binary to where you launch openocd. Or just type paths, it's your choice. Also, by taking over the openocd config files you can write your own scripts into the config to automate these steps.
In one window launch openocd
openocd -f jlink.cfg -f target.cfg
Open On-Chip Debugger 0.10.0-dev-00325-g12e4a2a (2016-07-05-23:15)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
swd
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
none separate
cortex_m reset_config sysresetreq
Info : No device selected, using first device.
Info : J-Link ARM-OB STM32 compiled Jun 30 2009 11:14:15
Info : Hardware version: 7.00
Info : VTarget = 3.300 V
Info : clock speed 1000 kHz
Info : SWD DPIDR 0x1ba01477
Info : stm32f1x.cpu: hardware has 6 breakpoints, 4 watchpoints
If you do not end up with the line reporting x breakpoints, y watchpoints (for an arm) then you are not ready to move on; get your debugger wired right, etc. Note that the vcc line on the debugger is NOT really there to power the board, despite what they say for that particular dongle you have in the picture; I think I blew up the first one of those trying to do that. You want to power the target yourself. The vcc line is then actually a vcc sense line, used to drive the IO voltage on the debugger as well as to set the sample point for the inputs. Generically/historically this is so you can use tools like this on 5.0v, 3.3v, 3.0v, 1.8v, etc; in this case the v sense line is used to see that we are using 3.3v levels. Check your wiring, check your power, re-install, build from sources, etc until openocd gets you to this point.
The simplest way in is the telnet interface. I have no use for gdb, but that is the next level of complication; save that for after the telnet interface works.
In one of your other command line terminal windows
telnet localhost 4444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
>
Then
> halt
stm32f1x.cpu: target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000042 msp: 0x20001000
>
If you halt again after it is halted you don't get that output. I have an infinite loop in flash (take the flash example below and comment out the bl notmain) to demonstrate to myself that this sram example will work for you; nothing on flash other than the stack pointer is required for the sram example to work.
Once halted then
> load_image sram/blinker01.elf
144 bytes written at address 0x20000000
downloaded 144 bytes in 0.008081s (17.402 KiB/s)
> resume 0x20000000
>
And the led should start blinking slowly. One could argue that needs to be resume 0x20000001, but the tool happens to work with 0x20000000.
This is running out of sram, which is generally smaller than the flash, but you can get a lot of education this way without risking bricking the mcu. With the boot0 pin you get a get out of jail free card with these stm32 devices. For your own projects, with parts that don't have a strap pin for an alternate/safe boot path, you should try to build in an alternate, safe boot path so you don't brick the chip/board (for example by accidentally changing the I/O pin configuration of the jtag pins you rely on to get into the chip and change the firmware; been there, done that). It happens from time to time no matter how much experience you have. I generally buy at least two, if not more, of each new-to-me board, just in case.
Now doing the same thing with flash
flash.s
@.cpu cortex-m0
@.cpu cortex-m3
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
How a processor boots is very specific to that processor; you just have to read the docs. In this case it is a vector table, not simply an address where you just start executing. The details for this ARE in the arm docs; look for words like reset, exception, vector, and interrupt and you will find the table. Note these cortex-m mcus can have upwards of 128 or 256 possible vectors, depending on the core and the chip vendor. You don't HAVE to specify more than you are using: you need something to consume the stack pointer word (possibly the stack pointer init value you want) and you have to have the reset vector. Beyond that it is up to you. If you don't provide a hard fault or undefined instruction path and the core just lands in the middle of code, your call, your debug. At the same time, setting up vectors for interrupts you will never enable burns valuable flash space... your call, your debug.
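If you want a quick automated check of those first two words, something along these lines can be run on the development machine against the raw .bin image. It is a sketch under assumptions: the image is already loaded into a byte buffer by the caller, the words are little endian, and the function name vector_table_looks_sane and its parameters are invented here. It only looks at the stack pointer word and the reset vector, nothing else in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a raw binary image. */
static uint32_t image_word(const uint8_t *image, size_t offset)
{
    return (uint32_t)image[offset]
         | ((uint32_t)image[offset + 1] << 8)
         | ((uint32_t)image[offset + 2] << 16)
         | ((uint32_t)image[offset + 3] << 24);
}

/* Returns 1 if the first two words look usable: an initial stack pointer
   somewhere in ram (top of ram included) and a reset vector that points
   into the image with the lsbit set. Returns 0 otherwise. */
int vector_table_looks_sane(const uint8_t *image, size_t len,
                            uint32_t flash_base,
                            uint32_t ram_base, uint32_t ram_size)
{
    uint32_t sp;
    uint32_t reset;

    assert(image != NULL);
    if (len < 8) return 0;                      /* need two words      */
    sp = image_word(image, 0);
    reset = image_word(image, 4);
    if (sp < ram_base) return 0;
    if (sp - ram_base > ram_size) return 0;
    if ((reset & 1) == 0) return 0;             /* thumb bit missing   */
    if (reset < flash_base) return 0;
    if ((reset & ~(uint32_t)1) - flash_base >= len) return 0;
    return 1;
}
```

For this part you would call it with flash_base 0x08000000 and the ram origin and length from your linker script. It does not replace reading the disassembly, it just catches the most common way to get a dead board.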
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
(I recycle my generic examples/starting point so the amount of flash/ram likely does not reflect this actual part, adjust as needed).
Changing the blink rate so I can see the difference between the flash version and sram version.
for(rx=0;;rx++)
{
    PUT32(GPIOCBASE+0x10,1<<(13+0));
    for(ra=0;ra<200000;ra++) dummy(ra);
    PUT32(GPIOCBASE+0x10,1<<(13+16));
    for(ra=0;ra<200000;ra++) dummy(ra);
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -march=armv6-m -mthumb -c blinker01.c -o blinker01.o
arm-none-eabi-ld -o blinker01.elf -T flash.ld flash.o blinker01.o
arm-none-eabi-objdump -D blinker01.elf > blinker01.list
arm-none-eabi-objcopy blinker01.elf blinker01.bin -O binary
And you very much need to inspect the entry point:
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000041 stmdaeq r0, {r0, r6}
8000008: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800000c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000010: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000014: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000018: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800001c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000020: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000024: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000028: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800002c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000030: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000034: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000038: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800003c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
08000040 <reset>:
8000040: f000 f808 bl 8000054 <notmain>
8000044: e7ff b.n 8000046 <hang>
08000046 <hang>:
8000046: e7fe b.n 8000046 <hang>
The first word is the initial value for the stack pointer, then come the vectors. (The disassembly of the vector table is bogus, just the disassembler trying to make sense of those words; the values are what we care about, ignore the disassembly.) The reset vector is the address of the reset handler ORRED with one, 0x08000040|1 = 0x08000041. Don't expect it to boot otherwise; it will try to go into a fault handler and then likely fail there. You should NOT NEED to orr the one in yourself in this code; if you are doing a bootloader hopping to some other address, sure, you will need to orr with one. I strongly discourage ADDING one: if the tool got it right and you add one, you have now messed up the address and it won't work; if the tool got it right and you orr one, you didn't mess it up.
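The orr vs add point is easy to see with plain arithmetic on the addresses from the listing above. A small sketch; the function names are made up for illustration, and thumb_bit_demo() simply asserts the four cases on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Set the lsbit by orring: harmless if the tool already set it. */
uint32_t thumb_addr_orr(uint32_t addr)
{
    return addr | 1;
}

/* Set the lsbit by adding: wrong if the tool already set it. */
uint32_t thumb_addr_add(uint32_t addr)
{
    return addr + 1;
}

/* Host-side demonstration using the reset handler address 0x08000040. */
void thumb_bit_demo(void)
{
    /* tool did NOT set the bit: both approaches give 0x08000041 */
    assert(thumb_addr_orr(0x08000040u) == 0x08000041u);
    assert(thumb_addr_add(0x08000040u) == 0x08000041u);
    /* tool DID set the bit: orr leaves it alone ... */
    assert(thumb_addr_orr(0x08000041u) == 0x08000041u);
    /* ... add turns it into 0x08000042, lsbit clear, wrong address */
    assert(thumb_addr_add(0x08000041u) == 0x08000042u);
}
```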
Now here is the rub...
I have had some of these blue pills come with the flash locked. There is an openocd way to get out of that, but I don't know where I recorded it; I did modify my uart based loader to handle it. Note another thing to have in your back pocket: this device, as with a vast number of other mcus from this and other vendors, may come with a burned-in bootloader or other non-jtaggy way into the chip in circuit. Make some attempts to build a tool for that interface, as you may have to some day professionally. In this case the write unprotect command does not actually do what the documentation implies: it doesn't return success and you have to power cycle the chip, but then it is unprotected and you can use a uart bootloader based tool or the swd (jtag like) tool to get into this part.
Same deal as before, use openocd to connect to the debugger in the part.
> halt
stm32f1x.cpu: target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x20000014 msp: 0x20000ff0
> flash write_image erase blinker01.elf
auto erase enabled
device id = 0x20036410
flash size = 64kbytes
wrote 1024 bytes from file blinker01.elf in 0.437904s (2.284 KiB/s)
>
If instead you get
stm32x device protected
failed erasing sectors 0 to 0
then your part is write protected and you have to figure that out. If I find it I will post it here, but because the part I have handy is not protected I can't test it, and I don't really want to go buy a handful more parts in the hope they are protected just to demonstrate what you may find when you purchase these blue pills from someone. At the same time I don't see a lot of the examples for this board dealing with this, so maybe it was just one batch and/or one manufacturer that did this, and myself and another guy online happened to get unlucky.
At this point you can
> reset
Or
Push the reset button with the jumpers both moved to the 0 side, which indicates boot0 and boot1 are strapped to 0 (you really only need boot0 to be a 0). In your picture your jumpers are set to 1; you need to move at least the one not next to the reset button. The one next to the reset button is assumed to be boot1.
And the led should blink twice as fast as the sram example. And when you power the board off then on again, with boot0=0, it will start to run this program from flash.
Pure assembly would have been an even simpler example, the volatile pointer trick slightly simpler. But if you can't get this code working you are probably not going to get something more complicated working. This is close to as simple as it gets.
Note again these examples do not support .data nor assume .bss is zero; it will take a bit more code and knowledge for you to allow for those assumptions. Personally I don't rely on those, so I don't have to complicate the bootstrap nor the linker script (which are toolchain specific and won't port), but that is personal preference.
Search for "stm32 blue pill" and find the pages on STM32duino or something like that, and try those examples. There is probably mbed support, another sandbox, this one backed by or supported by ARM, and st has at least two flavors of libraries, cmsis and a legacy one. Note that chip vendors, to be in business, often have a library set and, for marketing and other reasons, continue to churn on those. So while the HAL and CMSIS and cube and whatever are present today, expect one and eventually all of those to be gone down the road, even CMSIS, in part because it is a bit of a mess with no real central owner. Granted, CMSIS may evolve, but don't expect it to remain in its exact form.
At the same time, the way this industry works is that you ideally buy a part for your product that is relatively new and/or will be in production about as long as your product hopes to be in production. You write/debug/build the firmware and hopefully never have to touch it again, so you can use whatever the popular/FAD library of the day is. Ideally save a build machine from today to use in the future, but hope that you never have to touch it and that today's favorite library will work long enough to get you to production. The do it yourself approach is far easier to maintain. Simple things like gpio, spi, uart, etc are IMO easier done yourself than with the library. Things like USB, Ethernet, etc might be harder, not because of the interface with the peripherals, but because the vendor library will have included a usb stack, an ethernet ip stack, and filesystem support, which are likely heavily integrated into their peripheral library support and not necessarily worth separating to avoid their peripheral library code.
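For a feel of what that "bit more code" amounts to, the bootstrap would have to do roughly the following before calling notmain(). This is a sketch only: copy_data and zero_bss are made-up names, and on the real part the addresses and sizes would come from symbols you define in the linker script, which are toolchain specific and not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Copy initialized data from where it is stored (flash) to where the
   program expects to find it (ram), one word at a time. */
void copy_data(unsigned int *dst, const unsigned int *src, size_t words)
{
    size_t i;

    assert(words == 0 || (dst != NULL && src != NULL));
    for (i = 0; i < words; i++) dst[i] = src[i];
}

/* Zero the region that holds uninitialized globals and statics. */
void zero_bss(unsigned int *start, size_t words)
{
    size_t i;

    assert(words == 0 || start != NULL);
    for (i = 0; i < words; i++) start[i] = 0;
}
```

The loops are the easy part; the work is in getting the linker script to place .data in flash with a ram run address and to export the start/end symbols, which is exactly the baggage the examples here avoid.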
At the end of the day, professionally, you should assume you own all the code you use, including the libraries. Your boss will only see that the product line failed, and not care that it was a third party library that you chose that was at fault.