I have a huge text file whose fields are separated by semicolons (;). The numbers use a point as the thousands separator and a comma as the decimal separator, e.g. 123.456.891.234,56

I don't want to search and replace the points in the file with an editor, because I don't have permission to change the file.

I could read it as string and try to get rid of the points. But it doesn't seem like a good way of solving the problem.

program prjRead
  implicit none

  integer:: a
  real(8) :: b
  real(8) :: c
  character(10) :: dummy

  open (123,file = "test.csv", DECIMAL='COMMA')

  read(123,*) dummy
  read(123,*) a, &
              b, &
              c

  write(*,*) a,b,c

end program prjRead

content of test.csv

integer;decimalcomma;thousandsep
5;56,67;123.456,78

At line 36 of file prjRead.f90 (unit = 123, file = 'test.csv')
Fortran runtime error: Bad real number in item 3 of list input

talonmies
Ratilius
  • @francescalus Not at the opening statement, but when I debug reading the first line of the file. – Ratilius Feb 25 '19 at 16:33
  • In that case you should show also the read statements that give the error. – Vladimir F Героям слава Feb 25 '19 at 16:39
  • Related https://stackoverflow.com/questions/25546788/fortran-decimal-and-thousand-separator https://stackoverflow.com/questions/21117216/fortran-formated-output-for-floating-point-numbers – Vladimir F Героям слава Feb 25 '19 at 16:41
  • Personally I'd use `sed` to transform the file prior to reading it. – High Performance Mark Feb 25 '19 at 16:42
  • @VladimirF the link is about a thousand separator in the output – Ratilius Feb 25 '19 at 16:51
  • @Ratilius 1. those are two links, not one. 2. output and input are closely related. 3. Linking a related question is not the same as voting to close as a duplicate. If I thought it is a duplicate, I would have just closed your question. – Vladimir F Героям слава Feb 25 '19 at 16:52
  • *because I don't have permission to change the file.* Well, you wouldn't do that even if you did have such permissions, would you? You'd use `sed` to convert the input file into an output file with the right characteristics. – High Performance Mark Feb 25 '19 at 16:58
  • I may not change the file. So no sed. – Ratilius Feb 25 '19 at 16:58
  • So, you cannot copy the file to 'tmp.dat'? `sed s/,/#/g a.dat | sed s/"\."/,/g | sed s/#/"\."/g > tmp.dat` This will convert your `5;56,67;123.456,78` into `5;56.67;123,456.78` – Steve Feb 25 '19 at 18:29
  • Heck, an even shorter sed command `sed s/"\."//g a.dat > tmp.dat` This then allows you to use `DECIMAL="COMMA"` – Steve Feb 25 '19 at 18:36
  • I will flank the obstacle and take that solution. More precisely I don't have permission to write in that folder in the production environment. But this is a political and not a technical problem. I can't mark the question as answered, because i cant find the button on my phone. – Ratilius Feb 25 '19 at 21:06

3 Answers

If you really want to do this in Fortran, it's not too difficult. First, let's have a function to drop all the occurrences of `.` from a string:

  FUNCTION drop_stops(instr) RESULT(outstr)
    CHARACTER(len=*), INTENT(in) :: instr
    CHARACTER(len=:), ALLOCATABLE :: outstr
    CHARACTER(len=1), DIMENSION(:), ALLOCATABLE :: str_array

    ! View the string as an array of single characters
    ALLOCATE(str_array(LEN_TRIM(instr)))
    str_array = TRANSFER(instr,str_array)
    ! Keep only the characters that are not '.'
    str_array = PACK(str_array,str_array /= '.')
    ! Copy the remaining characters back into a string of matching length
    ALLOCATE(CHARACTER(len=SIZE(str_array))::outstr)
    outstr = TRANSFER(str_array,outstr)
  END FUNCTION drop_stops

I trust that this is obvious enough to need no explanation beyond the documentation of any functions or statements you're not familiar with.

Then, sticking with your original code and declaring another string variable you could read dummy, as you already do, then write something like

 dummy_without_stops = drop_stops(dummy)

and now you can do an internal read on that to get the numbers you're interested in, something like

read(dummy_without_stops,*,decimal='comma') a, b, c

Note that the approach implemented in drop_stops depends on the characters in a string being laid out in memory sequentially and matching the same storage for the characters in the array. I'm confident that this will work for ASCII characters, not so sure about ISO_10646 characters.
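
If the storage assumption behind TRANSFER is a concern, the same filtering can be done with a plain character-by-character copy. The following is only a minimal sketch: `strip_dots` is a hypothetical helper name (it is not part of the answer above), and the sample line from the question is hard-coded in place of a line read from the file.

```fortran
program strip_dots_sketch
  implicit none

  ! Assumption: the sample line from the question stands in for one line of the file.
  character(len=*), parameter   :: line = "5;56,67;123.456,78"
  character(len=:), allocatable :: cleaned
  integer                       :: a
  real(kind(1.0d0))             :: b, c

  cleaned = strip_dots(line)
  write (*, '(A)') cleaned

  ! With decimal='comma', list-directed input uses ';' as the value separator.
  read (cleaned, *, decimal='comma') a, b, c
  write (*, *) a, b, c

contains

  ! Hypothetical helper: does the same job as drop_stops, but copies
  ! character by character instead of relying on TRANSFER.
  function strip_dots(instr) result(outstr)
    character(len=*), intent(in)  :: instr
    character(len=:), allocatable :: outstr
    character(len=len(instr))     :: buf
    integer                       :: i, n

    n = 0
    do i = 1, len_trim(instr)
      if (instr(i:i) /= '.') then
        n = n + 1
        buf(n:n) = instr(i:i)
      end if
    end do
    outstr = buf(1:n)   ! only the characters that were kept
  end function strip_dots

end program strip_dots_sketch
```

For the sample line, `cleaned` holds `5;56,67;123456,78` before the internal read. Note that this removes every `.` in the line, so it is only appropriate when no field uses a point for anything other than a thousands separator.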

High Performance Mark
  • It works like a charm. A colleague showed me how I can read a whole line. I'll add the code next week. – Ratilius Feb 28 '19 at 17:15

You did not provide a big enough sample from your input file. But essentially, you need to first split the file contents by the delimiter `;`. Then, for each number string that you obtain, replace all thousands separators `.` with `""` (nothing), and replace the decimal symbol `,` with the normal decimal notation `.`. Below is an attempt to achieve this via the `splitStr()` and `replaceStr()` type-bound procedures, tested on the sample line that you provided: `5;56,67;123.456,78`

module String_mod

    use, intrinsic :: iso_fortran_env, only: IK=>int32, RK=>real64
    implicit none

    public

    character(*), parameter :: MODULE_NAME = "@String_mod"

    type :: CharVec_type
        character (:), allocatable  :: record
    end type CharVec_type

    type :: String_type
        character(:)      , allocatable   :: value
        type(CharVec_type), allocatable   :: Parts(:)
        integer(IK)                       :: nPart = 0
    contains
        procedure, nopass :: replaceStr, splitStr, str2num
    end type String_type

!***********************************************************************************************************************************
!***********************************************************************************************************************************

contains

!***********************************************************************************************************************************
!***********************************************************************************************************************************

    recursive function replaceStr(string,search,substitute) result(modifiedString)
        implicit none
        character(len=*), intent(in)  :: string, search, substitute
        character(len=:), allocatable :: modifiedString
        integer(IK)                   :: i, stringLen, searchLen
        stringLen = len(string)
        searchLen = len(search)
        if (stringLen==0 .or. searchLen==0) then
            modifiedString = ""
            return
        elseif (stringLen<searchLen) then
            modifiedString = string
            return
        end if
        i = 1
        do
            if (string(i:i+searchLen-1)==search) then
                modifiedString = string(1:i-1) // substitute // replaceStr(string(i+searchLen:stringLen),search,substitute)
                exit
            end if
            if (i+searchLen>stringLen) then
                modifiedString = string
                exit
            end if
            i = i + 1
            cycle
        end do
    end function replaceStr

!***********************************************************************************************************************************
!***********************************************************************************************************************************

    function splitStr(string,delimiter)

        implicit none
        character(len=*)  , intent(in)  :: string,delimiter
        character(len=:)  , allocatable :: dummyStr
        type(CharVec_type), allocatable :: splitStr(:)
        integer(IK)                     :: maxNumSplit
        integer(IK)                     :: stringLen, delimLen, splitCounter, currentPos

        dummyStr  = string
        delimLen  = len(delimiter)
        stringLen = len(dummyStr)

        if (delimLen==0) then
            allocate(splitStr(1))
            splitStr(1)%record = string
            return
        end if

        maxNumSplit = 1 + stringLen / delimLen
        allocate(splitStr(maxNumSplit))
        splitCounter = 1
        loopParseString: do
            if (stringLen<delimLen) then
                splitStr(splitCounter)%record = dummyStr
                exit loopParseString
            elseif (stringLen==delimLen) then
                if (dummyStr==delimiter) then
                    splitStr(splitCounter)%record = ""
                end if
                exit loopParseString
            elseif (dummyStr(1:delimLen)==delimiter) then
                dummyStr = dummyStr(delimLen+1:stringLen)
                stringLen = len(dummyStr)
                cycle loopParseString
            else
                currentPos = 2
                loopSearchString: do
                    if (dummyStr(currentPos:currentPos+delimLen-1)==delimiter) then
                        splitStr(splitCounter)%record = dummyStr(1:currentPos-1)
                        if (currentPos+delimLen>stringLen) then
                            exit loopParseString
                        else
                            splitCounter = splitCounter + 1
                            dummyStr = dummyStr(currentPos+delimLen:stringLen)
                            stringLen = len(dummyStr)
                            cycle loopParseString
                        end if
                    else
                        currentPos = currentPos + 1
                        if (stringLen<currentPos+delimLen-1) then
                            splitStr(splitCounter)%record = dummyStr
                            exit loopParseString
                        end if
                        cycle loopSearchString
                    end if
                end do loopSearchString
            end if
        end do loopParseString
        splitStr = splitStr(1:splitCounter)

    end function splitStr

!***********************************************************************************************************************************
!***********************************************************************************************************************************

    pure elemental function str2num(str)
        implicit none
        character(len=*), intent(in) :: str
        real(RK)                  :: str2num
        read(str,*) str2num
    end function str2num

!***********************************************************************************************************************************
!***********************************************************************************************************************************

end module String_mod

program readFile_prog
    use String_mod, only: String_type
    implicit none
    ! Rules: comma means decimal point. Dot means thousands separator. delimiter is ;.
    character(*), parameter :: FileToRead = "5;56,67;123.456,78"
    type(String_type)       :: String
    integer                 :: i

    ! read file
    String%value = FileToRead

    ! split file contents into individual numbers 
    String%Parts = String%splitStr(String%value,";")

    ! count the number of fields in the file
    String%nPart = size(String%Parts)

    do i = 1, String%nPart

        ! For each number, remove the thousands separator, by replacing "." with ""
        String%Parts(i)%record = String%replaceStr(String%Parts(i)%record,".","")

        ! now replace comma decimal symbols with regular . decimal notation
        String%Parts(i)%record = String%replaceStr(String%Parts(i)%record,",",".")

        ! Convert each number from character type to real and print it on screen
        write(*,"(*(g0,:,' '))") "Number(",i,") = ", String%str2num(String%Parts(i)%record)

    end do

end program readFile_prog

Here is output:

$gfortran -std=f2008 *.f95 -o main
$main
Number( 1 ) =  5.0000000000000000
Number( 2 ) =  56.670000000000002
Number( 3 ) =  123456.78000000000
Scientist

program prjRead
  implicit none

  integer:: a
  real(8) :: b
  real(8) :: c
  character(18) :: header
  character(18) :: cAsString
  character(18) :: cWithOutStops

  open (123,file = "test.csv", DECIMAL='COMMA')

  read(123,*) header
  read(123,*) a, &
              b, &
              cAsString

  cWithOutStops = drop_stops(cAsString)

  read(cWithOutStops,*,decimal='comma')  c

  write(*,*) "gives c without decimal places"
  write(*,*) a,b,c

  write(*,*) "********************"
  write(*,*) "second try, the function drop_stops does what it " // &
             "promisses. If have to feed it with the right string"
  cAsString ="123.456,78"
  cWithOutStops = drop_stops(cAsString)
  read(cWithOutStops,*,decimal='comma')  c

  write(*,*) a,b,c

contains

FUNCTION drop_stops(instr) RESULT(outstr)
    CHARACTER(len=*), INTENT(in) :: instr
    CHARACTER(len=:), ALLOCATABLE :: outstr
    CHARACTER(len=1), DIMENSION(:), ALLOCATABLE :: str_array

    ALLOCATE(str_array(LEN_TRIM(instr)))
    str_array = TRANSFER(instr,str_array)
    str_array = PACK(str_array,str_array /= '.')
    ALLOCATE(CHARACTER(len=SIZE(str_array))::outstr)
    outstr = TRANSFER(str_array,outstr)
  END FUNCTION drop_stops
end program prjRead

Output

 gives c without decimal places
           5   56.670000000000002        123456.00000000000
 ********************
 second try, the function drop_stops does what it promisses. If have to  feed it with the right string
           5   56.670000000000002        123456.78000000000

High Performance Mark's solution is almost there. I only lack the skill to read the number c in one piece. Alternatively, I could read two strings and concatenate them.
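
One possible way around reading c in pieces is to read each record as a whole line with the `'(A)'` format, strip the points, and only then do a list-directed internal read. This is a hedged sketch rather than a tested production solution: it assumes test.csv from the question is in the working directory and that no line exceeds 256 characters (an arbitrary buffer size chosen for illustration). The file is opened with `action="read"`, so no write permission is required.

```fortran
program read_whole_line_sketch
  implicit none

  ! Assumptions: test.csv from the question is in the working directory,
  ! and 256 characters (hypothetical buffer size) is enough for one line.
  character(len=256) :: line
  integer            :: a, u, ios, i, n
  real(kind(1.0d0))  :: b, c

  ! Open read-only; the file itself is never modified.
  open (newunit=u, file="test.csv", status="old", action="read", iostat=ios)
  if (ios /= 0) stop "could not open test.csv"

  read (u, '(A)', iostat=ios) line                 ! header line, ignored
  if (ios == 0) read (u, '(A)', iostat=ios) line   ! whole data line, ';' included
  close (u)
  if (ios /= 0) stop "could not read two lines from test.csv"

  ! Remove every '.' in place: shift kept characters left, then blank the tail.
  n = 0
  do i = 1, len_trim(line)
    if (line(i:i) /= '.') then
      n = n + 1
      line(n:n) = line(i:i)
    end if
  end do
  line(n+1:) = ' '

  ! With decimal='comma', ';' separates the values and ',' is the decimal symbol.
  read (line, *, decimal='comma', iostat=ios) a, b, c
  if (ios /= 0) stop "could not parse the data line"

  write (*, *) a, b, c
end program read_whole_line_sketch
```

Because the whole record is read as text first, the third field never passes through a list-directed character read, which is where the earlier attempt lost the decimal places.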

Ratilius