Parse CSV file and match pattern in PHP

Question

I have a CSV file like the following:

***Client Name: abc***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),abc
6/6/2013,1
6/11/2013,3
6/12/2013,2
6/13/2013,1
6/14/2013,2
6/15/2013,4
6/17/2013,4
6/18/2013,8
6/19/2013,7
# *** Interval: Daily ***,
,
***Client Name: abc***,
,
# ----------------------------------------,
# Facebook Insights : Likes by Source,
# ----------------------------------------,
Sources,Likes
Mobile,3602
Page Profile,470
Recommended Pages,86
Ads,64
Like Story,49
Mobile Sponsored Page You May Like,44
Page Browser,33
Search,22
Timeline,16
Mobile Page Suggestions On Liking,15
3 more sources,48
,
***Client Name: xyz***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),xyz
6/12/2013,1
# *** Interval: Daily ***,
,
***Client Name: pqr***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),pqr
6/6/2013,2
6/7/2013,3
6/9/2013,6
6/10/2013,1
6/12/2013,4
6/13/2013,1
6/14/2013,9
6/15/2013,5
6/16/2013,1
6/18/2013,2
6/19/2013,2
# *** Interval: Daily ***,

From this file I want to extract the Twitter : Mentions - Count data and save it all in a database.

I want the content between

# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,

and

# *** Interval: Daily ***,

How can I match this pattern in PHP? Is there a PHP class that matches a pattern in a file, or how can I do this with a regex?

I don't know much about pattern matching. So far I have only read the CSV file with fgetcsv(), like this:

$file = fopen($uploaded_file_path, 'r');
echo "<pre>";
while (($line = fgetcsv($file)) !== FALSE) {
    print_r($line);
}
fclose($file);

Now I am checking this answer http://stackoverflow.com/a/9826656/1868660 — Subodh Ghulaxe, Jul 03 '13 at 07:46

Ro Yo Mi · Accepted Answer · 2013-07-04T06:02:32.153

Description

This regex finds each "Twitter : Mentions - Count" section header and captures the section body into group 1.

^\#\sTwitter\s:\sMentions\s-\sCount,[\s\r\n]+    # match the header
^\#\s----------------------------------------,[\s\r\n]+   # match the separator line
(^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)    # match the rest of the section up to the first Interval: Daily line
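For comparison, here is a minimal sketch of a variant (not the pattern above) that replaces the lookahead-guarded loop with a lazy quantifier .*?, which also stops at the first "# *** Interval: Daily ***," line. It assumes LF line endings and the layout shown in the question; the sample string is abbreviated from the question's data.

```php
<?php
// Variant sketch, not the answer's pattern: a lazy .*? stops at the first
// "Interval: Daily" footer. m lets ^ match at every line start;
// s lets . match newlines.
$pattern = '/^# Twitter : Mentions - Count,\s+^# -+,\s+(.*?)^# \*\*\* Interval: Daily \*\*\*,/ms';

// Abbreviated sample in the question's layout (LF line endings assumed).
$csv = <<<'CSV'
***Client Name: abc***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),abc
6/6/2013,1
6/11/2013,3
# *** Interval: Daily ***,
,
***Client Name: xyz***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),xyz
6/12/2013,1
# *** Interval: Daily ***,
CSV;

preg_match_all($pattern, $csv, $matches);
print_r($matches[1]); // one section body per Twitter block
```

The lazy form is shorter to read; the answer's tempered loop makes the "never cross the footer" rule explicit at every character.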


Expanded

This first part simply finds the start of each block. It's a lot of characters, but it's largely straightforward.
- ^ match the start of a line; this requires the multiline option, which is usually m
- \#\sTwitter\s:\sMentions\s-\sCount, match this exact string. Note that \s matches a whitespace character; I use it instead of a literal space because I like to use the ignore-whitespace option, usually x, which would otherwise discard the spaces in the pattern. For the same reason # is escaped as \#, since under x an unescaped # starts a comment
- [\s\r\n]+ match one or more whitespace or newline characters (\s already covers \r and \n, so listing them is redundant but harmless)
- ^\#\s----------------------------------------,[\s\r\n]+ match the separator line, from the start of the line ^ through the newline at its end
This second part captures the body of the section, and is where the real magic happens.
- ( start capture group 1
- ^ match the start of a line; this ensures the lookahead that follows is checked from the beginning of the line
- (?: start a non-capturing group. This group is self-terminating: it stops repeating as soon as the text at the current position matches the unwanted string inside the negative lookahead, so it ends up consuming every character between the section header above and the closing line.
- (?! start a negative lookahead, which verifies that we do not run into the closing text that marks the end of the section
- \#\s\*\*\*\sInterval:\sDaily\s\*\*\*, match the unwanted text; if it is found at the current position, the negative lookahead fails
- ) close the negative lookahead
- . match any character; this relies on the "dot matches newline" option, usually s
- ) close the non-capturing group
- * allow the non-capturing group to repeat zero or more times
- ) close capture group 1. Because all of the matching above happened inside this group, every character matched by . is stored here.
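To see why the s option matters, the sketch below runs the same pattern on one section taken from the question, once with only m and x and once with s added. Without s, . cannot cross a newline, so capture group 1 stops at the end of the header line. The sample string is an assumption copied from the question's data with LF line endings.

```php
<?php
// The answer's pattern, unchanged; x makes PCRE ignore the line breaks.
$body = '^\#\sTwitter\s:\sMentions\s-\sCount,[\s\r\n]+
^\#\s----------------------------------------,[\s\r\n]+
(^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)';

// One section from the question (sample data, LF line endings).
$csv = "# Twitter : Mentions - Count,\n"
     . "# ----------------------------------------,\n"
     . "Date/Time (GMT),xyz\n"
     . "6/12/2013,1\n"
     . "# *** Interval: Daily ***,\n";

preg_match('/' . $body . '/mx', $csv, $withoutS);  // . stops at each newline
preg_match('/' . $body . '/msx', $csv, $withS);    // . also matches newlines

var_dump($withoutS[1]); // only the header line
var_dump($withS[1]);    // the whole section body
```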

PHP Example

Live Example: http://www.rubular.com/r/stgaiBeSE1

Sample Text

***Client Name: abc***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),abc
6/6/2013,1
6/11/2013,3
6/12/2013,2
6/13/2013,1
6/14/2013,2
6/15/2013,4
6/17/2013,4
6/18/2013,8
6/19/2013,7
# *** Interval: Daily ***,
,
***Client Name: abc***,
,
# ----------------------------------------,
# Facebook Insights : Likes by Source,
# ----------------------------------------,
Sources,Likes
Mobile,3602
Page Profile,470
Recommended Pages,86
Ads,64
Like Story,49
Mobile Sponsored Page You May Like,44
Page Browser,33
Search,22
Timeline,16
Mobile Page Suggestions On Liking,15
3 more sources,48
,
***Client Name: xyz***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),xyz
6/12/2013,1
# *** Interval: Daily ***,
,
***Client Name: pqr***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),pqr
6/6/2013,2
6/7/2013,3
6/9/2013,6
6/10/2013,1
6/12/2013,4
6/13/2013,1
6/14/2013,9
6/15/2013,5
6/16/2013,1
6/18/2013,2
6/19/2013,2
# *** Interval: Daily ***,

Code

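<?php
// Replace the placeholder with the CSV text, e.g. file_get_contents($uploaded_file_path)
$sourcestring="your source string";
preg_match_all('/^\#\sTwitter\s:\sMentions\s-\sCount,[\s\r\n]+
^\#\s----------------------------------------,[\s\r\n]+
(^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)/imsx',$sourcestring,$matches);
// $matches[0] holds the full matches, $matches[1] the section bodies (capture group 1)
echo "<pre>".print_r($matches,true)."</pre>";
?>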

Matches from Capture Group 1

Array
(
    [0] => Date/Time (GMT),abc
    6/6/2013,1
    6/11/2013,3
    6/12/2013,2
    6/13/2013,1
    6/14/2013,2
    6/15/2013,4
    6/17/2013,4
    6/18/2013,8
    6/19/2013,7

    [1] => Date/Time (GMT),xyz
    6/12/2013,1

    [2] => Date/Time (GMT),pqr
    6/6/2013,2
    6/7/2013,3
    6/9/2013,6
    6/10/2013,1
    6/12/2013,4
    6/13/2013,1
    6/14/2013,9
    6/15/2013,5
    6/16/2013,1
    6/18/2013,2
    6/19/2013,2

)
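Since the goal is to save the data in a database, each captured body still has to be split into rows. A minimal sketch, assuming every body looks like the output above (a "Date/Time (GMT),<client>" header line followed by m/d/Y,count lines); the helper name section_to_rows and the [client, date, count] row layout are hypothetical. Each row could then be bound to a PDO prepared INSERT statement.

```php
<?php
// Hypothetical helper: split one captured section body into
// [client, Y-m-d date, count] rows for a database INSERT.
function section_to_rows(string $body): array
{
    // \R matches any newline sequence (\n, \r\n, \r).
    $lines = preg_split('/\R/', trim($body));

    // First line is the column header, e.g. "Date/Time (GMT),abc";
    // its second field is the client name.
    list(, $client) = explode(',', array_shift($lines), 2);

    $rows = [];
    foreach ($lines as $line) {
        list($date, $count) = explode(',', $line, 2);
        // The export uses m/d/Y without leading zeros; "!" zeroes the time part.
        $parsed = DateTime::createFromFormat('!n/j/Y', $date);
        if ($parsed === false) {
            continue; // skip anything that is not a date row
        }
        $rows[] = [$client, $parsed->format('Y-m-d'), (int) $count];
    }
    return $rows;
}

$rows = section_to_rows("Date/Time (GMT),abc\n6/6/2013,1\n6/11/2013,3\n");
print_r($rows);
```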

You are great, great, great! Can you please explain the steps you used to build this regex, and how you created that image? — Subodh Ghulaxe, Jul 04 '13 at 05:32
:) Well, technically I wrote it in Windows Notepad before running it through that live example. For the image I'm using debuggex.com. Although it doesn't support lookbehinds, named capture groups, or atomic groups, it's still handy for understanding the expression flow. There is also regexper.com. It does a pretty good job too, but it doesn't update in real time as you type. — Ro Yo Mi, Jul 04 '13 at 05:40
I updated the answer with detailed expanded explanation of the expression, enjoy. — Ro Yo Mi, Jul 04 '13 at 06:03
@RoYoMi Thanks for the answer. Can you tell me what $sourcestring="your source string"; is? — B Karthik Kumar, Oct 12 '17 at 08:58
@BKarthikKumar I opted to separate out the source text into a section in the original answer. The source text itself looks like programming code and I felt that embedding it into the example code could be confusing. — Ro Yo Mi, Oct 13 '17 at 13:26
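Building on that comment, $sourcestring just needs to hold the text of the uploaded file. A minimal sketch, assuming the file fits comfortably in memory; the function name extract_twitter_sections is hypothetical, and the CRLF normalisation is an extra step, not part of the original answer.

```php
<?php
// Hypothetical helper: load the uploaded CSV and return capture group 1
// of the answer's regex, one string per Twitter section.
function extract_twitter_sections(string $path): array
{
    $source = file_get_contents($path);
    if ($source === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Extra step (an assumption): turn CRLF into LF so bodies end in "\n".
    $source = str_replace("\r\n", "\n", $source);

    // Same pattern as the answer; x ignores the line breaks and indentation.
    $pattern = '/^\#\sTwitter\s:\sMentions\s-\sCount,[\s\r\n]+
        ^\#\s----------------------------------------,[\s\r\n]+
        (^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)/imsx';

    preg_match_all($pattern, $source, $matches);
    return $matches[1];
}

// Usage: a temporary file stands in for $uploaded_file_path.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "# Twitter : Mentions - Count,\r\n"
    . "# ----------------------------------------,\r\n"
    . "Date/Time (GMT),xyz\r\n"
    . "6/12/2013,1\r\n"
    . "# *** Interval: Daily ***,\r\n");
$sections = extract_twitter_sections($tmp);
unlink($tmp);
print_r($sections);
```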

score 0 · Answer 2 · answered Jul 03 '13 at 07:58

Try this:

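<?php
// Reads every row of a CSV file into an array of arrays.
// Declared as a plain function; "public static" is only valid inside a class.
function csv_to_array($filename = '', $delimiter = ',')
{
    if (!file_exists($filename) || !is_readable($filename)) {
        return false;
    }

    $data = array();
    if (($handle = fopen($filename, 'r')) !== false) {
        // Length 0 means no line-length limit, so long lines are not split at 1000 bytes
        while (($row = fgetcsv($handle, 0, $delimiter)) !== false) {
            $data[] = $row;
        }
        fclose($handle);
    }
    return $data;
}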
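The function above only loads every row; it does not pick out the Twitter sections. A regex-free sketch of that step, using the same fgetcsv() loop as a small state machine that switches on at the "# Twitter : Mentions - Count" row and off at the "# *** Interval: Daily ***" row. The function name twitter_rows_from_csv is hypothetical, and it assumes the exact header and footer text shown in the question.

```php
<?php
// Hypothetical regex-free variant: stream rows with fgetcsv() and keep
// only the date/count rows inside "Twitter : Mentions - Count" blocks.
function twitter_rows_from_csv(string $filename): array
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return [];
    }

    $rows = [];
    $inTwitter = false; // true between the Twitter header and the footer
    $client = null;     // client name taken from the block's column header

    // Length 0 = no line-length limit; enclosure and escape passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $first = trim((string) $row[0]);
        if ($first === '') {
            continue; // blank lines and "," spacer rows
        }

        if ($first === '# Twitter : Mentions - Count') {
            $inTwitter = true;
            $client = null;
        } elseif ($first === '# *** Interval: Daily ***') {
            $inTwitter = false;
        } elseif ($inTwitter && $first[0] !== '#') {
            if ($first === 'Date/Time (GMT)') {
                $client = $row[1];                           // column header row
            } else {
                $rows[] = [$client, $first, (int) $row[1]];  // date, count
            }
        }
    }
    fclose($handle);
    return $rows;
}

// Usage with a small temporary file in the question's layout.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, implode("\n", [
    '***Client Name: xyz***,',
    ',',
    '# ----------------------------------------,',
    '# Twitter : Mentions - Count,',
    '# ----------------------------------------,',
    'Date/Time (GMT),xyz',
    '6/12/2013,1',
    '# *** Interval: Daily ***,',
    ',',
    '# Facebook Insights : Likes by Source,',
    'Sources,Likes',
    'Mobile,3602',
]) . "\n");
$rows = twitter_rows_from_csv($tmp);
unlink($tmp);
print_r($rows);
```

Reading row by row keeps memory flat for large exports, at the cost of hard-coding the exact header and footer strings.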
