• File base with FTP Name with dot in it?

    From Björn Wiberg@2:201/137 to g00r00 on Sunday, March 20, 2022 21:21:42
    Hello g00r00!

    I just noticed that Googlebot has begun crawling the anonymous FTP here. :-D

    To my surprise, they seem to try to enter a /robots.txt subdirectory before trying to download /robots.txt:

    + 2022.03.20 07:03:04 FTP > Connect on slot 1/10 (66.249.64.92)
    + 2022.03.20 07:03:04 FTP 1-HostName crawl-66-249-64-92.googlebot.com
    + 2022.03.20 07:03:04 FTP 1-Country United States of America (US)
    + 2022.03.20 07:03:04 FTP 1-C: USER Data: anonymous
    + 2022.03.20 07:03:04 FTP 1-S: 331 User name okay, need password.
    + 2022.03.20 07:03:05 FTP 1-C: PASS
    + 2022.03.20 07:03:05 FTP 1-S: 230 User logged in, proceed.
    + 2022.03.20 07:03:05 FTP 1-Logged in as Anonymous
    + 2022.03.20 07:03:05 FTP 1-C: CWD Data: robots.txt
    + 2022.03.20 07:03:05 FTP 1-S: 550 Directory change failed
    + 2022.03.20 07:03:05 FTP 1-C: TYPE Data: I
    + 2022.03.20 07:03:05 FTP 1-S: 200 All files sent in BINARY mode.
    + 2022.03.20 07:03:05 FTP 1-C: PASV Data:
    + 2022.03.20 07:03:05 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 07:03:05 FTP 1-C: RETR Data: robots.txt
    + 2022.03.20 07:03:05 FTP 1-S: 550 File not found
    + 2022.03.20 07:03:05 FTP 1-C: QUIT Data:
    + 2022.03.20 07:03:05 FTP 1-S: 221 Goodbye
    + 2022.03.20 07:03:05 FTP 1-Connection closed

    So I thought that *perhaps* -- *if* they manage to change to that directory -- they'll try to fetch a robots.txt file from within it, i.e. /robots.txt/robots.txt. Although I haven't found any evidence of this in any specs from them (or in the Robots Exclusion Protocol draft/standard)...

    So, I added a file base with:

    FTP Name │ robots.txt

    ...and allowed anonymous access to it.

    But any attempt to switch to that directory fails:

    bbs@glimmer:/tmp$ pftp scbbs.nsupdate.info 50021
    Connected to scbbs.nsupdate.info.
    220 Welcome to Star Collision BBS!
    Name (scbbs.nsupdate.info:bbs): anonymous
    331 User name okay, need password.
    Password:
    230 User logged in, proceed.
    Remote system type is UNIX.
    Using binary mode to transfer files.
    prompt
    Interactive mode off.
    ls -l
    227 Entering Passive Mode (188,149,138,116,198,212).
    125 Data connection already open
    drwxr-xr-x 1 ftp ftp 0 Mar 19 17:00 file_list
    drwxr-xr-x 1 ftp ftp 0 Mar 20 20:22 robots.txt
    226 Closing data connection.
    cd robots.txt
    550 Directory change failed
    ls robots.txt
    227 Entering Passive Mode (188,149,138,116,198,212).
    125 Data connection already open
    drwxr-xr-x 1 ftp ftp 0 Mar 20 20:22 robots.txt
    226 Closing data connection.
    ls robots.txt/
    227 Entering Passive Mode (188,149,138,116,198,212).
    125 Data connection already open
    226 Closing data connection.
    ls robots.txt/robots.txt
    227 Entering Passive Mode (188,149,138,116,198,212).
    125 Data connection already open
    -rw-r--r-- 1 ftp ftp 26 Mar 20 20:24 robots.txt
    226 Closing data connection.
    mget robots.txt/robots.txt
    local: robots.txt remote: robots.txt
    227 Entering Passive Mode (188,149,138,116,198,212).
    550 File not found
    quit
    221 Goodbye

    And the MIS log shows pretty much the same:

    + 2022.03.20 21:11:14 FTP > Connect on slot 1/10 (192.168.1.1)
    + 2022.03.20 21:11:14 FTP 1-HostName router.asus.com
    + 2022.03.20 21:11:14 FTP 1-Country Unknown (-)
    + 2022.03.20 21:11:16 FTP 1-C: USER Data: anonymous
    + 2022.03.20 21:11:16 FTP 1-S: 331 User name okay, need password.
    + 2022.03.20 21:11:17 FTP 1-C: PASS
    + 2022.03.20 21:11:17 FTP 1-S: 230 User logged in, proceed.
    + 2022.03.20 21:11:17 FTP 1-Logged in as Anonymous
    + 2022.03.20 21:11:17 FTP 1-C: SYST Data:
    + 2022.03.20 21:11:17 FTP 1-S: 215 UNIX Type: L8
    + 2022.03.20 21:11:22 FTP 1-C: PASV Data:
    + 2022.03.20 21:11:22 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 21:11:22 FTP 1-C: LIST Data: -l
    + 2022.03.20 21:11:22 FTP 1-Listing files in /
    + 2022.03.20 21:11:22 FTP 1-S: 125 Data connection already open
    + 2022.03.20 21:11:22 FTP 1-S: 226 Closing data connection.
    + 2022.03.20 21:11:30 FTP 1-C: CWD Data: robots.txt
    + 2022.03.20 21:11:30 FTP 1-S: 550 Directory change failed
    + 2022.03.20 21:11:35 FTP 1-C: PASV Data:
    + 2022.03.20 21:11:35 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 21:11:35 FTP 1-C: LIST Data: robots.txt
    + 2022.03.20 21:11:35 FTP 1-Listing files in /
    + 2022.03.20 21:11:35 FTP 1-S: 125 Data connection already open
    + 2022.03.20 21:11:35 FTP 1-S: 226 Closing data connection.
    + 2022.03.20 21:11:40 FTP 1-C: PASV Data:
    + 2022.03.20 21:11:40 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 21:11:40 FTP 1-C: LIST Data: robots.txt/
    + 2022.03.20 21:11:40 FTP 1-Listing files in robots.txt
    + 2022.03.20 21:11:40 FTP 1-S: 125 Data connection already open
    + 2022.03.20 21:11:40 FTP 1-S: 226 Closing data connection.
    + 2022.03.20 21:11:46 FTP 1-C: PASV Data:
    + 2022.03.20 21:11:46 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 21:11:46 FTP 1-C: LIST Data: robots.txt/robots.txt
    + 2022.03.20 21:11:46 FTP 1-Listing files in robots.txt
    + 2022.03.20 21:11:46 FTP 1-S: 125 Data connection already open
    + 2022.03.20 21:11:46 FTP 1-S: 226 Closing data connection.
    + 2022.03.20 21:11:53 FTP 1-C: PASV Data:
    + 2022.03.20 21:11:53 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 21:11:53 FTP 1-C: NLST Data: robots.txt/robots.txt
    + 2022.03.20 21:11:53 FTP 1-Listing files in robots.txt
    + 2022.03.20 21:11:53 FTP 1-S: 125 Data connection already open
    + 2022.03.20 21:11:53 FTP 1-S: 226 Closing data connection.
    + 2022.03.20 21:11:53 FTP 1-C: TYPE Data: I
    + 2022.03.20 21:11:53 FTP 1-S: 200 All files sent in BINARY mode.
    + 2022.03.20 21:11:53 FTP 1-C: PASV Data:
    + 2022.03.20 21:11:53 FTP 1-S: 227 Entering Passive Mode (188,149,138,116,198,212).
    + 2022.03.20 21:11:53 FTP 1-C: RETR Data: robots.txt
    + 2022.03.20 21:11:53 FTP 1-S: 550 File not found
    + 2022.03.20 21:11:56 FTP 1-C: QUIT Data:
    + 2022.03.20 21:11:56 FTP 1-S: 221 Goodbye
    + 2022.03.20 21:11:56 FTP 1-Connection closed

    So, switching to that directory appears impossible, and listing of file details in it only appears to work if you supply the exact file name.

    However, if I change the FTP Name to something without a dot in it, things work as expected (but then, that is not the name I want/need for the directory).

    Do you know what could be causing this?

    And, is there perhaps a better way of getting a /robots.txt in place (i.e. in the root directory) on the FTP server?

    Many thanks in advance! =)

    Best regards
    Björn

    --- Mystic BBS v1.12 A48 2022/03/11 (Linux/64)
    * Origin: Star Collision BBS, Uppsala, Sweden (2:201/137)
  • From mark lewis@1:3634/12.73 to Björn Wiberg on Tuesday, April 05, 2022 08:44:38

    On 2022 Mar 20 21:21:42, you wrote to g00r00:

    I just noticed that Googlebot has begun crawling the anonymous FTP here. :-D

    To my surprise, they seem to try to enter a /robots.txt subdirectory before
    trying to download /robots.txt:

    that's how they decide if it is a file or a directory... if it is a file, then they download it...
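
    The probe described here matches the CWD-then-RETR sequence in the log above. A minimal sketch of that idea using Python's standard ftplib, assuming the client treats a failed CWD as "not a directory" and then attempts a download; the function name classify_path is hypothetical, and this is not Googlebot's actual code:

```python
from ftplib import error_perm

def classify_path(ftp, path):
    """Guess whether `path` is a directory, a file, or missing.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    """
    try:
        ftp.cwd(path)          # a 5xx reply such as 550 raises error_perm
        return "directory"
    except error_perm:
        pass                   # not a directory (or not accessible)

    chunks = []
    try:
        # Try to download it as a file, collecting the bytes in memory.
        ftp.retrbinary(f"RETR {path}", chunks.append)
        return "file"
    except error_perm:
        return "missing"       # both CWD and RETR were refused
```

    With the server in the log, both commands returned 550, so a probe like this would report "missing" for robots.txt.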

    )\/(ark

    "The soul of a small kitten in the body of a mighty dragon. Look on my majesty, ye mighty, and despair! Or bring me catnip. Your choice. Oooh, a shiny thing!"
    ... Cats are magical, the more you pet them the longer you both live
    ---
    * Origin: (1:3634/12.73)
  • From Björn Wiberg@2:201/137 to mark lewis on Tuesday, April 05, 2022 16:24:55
    Hello Mark!

    Thank you for your reply!

    On 05 Apr 2022, mark lewis said the following...
    To my surprise, they seem to try to enter a /robots.txt subdirectory

    that's how they decide if it is a file or a directory... if it is a
    file, then they download it...

    Yes, I suspected that might be the case... I wonder, though, if they do anything special if the cd succeeds or if they just conclude "not a file" and skip it. (According to the spec, they shouldn't look for anything inside that directory, if it exists.)

    Best regards
    Björn

    --- Mystic BBS v1.12 A48 2022/03/26 (Linux/64)
    * Origin: Star Collision BBS, Uppsala, Sweden (2:201/137)
  • From mark lewis@1:3634/12.73 to Björn Wiberg on Wednesday, April 06, 2022 13:45:10

    On 2022 Apr 05 16:24:54, you wrote to me:

    To my surprise, they seem to try to enter a /robots.txt subdirectory

    that's how they decide if it is a file or a directory... if it is a
    file, then they download it...

    Yes, I suspected that might be the case... I wonder, though, if they
    do anything special if the cd succeeds

    if the cd succeeds they do an ls to see if there are files and directories in it... if there are, they download them and continue walking through the directories...
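
    The walk described here can be sketched as a recursive traversal with Python's standard ftplib. This is only an illustration under assumptions: the function name walk is hypothetical, a successful CWD is taken to mean "directory", and the sketch yields file paths instead of downloading them:

```python
from ftplib import error_perm

def walk(ftp, path="/"):
    """Yield the paths of all files found below `path`.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    """
    for name in ftp.nlst(path):
        # Some servers return full paths from NLST, so keep only the last part.
        child = path.rstrip("/") + "/" + name.rsplit("/", 1)[-1]
        try:
            ftp.cwd(child)                 # succeeds only for directories
        except error_perm:
            yield child                    # treat everything else as a file
        else:
            yield from walk(ftp, child)    # descend into the directory
```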

    or if they just conclude "not a file" and skip it. (According to the
    spec, they shouldn't look for anything inside that directory, if it exists.)

    what spec is that?

    )\/(ark

    ... If all you have is a hammer, everything looks like a nail.
    ---
    * Origin: (1:3634/12.73)
  • From Björn Wiberg@2:201/137 to mark lewis on Thursday, April 07, 2022 09:50:03
    Hello Mark!

    Thank you for your reply!

    On 06 Apr 2022, mark lewis said the following...
    if the cd succeeds they do an ls to see if there are files and directories
    in it... if there are, they download them and continue walking through
    the directories...

    OK! Thanks!

    or if they just conclude "not a file" and skip it. (According to the spec, they shouldn't look for anything inside that directory, if it exists.)

    what spec is that?

    I was thinking of this one:

    https://developers.google.com/search/docs/advanced/robots/robots_txt#examples-of-valid-robots.txt-urls

    http://example.com/folder/robots.txt
    "Not a valid robots.txt file. Crawlers don't check for robots.txt files in subdirectories."

    Best regards
    Björn

    --- Mystic BBS v1.12 A48 2022/03/26 (Linux/64)
    * Origin: Star Collision BBS, Uppsala, Sweden (2:201/137)
  • From mark lewis@1:3634/12.73 to Björn Wiberg on Friday, April 08, 2022 06:46:44

    On 2022 Apr 07 09:50:02, you wrote to me:

    http://example.com/folder/robots.txt
    "Not a valid robots.txt file. Crawlers don't check for robots.txt files in subdirectories."

    this is true... robots.txt applies to the entire site so it is only valid when found in the root directory of the site... if it is located anywhere else, it is just another file...
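
    The root-only rule can be expressed in a few lines with Python's standard urllib.parse. The helper names robots_url and is_root_robots are hypothetical, and this is a sketch of the rule as stated above, not of any crawler's real implementation:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(url):
    """Return the only robots.txt location that applies to `url`'s site."""
    parts = urlsplit(url)
    # Keep scheme and host (with port, if any); replace the path, drop the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def is_root_robots(url):
    """True only if `url` points at robots.txt in the site's root directory."""
    return urlsplit(url).path == "/robots.txt"
```

    By this rule, http://example.com/folder/robots.txt is just another file, and for an FTP site the file has to be retrievable as /robots.txt directly.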

    )\/(ark

    ... People build walls to see who cares enough to break them down.
    ---
    * Origin: (1:3634/12.73)